1
Xie F, Lee MS, Allahwerdy S, Getahun D, Wessler B, Chen W. Identifying the Severity of Heart Valve Stenosis and Regurgitation Among a Diverse Population Within an Integrated Health Care System: Natural Language Processing Approach. JMIR Cardio 2024; 8:e60503. [PMID: 39348175 PMCID: PMC11474122 DOI: 10.2196/60503] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2024] [Revised: 09/04/2024] [Accepted: 09/09/2024] [Indexed: 10/01/2024]
Abstract
BACKGROUND Valvular heart disease (VHD) is a leading cause of cardiovascular morbidity and mortality and imposes a substantial clinical and economic burden on health care systems. Administrative diagnostic codes are incomplete for ascertaining VHD diagnoses. OBJECTIVE This study aimed to develop a natural language processing (NLP) algorithm to identify patients with aortic, mitral, tricuspid, and pulmonic valve stenosis and regurgitation from transthoracic echocardiography (TTE) reports within a large integrated health care system. METHODS We used reports from echocardiograms performed in the Kaiser Permanente Southern California (KPSC) health care system between January 1, 2011, and December 31, 2022. Terms and phrases related to aortic, mitral, tricuspid, and pulmonic stenosis and regurgitation and their severities were compiled from the literature and enriched with input from clinicians. The NLP algorithm was developed iteratively and fine-tuned over multiple rounds of chart review, each followed by adjudication. The algorithm was applied first to 200 annotated echocardiography reports to assess its performance and then to the study echocardiography reports. RESULTS A total of 1,225,270 TTE reports were extracted from KPSC electronic health records during the study period. In these reports, the valve lesions identified included 111,300 (9.08%) aortic stenosis, 20,246 (1.65%) mitral stenosis, 397 (0.03%) tricuspid stenosis, 2585 (0.21%) pulmonic stenosis, 345,115 (28.17%) aortic regurgitation, 802,103 (65.46%) mitral regurgitation, 903,965 (73.78%) tricuspid regurgitation, and 286,903 (23.42%) pulmonic regurgitation. In addition, 50,507 (4.12%), 22,656 (1.85%), 1685 (0.14%), and 1767 (0.14%) were identified as prosthetic aortic, mitral, tricuspid, and pulmonic valves, respectively. Mild and moderate were the most common severity levels of stenosis, while trace and mild were the most common severity levels of regurgitation. 
Males had a higher frequency of aortic stenosis and of regurgitation of all 4 valves, while females had more mitral, tricuspid, and pulmonic stenosis. Non-Hispanic White patients had the highest frequency of stenosis and regurgitation of all 4 valves. The distribution of stenosis and regurgitation severity was similar across race/ethnicity groups. Frequencies of aortic stenosis, mitral stenosis, and regurgitation of all 4 heart valves increased with age. Among TTE reports with stenosis detected, younger patients were more likely to have mild aortic stenosis, while older patients were more likely to have severe aortic stenosis. The pattern was reversed for mitral stenosis, which was milder in older patients and more severe in younger patients. Among TTE reports with regurgitation detected, younger patients had a higher frequency of severe/very severe aortic regurgitation, whereas older patients had higher frequencies of mild aortic regurgitation and severe mitral/tricuspid regurgitation. Validation of the NLP algorithm against the 200 annotated TTE reports showed excellent precision, recall, and F1-scores. CONCLUSIONS The proposed computerized algorithm could effectively identify heart valve stenosis and regurgitation, as well as the severity of valvular involvement, with significant implications for pharmacoepidemiological studies and outcomes research.
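The abstract does not say how severity terms are matched in report text. As a rough illustration of lexicon-based extraction of (valve, lesion, severity) triples from TTE narrative, here is a minimal Python sketch. The lexicon, the single negation rule, and the function name `extract_lesions` are hypothetical and far simpler than the published algorithm.

```python
import re

# Hypothetical, deliberately small lexicon for illustration only.
VALVES = ("aortic", "mitral", "tricuspid", "pulmonic")
# Map surface terms to a canonical lesion type.
LESIONS = {"stenosis": "stenosis", "regurgitation": "regurgitation", "insufficiency": "regurgitation"}
# Longer phrases come first so that regex alternation prefers "very severe" over "severe".
SEVERITIES = ("very severe", "moderate to severe", "mild to moderate", "severe", "moderate", "mild", "trace")

MENTION = re.compile(
    r"\b(?P<severity>" + "|".join(SEVERITIES) + r")\s+"
    r"(?P<valve>" + "|".join(VALVES) + r")(?:\s+valve)?\s+"
    r"(?P<lesion>" + "|".join(LESIONS) + r")\b",
    re.IGNORECASE,
)
# A negation cue ("no", "no evidence of") immediately before the mention.
NEGATED = re.compile(r"\bno\s+(?:evidence\s+of\s+)?$", re.IGNORECASE)


def extract_lesions(report: str) -> list[tuple[str, str, str]]:
    """Return (valve, lesion, severity) triples mentioned in one TTE report."""
    findings = []
    for m in MENTION.finditer(report):
        if NEGATED.search(report[: m.start()]):
            continue  # skip negated mentions such as "no severe aortic stenosis"
        findings.append(
            (m["valve"].lower(), LESIONS[m["lesion"].lower()], m["severity"].lower())
        )
    return findings
```

For example, `extract_lesions("Moderate aortic stenosis. No severe tricuspid regurgitation.")` returns `[("aortic", "stenosis", "moderate")]`. A production system would also need to handle prosthetic valves, reversed phrasings ("stenosis, moderate"), and report-level deduplication, which this sketch ignores.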
Affiliation(s)
- Fagen Xie
- Department of Research and Evaluation, Kaiser Permanente Southern California, Pasadena, CA, United States
- Ming-Sum Lee
- Department of Cardiology, Los Angeles Medical Center, Kaiser Permanente Southern California, Pasadena, CA, United States
- Salam Allahwerdy
- Department of Clinical Science, Kaiser Permanente Bernard J Tyson School of Medicine, Pasadena, CA, United States
- Darios Getahun
- Department of Research and Evaluation, Kaiser Permanente Southern California, Pasadena, CA, United States
- Benjamin Wessler
- Division of Cardiology, Tufts Medical Center, Boston, MA, United States
- Wansu Chen
- Department of Research and Evaluation, Kaiser Permanente Southern California, Pasadena, CA, United States

2
Bose B, Butt SA, Arshad HB, Nicolas CC, Gullapelli R, Nwana N, Javed Z, Shahid I, Pournazari P, Patel K, Chamsi Pasha MA, Little SH, Faza NS, Jones S, Cainzos MA, Al-Kindi S, Saad JM, Zoghbi W, Nagueh SF, Nasir K. Building a Novel Artificial Intelligence-Driven Echocardiographic Data Pipeline: Findings From a Large Learning Health System. J Am Soc Echocardiogr 2024; 37:916-918. [PMID: 38830435 DOI: 10.1016/j.echo.2024.05.018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2023] [Revised: 05/22/2024] [Accepted: 05/26/2024] [Indexed: 06/05/2024]
Affiliation(s)
- Budhaditya Bose
- Cardiovascular Computational and Precision Health, Department of Cardiology, Houston, Texas
- Sara A Butt
- Center for Health Data Science and Analytics, Houston, Texas
- Hassan B Arshad
- Houston Methodist DeBakey Heart and Vascular Center, Houston, Texas
- Charlie C Nicolas
- Cardiovascular Computational and Precision Health, Department of Cardiology, Houston, Texas
- Rakesh Gullapelli
- Cardiovascular Computational and Precision Health, Department of Cardiology, Houston, Texas
- Nwabunie Nwana
- Cardiovascular Computational and Precision Health, Department of Cardiology, Houston, Texas
- Zulqarnain Javed
- Houston Methodist DeBakey Heart and Vascular Center, Houston, Texas; Cardiovascular Computational and Precision Health, Department of Cardiology, Houston, Texas
- Izza Shahid
- Cardiovascular Computational and Precision Health, Department of Cardiology, Houston, Texas
- Payam Pournazari
- Houston Methodist DeBakey Heart and Vascular Center, Houston, Texas
- Kershaw Patel
- Houston Methodist DeBakey Heart and Vascular Center, Houston, Texas
- M A Chamsi Pasha
- Houston Methodist DeBakey Heart and Vascular Center, Houston, Texas
- Stephen H Little
- Houston Methodist DeBakey Heart and Vascular Center, Houston, Texas
- Nadeen S Faza
- Houston Methodist DeBakey Heart and Vascular Center, Houston, Texas
- Stephen Jones
- Center for Health Data Science and Analytics, Houston, Texas
- M A Cainzos
- Houston Methodist DeBakey Heart and Vascular Center, Houston, Texas; Cardiovascular Computational and Precision Health, Department of Cardiology, Houston, Texas
- Sadeer Al-Kindi
- Houston Methodist DeBakey Heart and Vascular Center, Houston, Texas; Cardiovascular Computational and Precision Health, Department of Cardiology, Houston, Texas
- Jean Michel Saad
- Houston Methodist DeBakey Heart and Vascular Center, Houston, Texas
- William Zoghbi
- Houston Methodist DeBakey Heart and Vascular Center, Houston, Texas
- Sherif F Nagueh
- Houston Methodist DeBakey Heart and Vascular Center, Houston, Texas
- Khurram Nasir
- Houston Methodist DeBakey Heart and Vascular Center, Houston, Texas; Cardiovascular Computational and Precision Health, Department of Cardiology, Houston, Texas; Center for Health Data Science and Analytics, Houston, Texas

3
Maciejewski C, Ozierański K, Barwiołek A, Basza M, Bożym A, Ciurla M, Janusz Krajsman M, Maciejewska M, Lodziński P, Opolski G, Grabowski M, Cacko A, Balsam P. AssistMED project: Transforming cardiology cohort characterisation from electronic health records through natural language processing - Algorithm design, preliminary results, and field prospects. Int J Med Inform 2024; 185:105380. [PMID: 38447318 DOI: 10.1016/j.ijmedinf.2024.105380] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2023] [Revised: 02/15/2024] [Accepted: 02/16/2024] [Indexed: 03/08/2024]
Abstract
INTRODUCTION Electronic health records (EHR) are of great value for clinical research. However, EHR consist primarily of unstructured text, which must be analysed by a human and coded into a database before data analysis: a time-consuming and costly process that limits research efficiency. Natural language processing (NLP) can facilitate data retrieval from unstructured text. During the AssistMED project, we developed a practical NLP tool that automatically provides comprehensive clinical characteristics of patients from EHR, tailored to clinical researchers' needs. MATERIAL AND METHODS AssistMED retrieves patient characteristics regarding clinical conditions, medications with dosage, and echocardiographic parameters using a clinically oriented data structure, and provides researcher-friendly database output. We validated the algorithm's performance against manual data retrieval and provide critical quantitative and qualitative analysis. RESULTS AssistMED analysed the presence of 56 clinical conditions, medications from 16 drug groups with dosage, and 15 numeric echocardiographic parameters in a sample of 400 patients hospitalized in the cardiology unit. No statistically significant differences between algorithm and human retrieval were noted. Qualitative analysis revealed that disagreements with manual annotation were primarily attributable to random algorithm errors, erroneous human annotation, and the tool's lack of advanced context awareness. CONCLUSIONS Current NLP approaches can feasibly acquire accurate and detailed patient characteristics from EHR, tailored to clinical researchers' needs. We present an in-depth description of the algorithm development and validation process, discuss obstacles, and pinpoint potential solutions, including opportunities arising from recent advances in NLP such as large language models.
Affiliation(s)
- Cezary Maciejewski
- 1st Chair and Department of Cardiology, Medical University of Warsaw, 02-091 Warszawa, Poland; Doctoral School, Medical University of Warsaw, 02-091 Warszawa, Poland; Department of Medical Informatics and Telemedicine, Medical University of Warsaw, 02-091 Warszawa, Poland
- Krzysztof Ozierański
- 1st Chair and Department of Cardiology, Medical University of Warsaw, 02-091 Warszawa, Poland
- Adam Barwiołek
- Codifive sp. z o.o., Lindleya 16, 02-013 Warszawa, Poland
- Mikołaj Basza
- Medical University of Silesia in Katowice, 40-055 Katowice, Poland
- Aleksandra Bożym
- 1st Chair and Department of Cardiology, Medical University of Warsaw, 02-091 Warszawa, Poland
- Michalina Ciurla
- 1st Chair and Department of Cardiology, Medical University of Warsaw, 02-091 Warszawa, Poland
- Maciej Janusz Krajsman
- Department of Medical Informatics and Telemedicine, Medical University of Warsaw, 02-091 Warszawa, Poland
- Piotr Lodziński
- 1st Chair and Department of Cardiology, Medical University of Warsaw, 02-091 Warszawa, Poland
- Grzegorz Opolski
- 1st Chair and Department of Cardiology, Medical University of Warsaw, 02-091 Warszawa, Poland
- Marcin Grabowski
- 1st Chair and Department of Cardiology, Medical University of Warsaw, 02-091 Warszawa, Poland
- Andrzej Cacko
- 1st Chair and Department of Cardiology, Medical University of Warsaw, 02-091 Warszawa, Poland; Department of Medical Informatics and Telemedicine, Medical University of Warsaw, 02-091 Warszawa, Poland
- Paweł Balsam
- 1st Chair and Department of Cardiology, Medical University of Warsaw, 02-091 Warszawa, Poland

4
Dong T, Sunderland N, Nightingale A, Fudulu DP, Chan J, Zhai B, Freitas A, Caputo M, Dimagli A, Mires S, Wyatt M, Benedetto U, Angelini GD. Development and Evaluation of a Natural Language Processing System for Curating a Trans-Thoracic Echocardiogram (TTE) Database. Bioengineering (Basel) 2023; 10:1307. [PMID: 38002431 PMCID: PMC10669818 DOI: 10.3390/bioengineering10111307] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2023] [Revised: 11/03/2023] [Accepted: 11/09/2023] [Indexed: 11/26/2023]
Abstract
BACKGROUND Although electronic health records (EHR) provide useful insights into disease patterns and patient treatment optimisation, their reliance on unstructured data presents a difficulty. Echocardiography reports, which provide extensive pathology information for cardiovascular patients, are particularly challenging to extract and analyse because of their narrative structure. Although natural language processing (NLP) has been used successfully in a variety of medical fields, it is not commonly applied to echocardiography analysis. OBJECTIVES To develop an NLP-based approach for extracting and categorising data from echocardiography reports by accurately converting continuous (e.g., LVOT VTI, AV VTI and TR Vmax) and discrete (e.g., regurgitation severity) outcomes in a semi-structured narrative format into a structured and categorised format, allowing for future research or clinical use. METHODS A total of 135,062 Trans-Thoracic Echocardiogram (TTE) reports were derived from 146,967 baseline echocardiogram reports and split into three cohorts: Training and Validation (n = 1075), Test Dataset (n = 98) and Application Dataset (n = 133,889). The NLP system was developed and iteratively refined using medical expert knowledge. The system was used to curate a moderate-fidelity database from extractions of 133,889 reports. A hold-out validation set of 98 reports was blindly annotated and extracted by two clinicians for comparison with the NLP extraction. Agreement, discrimination, accuracy and calibration of outcome measure extractions were evaluated. RESULTS Continuous outcomes including LVOT VTI, AV VTI and TR Vmax exhibited perfect inter-rater reliability by intra-class correlation (ICC = 1.00, p < 0.05) alongside high R² values, demonstrating close alignment between the NLP system and clinicians. 
A good level of inter-rater reliability (ICC = 0.75-0.9, p < 0.05) was observed for outcomes such as LVOT Diam, Lateral MAPSE, Peak E Velocity, Lateral E' Velocity, PV Vmax, and Sinuses of Valsalva and Ascending Aorta diameters. Furthermore, the accuracy rate for discrete outcome measures was 91.38% in the confusion matrix analysis, indicating effective performance. CONCLUSIONS The NLP-based technique performed well in extracting and categorising data from echocardiography reports, showing a high degree of agreement and concordance with clinician extractions. This study supports the effective use of semi-structured data by providing a tool for converting semi-structured text into a structured echo report that can be used for data management. Further validation and implementation in healthcare settings could improve data availability and support research and clinical decision-making.
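The abstract reports ICC values but does not state which ICC form was used. Assuming the common two-way random-effects, absolute-agreement, single-rater form (ICC(2,1) in Shrout and Fleiss's notation), agreement between NLP-extracted and clinician-extracted values for one continuous measure could be computed as in this sketch; the function name `icc_2_1` and the row-per-report data layout are illustrative choices.

```python
from statistics import fmean


def icc_2_1(ratings: list[list[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` has one row per report (subject) and one column per rater,
    e.g. [[nlp_value, clinician_value], ...].
    """
    n = len(ratings)      # subjects
    k = len(ratings[0])   # raters
    if n < 2 or k < 2:
        raise ValueError("need at least 2 subjects and 2 raters")

    grand = fmean(x for row in ratings for x in row)
    row_means = [fmean(row) for row in ratings]
    col_means = [fmean(row[j] for row in ratings) for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
```

Identical values from both raters give an ICC of 1. A constant offset (for example, one rater always reporting 1 unit higher) lowers this absolute-agreement ICC to 10/13 for the values 1-4 versus 2-5, even though R² is still 1. This is one reason ICC and R² can tell different stories about agreement.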
Affiliation(s)
- Tim Dong
- Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol BS2 8HW, UK
- Nicholas Sunderland
- Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol BS2 8HW, UK
- Angus Nightingale
- Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol BS2 8HW, UK
- Daniel P. Fudulu
- Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol BS2 8HW, UK
- Jeremy Chan
- Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol BS2 8HW, UK
- Ben Zhai
- School of Computing Science, Northumbria University, Newcastle upon Tyne NE1 8ST, UK
- Alberto Freitas
- Faculty of Medicine, University of Porto, 4100 Porto, Portugal
- Massimo Caputo
- Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol BS2 8HW, UK
- Arnaldo Dimagli
- Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol BS2 8HW, UK
- Stuart Mires
- Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol BS2 8HW, UK
- Mike Wyatt
- University Hospitals Bristol and Weston, Marlborough St, Bristol BS1 3NU, UK
- Umberto Benedetto
- Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol BS2 8HW, UK
- Gianni D. Angelini
- Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol BS2 8HW, UK

5
Berman AN, Ginder C, Sporn ZA, Tanguturi V, Hidrue MK, Shirkey LB, Zhao Y, Blankstein R, Turchin A, Wasfy JH. Natural Language Processing for the Ascertainment and Phenotyping of Left Ventricular Hypertrophy and Hypertrophic Cardiomyopathy on Echocardiogram Reports. Am J Cardiol 2023; 206:247-253. [PMID: 37714095 DOI: 10.1016/j.amjcard.2023.08.109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2023] [Revised: 08/17/2023] [Accepted: 08/20/2023] [Indexed: 09/17/2023]
Abstract
Extracting and accurately phenotyping information from electronic health documentation is critical for medical research and clinical care. We sought to develop a highly accurate, open-source natural language processing (NLP) module to ascertain and phenotype left ventricular hypertrophy (LVH) and hypertrophic cardiomyopathy (HCM) diagnoses from echocardiogram reports within a diverse hospital network. After initial development on 17,250 echocardiogram reports, 700 unique reports from 6 hospitals were randomly selected from data repositories within the Mass General Brigham healthcare system and manually adjudicated by physicians for 10 subtypes of LVH and for diagnoses of HCM. Using an open-source NLP system, the module was formally tested on 300 training set reports and validated on 400 reports. Sensitivity, specificity, positive predictive value, and negative predictive value were calculated to assess the discriminative accuracy of the NLP module. The NLP module demonstrated robust performance across the 10 LVH subtypes, with overall sensitivity and specificity exceeding 96%. It also performed excellently in detecting HCM diagnoses, with sensitivity and specificity exceeding 93%. In conclusion, we designed a highly accurate NLP module to determine the presence of LVH and HCM on echocardiogram reports. Our work demonstrates the feasibility and accuracy of NLP for detecting diagnoses on imaging reports, even when they are described in free text. The module has been placed in the public domain to advance research, trial recruitment, and population health management for patients with LVH-associated conditions.
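The four metrics named above follow directly from a 2×2 comparison of NLP output against physician adjudication for a single label (for example, one LVH subtype). A minimal Python sketch, assuming binary 0/1 labels per report; the function name `diagnostic_metrics` is illustrative and not part of the published module.

```python
import math


def diagnostic_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Sensitivity, specificity, PPV and NPV for binary labels (1 = condition present).

    y_true: physician-adjudicated labels; y_pred: NLP labels for the same reports.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)

    def ratio(num: int, den: int) -> float:
        # Undefined when the denominator is empty; report NaN rather than guess.
        return num / den if den else math.nan

    return {
        "sensitivity": ratio(tp, tp + fn),  # a.k.a. recall
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),          # a.k.a. precision
        "npv": ratio(tn, tn + fn),
    }
```

PPV and NPV depend on how common the condition is in the validation sample, whereas sensitivity and specificity do not. Reporting all four, as the abstract does, therefore gives a fuller picture than any one alone.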
Affiliation(s)
- Adam N Berman
- Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women's Hospital
- Curtis Ginder
- Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women's Hospital
- Varsha Tanguturi
- Cardiology Division, Department of Medicine, Massachusetts General Hospital
- Michael K Hidrue
- Division of Performance Analysis and Improvement, Massachusetts General Physicians Organization, Massachusetts General Hospital
- Linnea B Shirkey
- Division of Performance Analysis and Improvement, Massachusetts General Physicians Organization, Massachusetts General Hospital
- Yunong Zhao
- Cardiology Division, Department of Medicine, Massachusetts General Hospital
- Ron Blankstein
- Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women's Hospital
- Alexander Turchin
- Division of Endocrinology, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts
- Jason H Wasfy
- Cardiology Division, Department of Medicine, Massachusetts General Hospital

6
Bosch D, Kuppen MCP, Tascilar M, Smilde TJ, Mulders PFA, Uyl-de Groot CA, van Oort IM. Reliability and Efficiency of the CAPRI-3 Metastatic Prostate Cancer Registry Driven by Artificial Intelligence. Cancers (Basel) 2023; 15:3808. [PMID: 37568624 PMCID: PMC10417512 DOI: 10.3390/cancers15153808] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2023] [Revised: 07/19/2023] [Accepted: 07/23/2023] [Indexed: 08/13/2023]
Abstract
BACKGROUND Manual data collection is still the gold standard for disease-specific patient registries. However, CAPRI-3 uses text mining (an artificial intelligence (AI) technology) for patient identification and data collection. The aim of this study is to demonstrate the reliability and efficiency of this AI-driven approach. METHODS CAPRI-3 is an observational, retrospective, multicenter cohort registry on metastatic prostate cancer. We tested the patient-identification algorithm and automated data extraction through manual validation of the same patients in two pilots, in 2019 and 2022. RESULTS Pilot one identified 2030 patients and pilot two 9464 patients. The negative predictive value of the algorithm was maximized to prevent false exclusions and reached 94.8%. The completeness and accuracy of the automated data extraction were 92.3% or higher, except for date fields and inaccessible data (images/PDFs), for which they ranged from 10% to 88.9%. Additional manual quality control took over 3 h less per patient than the original fully manual CAPRI registry (105 vs. 300 min). CONCLUSIONS The CAPRI-3 patient-identification algorithm is a sound replacement for excluding ineligible candidates. The AI-driven data extraction is largely accurate and complete, but manual quality control is needed for less reliable and inaccessible data. Overall, the AI-driven approach of the CAPRI-3 registry is reliable and timesaving.
Affiliation(s)
- Dianne Bosch
- Department of Urology, Radboud University Medical Center, 6525 GA Nijmegen, The Netherlands
- Malou C. P. Kuppen
- Department of Radiotherapy, Maastro Clinic, 6229 ET Maastricht, The Netherlands
- Metin Tascilar
- Department of Medical Oncology, Isala Hospital, 8025 AB Zwolle, The Netherlands
- Tineke J. Smilde
- Department of Medical Oncology, Jeroen Bosch Hospital, 5223 GZ 's-Hertogenbosch, The Netherlands
- Peter F. A. Mulders
- Department of Urology, Radboud University Medical Center, 6525 GA Nijmegen, The Netherlands
- Carin A. Uyl-de Groot
- Erasmus School of Health Policy and Management, Erasmus University Rotterdam, 3062 PA Rotterdam, The Netherlands
- Inge M. van Oort
- Department of Urology, Radboud University Medical Center, 6525 GA Nijmegen, The Netherlands

7
Vidal-Perez R, Grapsa J, Bouzas-Mosquera A, Fontes-Carvalho R, Vazquez-Rodriguez JM. Current role and future perspectives of artificial intelligence in echocardiography. World J Cardiol 2023; 15:284-292. [PMID: 37397831 PMCID: PMC10308270 DOI: 10.4330/wjc.v15.i6.284] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2023] [Revised: 05/02/2023] [Accepted: 06/21/2023] [Indexed: 06/26/2023]
Abstract
Echocardiography is an essential tool in diagnostic cardiology and is fundamental to clinical care. Artificial intelligence (AI) can assist health care providers by serving as a valuable diagnostic tool in echocardiography, especially in automating measurements and interpreting results. It can also expand research capabilities and help uncover alternative pathways in medical management, especially in prognostication. In this review article, we describe the current role and future perspectives of AI in echocardiography.
Affiliation(s)
- Rafael Vidal-Perez
- Servicio de Cardiología, Unidad de Imagen y Función Cardíaca, Complexo Hospitalario Universitario A Coruña Centro de Investigación Biomédica en Red-Instituto de Salud Carlos III, A Coruña 15006, Spain
- Julia Grapsa
- Department of Cardiology, Guy's and St Thomas' NHS Trust, London SE1 7EH, United Kingdom
- Alberto Bouzas-Mosquera
- Servicio de Cardiología, Unidad de Imagen y Función Cardíaca, Complexo Hospitalario Universitario A Coruña Centro de Investigación Biomédica en Red-Instituto de Salud Carlos III, A Coruña 15006, Spain
- Ricardo Fontes-Carvalho
- Cardiology Department, Centro Hospitalar de Vila Nova de Gaia/Espinho, Vilanova de Gaia 4434-502, Portugal
- Cardiovascular R&D Centre - UnIC@RISE, Department of Surgery and Physiology, Faculty of Medicine of the University of Porto, Porto 4200-319, Portugal

8
Dewaswala N, Chen D, Bhopalwala H, Kaggal VC, Murphy SP, Bos JM, Geske JB, Gersh BJ, Ommen SR, Araoz PA, Ackerman MJ, Arruda-Olson AM. Natural language processing for identification of hypertrophic cardiomyopathy patients from cardiac magnetic resonance reports. BMC Med Inform Decis Mak 2022; 22:272. [PMID: 36258218 PMCID: PMC9580188 DOI: 10.1186/s12911-022-02017-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/24/2020] [Accepted: 10/10/2022] [Indexed: 11/30/2022]
Abstract
Background Cardiac magnetic resonance (CMR) imaging is important for diagnosis and risk stratification of hypertrophic cardiomyopathy (HCM) patients. However, collection of information from large numbers of CMR reports by manual review is time-consuming, error-prone and costly. Natural language processing (NLP) is an artificial intelligence method for automated extraction of information from narrative text including text in CMR reports in electronic health records (EHR). Our objective was to assess whether NLP can accurately extract diagnosis of HCM from CMR reports.
Methods An NLP system with two tiers was developed for information extraction from narrative text in CMR reports; the first tier extracted information regarding HCM diagnosis, while the second extracted categorical and numeric concepts for HCM classification. We randomly allocated 200 HCM patients with CMR reports from 2004 to 2018 into training (100 patients with 185 CMR reports) and testing sets (100 patients with 206 reports).
Results NLP algorithms demonstrated very high performance compared with manual annotation. The algorithm to extract HCM diagnosis had an accuracy of 0.99. Accuracies for categorical concepts were: HCM morphologic subtype 0.99, systolic anterior motion of the mitral valve 0.96, mitral regurgitation 0.93, left ventricular (LV) obstruction 0.94, location of obstruction 0.92, apical pouch 0.98, LV delayed enhancement 0.93, left atrial enlargement 0.99 and right atrial enlargement 0.98. Accuracies for numeric concepts were: maximal LV wall thickness 0.96, LV mass 0.99, LV mass index 0.98, LV ejection fraction 0.98 and right ventricular ejection fraction 0.99.
Conclusions NLP identified and classified HCM from CMR narrative text reports with very high performance.
Supplementary Information The online version contains supplementary material available at 10.1186/s12911-022-02017-y.
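As a toy illustration of the second-tier task of pulling numeric concepts out of narrative CMR text, the sketch below uses regular expressions for two of the concepts listed above. The patterns, concept keys, and `extract_numeric` are hypothetical; the abstract does not describe how the authors' system matched numeric values, and real reports need far broader patterns.

```python
import re

# Hypothetical patterns for two numeric concepts. Ranges such as "55-60%",
# other units, and negations are deliberately not handled.
NUMERIC_PATTERNS = {
    "lvef_percent": re.compile(
        r"\b(?:LVEF|LV ejection fraction|left ventricular ejection fraction)\b"
        r"[^0-9]{0,20}?(\d{1,2}(?:\.\d+)?)\s*%",
        re.IGNORECASE,
    ),
    "max_lv_wall_thickness_mm": re.compile(
        r"\bmaxim(?:al|um) (?:LV |left ventricular )?wall thickness\b"
        r"[^0-9]{0,20}?(\d{1,2}(?:\.\d+)?)\s*mm",
        re.IGNORECASE,
    ),
}


def extract_numeric(report: str) -> dict[str, float]:
    """Return the first value found for each concept in one CMR report."""
    values = {}
    for concept, pattern in NUMERIC_PATTERNS.items():
        match = pattern.search(report)
        if match:
            values[concept] = float(match.group(1))
    return values
```

The bounded gap `[^0-9]{0,20}?` lets a short phrase such as "measures" or a colon sit between the concept name and its value, while stopping the pattern from latching onto a number from an unrelated sentence further away.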
Affiliation(s)
- Nakeya Dewaswala
- Department of Cardiovascular Medicine, Mayo Clinic Rochester, Rochester, MN, USA
- David Chen
- Department of Cardiovascular Surgery, Cleveland Clinic, Cleveland, OH, USA
- Huzefa Bhopalwala
- Department of Cardiovascular Medicine, Mayo Clinic Rochester, Rochester, MN, USA
- Vinod C Kaggal
- Enterprise Technology Services, Shared Service Offices, Mayo Clinic, Rochester, MN, USA
- Sean P Murphy
- Advanced Analytics Services, Mayo Clinic Rochester, Rochester, MN, USA
- J Martijn Bos
- Department of Cardiovascular Medicine, Mayo Clinic Rochester, Rochester, MN, USA
- Jeffrey B Geske
- Department of Cardiovascular Medicine, Mayo Clinic Rochester, Rochester, MN, USA
- Bernard J Gersh
- Department of Cardiovascular Medicine, Mayo Clinic Rochester, Rochester, MN, USA
- Steve R Ommen
- Department of Cardiovascular Medicine, Mayo Clinic Rochester, Rochester, MN, USA
- Philip A Araoz
- Department of Radiology, Mayo Clinic Rochester, Rochester, MN, USA
- Michael J Ackerman
- Department of Cardiovascular Medicine, Mayo Clinic Rochester, Rochester, MN, USA; Department of Pediatric and Adolescent Medicine, Mayo Clinic Rochester, Rochester, MN, USA; Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic Rochester, Rochester, MN, USA

9
Singh P, Haimovich J, Reeder C, Khurshid S, Lau ES, Cunningham JW, Philippakis A, Anderson CD, Ho JE, Lubitz SA, Batra P. One Clinician Is All You Need-Cardiac Magnetic Resonance Imaging Measurement Extraction: Deep Learning Algorithm Development. JMIR Med Inform 2022; 10:e38178. [PMID: 35960155 PMCID: PMC9526125 DOI: 10.2196/38178] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2022] [Revised: 07/22/2022] [Accepted: 08/11/2022] [Indexed: 11/23/2022]
Abstract
BACKGROUND Cardiac magnetic resonance imaging (CMR) is a powerful diagnostic modality that provides detailed quantitative assessment of cardiac anatomy and function. Automated extraction of CMR measurements from clinical reports, which are typically stored as unstructured text in electronic health record systems, would facilitate their use in research. Existing machine learning approaches either rely on large quantities of expert annotation or require engineered rules that are time-consuming to develop and specific to the setting in which they were developed. OBJECTIVE We hypothesize that pretrained transformer-based language models may enable label-efficient numerical extraction from clinical text without heuristics or large quantities of expert annotations. Here, we fine-tuned pretrained transformer-based language models on a small quantity of CMR annotations to extract 21 CMR measurements. We assessed the effect of clinical pretraining on reducing labeling needs and explored alternative representations of numerical inputs to improve performance. METHODS Our study sample comprised 99,252 patients who received longitudinal cardiology care in a multi-institutional health care system. There were 12,720 available CMR reports from 9280 patients. We adapted PRAnCER (Platform Enabling Rapid Annotation for Clinical Entity Recognition), an annotation tool for clinical text, to collect annotations from a study clinician on 370 reports. We experimented with 5 different representations of numerical quantities and several model weight initializations. We evaluated extraction performance using macroaveraged F1-scores across the measurements of interest. We applied the best-performing model to extract measurements from the remaining CMR reports in the study sample and, to demonstrate validity, evaluated established associations between selected extracted measures and clinical outcomes. 
RESULTS All combinations of weight initializations and numerical representations obtained excellent performance on the gold-standard test set, suggesting that transformer models fine-tuned on a small set of annotations can effectively extract numerical quantities. Our results further indicate that custom numerical representations did not have a significant impact on extraction performance. The best-performing model achieved a macroaveraged F1-score of 0.957 across the evaluated CMR measurements (range 0.92 for the lowest-performing measure of left atrial anterior-posterior dimension to 1.0 for the highest-performing measures of left ventricular end-systolic volume index and left ventricular end-systolic diameter). Application of the best-performing model to the study cohort yielded 136,407 measurements from all available reports in the study sample. We observed the expected associations of extracted left ventricular mass index, left ventricular ejection fraction, and right ventricular ejection fraction with clinical outcomes such as atrial fibrillation, heart failure, and mortality. CONCLUSIONS This study demonstrated that a domain-agnostic pretrained transformer model is able to effectively extract quantitative clinical measurements from diagnostic reports with a relatively small number of gold-standard annotations. The proposed workflow may serve as a roadmap for other quantitative entity extraction tasks.
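The macroaveraged F1-score reported above is the unweighted mean of the per-measurement F1-scores, so a rarely reported measurement counts as much as a common one. The Python sketch below illustrates only that computation; it assumes per-measurement true-positive, false-positive, and false-negative counts are already available, and the `Counts` type, measurement keys, and numbers are hypothetical rather than taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Hypothetical per-measurement extraction counts."""
    tp: int  # correctly extracted values
    fp: int  # extracted values that are wrong or spurious
    fn: int  # gold-standard values the model missed


def f1(c: Counts) -> float:
    """F1 = 2TP / (2TP + FP + FN); defined as 0.0 when there is nothing to score."""
    denom = 2 * c.tp + c.fp + c.fn
    return 2 * c.tp / denom if denom else 0.0


def macro_f1(per_measure: dict[str, Counts]) -> float:
    """Unweighted mean of per-measurement F1, so every measurement counts equally."""
    scores = [f1(c) for c in per_measure.values()]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    example = {
        "lvef": Counts(tp=50, fp=0, fn=0),       # F1 = 1.0
        "la_ap_dim": Counts(tp=45, fp=5, fn=5),  # F1 = 0.9
    }
    print(round(macro_f1(example), 3))  # 0.95
```

A micro-averaged score would instead pool the counts across all measurements before computing a single F1, which lets frequent measurements dominate.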
Affiliation(s)
- Pulkit Singh
- Data Sciences Platform, The Broad Institute of Harvard and MIT, Cambridge, MA, United States
- Julian Haimovich
- Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, United States
- Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, United States
- Cardiovascular Disease Initiative, The Broad Institute of Harvard and MIT, Cambridge, MA, United States
- Christopher Reeder
- Data Sciences Platform, The Broad Institute of Harvard and MIT, Cambridge, MA, United States
- Shaan Khurshid
- Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, United States
- Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, United States
- Demoulas Center for Cardiac Arrhythmias, Massachusetts General Hospital, Boston, MA, United States
- Emily S Lau
- Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, United States
- Cardiovascular Disease Initiative, The Broad Institute of Harvard and MIT, Cambridge, MA, United States
- Jonathan W Cunningham
- Cardiovascular Disease Initiative, The Broad Institute of Harvard and MIT, Cambridge, MA, United States
- Division of Cardiology, Brigham and Women's Hospital, Boston, MA, United States
- Anthony Philippakis
- Data Sciences Platform, The Broad Institute of Harvard and MIT, Cambridge, MA, United States
- Eric and Wendy Schmidt Center, The Broad Institute of Harvard and MIT, Cambridge, MA, United States
- Christopher D Anderson
- Department of Neurology, Brigham and Women's Hospital, Boston, MA, United States
- Henry and Allison McCance Center for Brain Health, Massachusetts General Hospital, Boston, MA, United States
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, United States
- Jennifer E Ho
- Cardiovascular Disease Initiative, The Broad Institute of Harvard and MIT, Cambridge, MA, United States
- CardioVascular Institute and Division of Cardiology, Department of Medicine, Beth Israel Deaconess Medical Center, Boston, MA, United States
- Steven A Lubitz
- Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, United States
- Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, United States
- Cardiovascular Disease Initiative, The Broad Institute of Harvard and MIT, Cambridge, MA, United States
- Demoulas Center for Cardiac Arrhythmias, Massachusetts General Hospital, Boston, MA, United States
- Puneet Batra
- Data Sciences Platform, The Broad Institute of Harvard and MIT, Cambridge, MA, United States
10
Jiang R, Yeung DF, Behnami D, Luong C, Tsang MYC, Jue J, Gin K, Nair P, Abolmaesumi P, Tsang TSM. A Novel Continuous Left Ventricular Diastolic Function Score Using Machine Learning. J Am Soc Echocardiogr 2022; 35:1247-1255. [PMID: 35753590 DOI: 10.1016/j.echo.2022.06.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2021] [Revised: 06/02/2022] [Accepted: 06/05/2022] [Indexed: 11/29/2022]
Abstract
BACKGROUND Unlike left ventricular (LV) ejection fraction, which provides a precise, reliable, and prognostically valuable measure of systolic function, there is no single analogous measure of LV diastolic function. OBJECTIVES We aimed to develop a continuous score to grade LV diastolic function using machine learning modeling of echocardiographic data. METHODS Consecutive echo studies performed at a tertiary care centre between February 1, 2010 and March 31, 2016 were assessed, excluding studies containing features that would interfere with diastolic function assessment as well as studies in which one or more parameters within the contemporary diastolic function assessment algorithm were not reported. Diastolic function was graded based on 2016 American Society of Echocardiography (ASE) / European Association of Cardiovascular Imaging (EACVI) guidelines, excluding indeterminate studies. Machine learning models were trained (SVM [support vector machine], DT [decision tree], XGB [XGBoost], and DNN [dense neural network]) to classify studies within the training set by diastolic dysfunction severity, blinded to the ASE/EACVI classification. The DNN model was retrained to generate a regression model (R-DNN) to predict a continuous LV diastolic function score. RESULTS A total of 28,986 studies were included; 23,188 studies were used to train the models and 5798 studies were used for validation. The models were able to reclassify studies with high agreement to the ASE/EACVI algorithm (SVM 83%, DT 100%, XGB 100%, DNN 98%). The continuous diastolic function score corresponded well with ASE/EACVI guidelines, with scores of 1.00 ± 0.01 for studies with normal function; and 0.74 ± 0.05, 0.51 ± 0.06, and 0.27 ± 0.11 for mild, moderate, and severe diastolic dysfunction respectively (mean ± 1 standard deviation). A score of <0.91 predicted abnormal diastolic function (AUC 0.99) while a score of <0.65 predicted elevated filling pressure (AUC 0.99). 
CONCLUSIONS Machine learning can assimilate echocardiographic data and generate an automated continuous diastolic function score that corresponds well with current diastolic function grading recommendations.
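Once a continuous score is available, the two cut-offs reported above reduce to simple thresholds. The Python sketch below applies them; it does not reproduce the R-DNN model that generates the score, and the function and constant names are illustrative assumptions rather than anything published by the authors.

```python
from typing import NamedTuple

# Cut-offs taken from the abstract above.
ABNORMAL_FUNCTION_CUTOFF = 0.91  # score < 0.91 predicted abnormal diastolic function
ELEVATED_PRESSURE_CUTOFF = 0.65  # score < 0.65 predicted elevated filling pressure


class DiastolicFlags(NamedTuple):
    abnormal_function: bool
    elevated_filling_pressure: bool


def interpret_score(score: float) -> DiastolicFlags:
    """Apply the published cut-offs to a continuous LV diastolic function score.

    Lower scores indicate worse diastolic function (about 1.00 for normal studies
    and 0.27 on average for severe dysfunction in the reported cohort).
    """
    return DiastolicFlags(
        abnormal_function=score < ABNORMAL_FUNCTION_CUTOFF,
        elevated_filling_pressure=score < ELEVATED_PRESSURE_CUTOFF,
    )


if __name__ == "__main__":
    # Mean scores reported for mild, moderate, and severe dysfunction.
    for s in (0.74, 0.51, 0.27):
        print(s, interpret_score(s))
```

The flags are model predictions derived from cut-offs chosen by receiver operating characteristic analysis, not guideline grades.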
Affiliation(s)
- River Jiang
- Division of Cardiology, University of British Columbia, Gordon & Leslie Diamond Health Care Centre, 2775 Laurel St, 9th Floor, Vancouver, BC, Canada V5Z 1M9
- Darwin F Yeung
- Division of Cardiology, University of British Columbia, Gordon & Leslie Diamond Health Care Centre, 2775 Laurel St, 9th Floor, Vancouver, BC, Canada V5Z 1M9
- Delaram Behnami
- Department of Electrical and Computer Engineering, University of British Columbia, 5500-2332 Main Mall, Vancouver, BC, Canada V6T 1Z4
- Christina Luong
- Division of Cardiology, University of British Columbia, Gordon & Leslie Diamond Health Care Centre, 2775 Laurel St, 9th Floor, Vancouver, BC, Canada V5Z 1M9
- Michael Y C Tsang
- Division of Cardiology, University of British Columbia, Gordon & Leslie Diamond Health Care Centre, 2775 Laurel St, 9th Floor, Vancouver, BC, Canada V5Z 1M9
- John Jue
- Division of Cardiology, University of British Columbia, Gordon & Leslie Diamond Health Care Centre, 2775 Laurel St, 9th Floor, Vancouver, BC, Canada V5Z 1M9
- Ken Gin
- Division of Cardiology, University of British Columbia, Gordon & Leslie Diamond Health Care Centre, 2775 Laurel St, 9th Floor, Vancouver, BC, Canada V5Z 1M9
- Parvathy Nair
- Division of Cardiology, University of British Columbia, Gordon & Leslie Diamond Health Care Centre, 2775 Laurel St, 9th Floor, Vancouver, BC, Canada V5Z 1M9
- Purang Abolmaesumi
- Department of Electrical and Computer Engineering, University of British Columbia, 5500-2332 Main Mall, Vancouver, BC, Canada V6T 1Z4
- Teresa S M Tsang
- Division of Cardiology, University of British Columbia, Gordon & Leslie Diamond Health Care Centre, 2775 Laurel St, 9th Floor, Vancouver, BC, Canada V5Z 1M9
11
Hagberg E, Hagerman D, Johansson R, Hosseini N, Liu J, Björnsson E, Alvén J, Hjelmgren O. Semi-supervised learning with natural language processing for right ventricle classification in echocardiography-a scalable approach. Comput Biol Med 2022; 143:105282. [PMID: 35220074 DOI: 10.1016/j.compbiomed.2022.105282] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/02/2021] [Revised: 01/30/2022] [Accepted: 01/30/2022] [Indexed: 11/20/2022]
Abstract
We created a deep learning model, trained on text classified by natural language processing (NLP), to assess right ventricular (RV) size and function from echocardiographic images. We included 12,684 examinations with corresponding written reports for text classification. After manual annotation of 1489 reports, we trained an NLP model to classify the remaining 10,651 reports. A view classifier was developed to select the 4-chamber or RV-focused view from an echocardiographic examination (n = 539). The final models were two image classification models, trained on the corresponding echocardiographic view and on the labels predicted by the combined manual annotation and NLP models, to assess RV function (training set n = 11,008) and size (training set n = 9951). The text classifier identified impaired RV function with 99% sensitivity and 98% specificity and RV enlargement with 98% sensitivity and 98% specificity. The view classification model identified the 4-chamber view with 92% accuracy and the RV-focused view with 73% accuracy. The image classification models identified impaired RV function with 93% sensitivity and 72% specificity and an enlarged RV with 80% sensitivity and 85% specificity; agreement with the written reports was substantial (both κ = 0.65). Our findings show that models for automatic image assessment can be trained to classify RV size and function by using model-annotated data from written echocardiography reports. This pipeline for auto-annotation of the echocardiographic images, using an NLP model with medical reports as input, can be used to train an image-assessment model without manual annotation of images and enables fast and inexpensive expansion of the training dataset when needed.
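Agreement between the image classifiers and the written reports is summarized above with Cohen's κ, which corrects raw agreement for the agreement expected by chance. A minimal Python sketch of the statistic for two label sequences follows; the example labels are hypothetical, and in practice `sklearn.metrics.cohen_kappa_score` computes the same unweighted quantity.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Cohen's kappa for two raters labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and
    p_e is the agreement expected by chance from each rater's label frequencies.
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n
    freq_a, freq_b = Counter(a), Counter(b)
    labels = set(freq_a) | set(freq_b)
    p_e = sum((freq_a[k] / n) * (freq_b[k] / n) for k in labels)
    if p_e == 1.0:
        return 1.0  # convention: both raters used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    report_labels = [1, 1, 0, 0]  # hypothetical: 1 = impaired RV function per report
    image_labels = [1, 0, 0, 0]   # hypothetical image-model predictions
    print(cohens_kappa(report_labels, image_labels))  # 0.5
```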
Affiliation(s)
- Eva Hagberg
- Department of Molecular and Clinical Medicine, Institute of Medicine, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden; Region Västra Götaland, Sahlgrenska University Hospital, Department of Clinical Physiology, Gothenburg, Sweden.
- David Hagerman
- Department of Molecular and Clinical Medicine, Institute of Medicine, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden; Department of Electrical Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Richard Johansson
- Department of Computer Science and Engineering, University of Gothenburg, Gothenburg, Sweden
- Nasser Hosseini
- Department of Medical Physics and Biomedical Engineering, Sahlgrenska University Hospital, Gothenburg, Sweden
- Jan Liu
- Department of Electrical Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Elin Björnsson
- Department of Electrical Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Jennifer Alvén
- Department of Molecular and Clinical Medicine, Institute of Medicine, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden; Department of Electrical Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Ola Hjelmgren
- Department of Molecular and Clinical Medicine, Institute of Medicine, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden; Region Västra Götaland, Sahlgrenska University Hospital, Department of Clinical Physiology, Gothenburg, Sweden
12
Tseng AS, Lopez-Jimenez F, Pellikka PA. Future Guidelines for Artificial Intelligence in Echocardiography. J Am Soc Echocardiogr 2022; 35:878-882. [DOI: 10.1016/j.echo.2022.04.005] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/28/2022] [Revised: 04/14/2022] [Accepted: 04/16/2022] [Indexed: 11/28/2022]
13
Deady M, Ezzeldin H, Cook K, Billings D, Pizarro J, Plotogea AA, Saunders-Hastings P, Belov A, Whitaker BI, Anderson SA. The Food and Drug Administration Biologics Effectiveness and Safety Initiative Facilitates Detection of Vaccine Administrations From Unstructured Data in Medical Records Through Natural Language Processing. Front Digit Health 2022; 3:777905. [PMID: 35005697 PMCID: PMC8727347 DOI: 10.3389/fdgth.2021.777905] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 09/16/2021] [Accepted: 12/03/2021] [Indexed: 12/03/2022]
Abstract
Introduction: The Food and Drug Administration Center for Biologics Evaluation and Research conducts post-market surveillance of biologic products to ensure their safety and effectiveness. Studies have found that common vaccine exposures may be missing from structured data elements of electronic health records (EHRs), instead being captured in clinical notes. This impacts monitoring of adverse events following immunizations (AEFIs). For example, COVID-19 vaccines have been regularly administered outside of traditional medical settings. We developed a natural language processing (NLP) algorithm to mine unstructured clinical notes for vaccinations not captured in structured EHR data. Methods: A random sample of 1,000 influenza vaccine administrations, representing 995 unique patients, was extracted from a large U.S. EHR database. NLP techniques were used to detect administrations from the clinical notes in the training dataset [80% (N = 797) of patients]. The algorithm was applied to the validation dataset [20% (N = 198) of patients] to assess performance. Full medical charts for 28 randomly selected administration events in the validation dataset were reviewed by clinicians. The NLP algorithm was then applied across the entire dataset (N = 995) to quantify the number of additional events identified. Results: A total of 3,199 administrations were identified in the structured data and clinical notes combined. Of these, 2,740 (85.7%) were identified in the structured data, while the NLP algorithm identified 1,183 (37.0%) administrations in clinical notes, 459 of which were not captured in the structured data. This represents a 16.8% increase in the identification of vaccine administrations compared to using structured data alone. The validation of 28 vaccine administrations confirmed 27 (96.4%) as “definite” vaccine administrations; 18 (64.3%) had evidence of a vaccination event in the structured data, while 10 (35.7%) were found solely in the unstructured notes.
Discussion: We demonstrated the utility of an NLP algorithm to identify vaccine administrations not captured in structured EHR data. NLP techniques have the potential to improve detection of vaccine administrations not otherwise reported without increasing the analysis burden on physicians or practitioners. Future applications could include refining estimates of vaccine coverage and detecting other exposures, population characteristics, and outcomes not reliably captured in structured EHR data.
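The percentages in the Results follow directly from the reported counts. The short Python check below recomputes them, assuming, as the figures imply, that every administration in the combined total was found in the structured data, by the NLP algorithm, or by both.

```python
# Counts reported in the Results above.
total = 3199        # administrations found in structured data and notes combined
structured = 2740   # administrations found in structured EHR data
nlp_found = 1183    # administrations found by the NLP algorithm in clinical notes
nlp_only = 459      # NLP findings absent from the structured data

both = nlp_found - nlp_only              # found by both sources: 724
assert structured + nlp_only == total    # 2740 + 459 = 3199, so the counts are consistent

pct_structured = 100 * structured / total   # share captured by structured data
pct_nlp = 100 * nlp_found / total           # share captured by the NLP algorithm
pct_increase = 100 * nlp_only / structured  # relative gain over structured data alone

print(f"{pct_structured:.1f}% {pct_nlp:.1f}% {pct_increase:.1f}%")  # 85.7% 37.0% 16.8%
```

Note that the 16.8% increase is relative to the structured-data count (459 / 2740), not to the combined total.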
Affiliation(s)
- Hussein Ezzeldin
- US Food and Drug Administration, Silver Spring, MD, United States
- Artur Belov
- US Food and Drug Administration, Silver Spring, MD, United States
14
Taylor AM. The role of artificial intelligence in paediatric cardiovascular magnetic resonance imaging. Pediatr Radiol 2022; 52:2131-2138. [PMID: 34936019 PMCID: PMC9537201 DOI: 10.1007/s00247-021-05218-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2021] [Revised: 08/13/2021] [Accepted: 10/05/2021] [Indexed: 11/24/2022]
Abstract
Artificial intelligence (AI) offers the potential to change many aspects of paediatric cardiac imaging. At present, there are only a few clinically validated examples of AI applications in this field. This review focuses on the use of AI in paediatric cardiovascular MRI, using examples from paediatric cardiovascular MRI, adult cardiovascular MRI and other radiologic experience.
Affiliation(s)
- Andrew M. Taylor
- Great Ormond Street Hospital for Children, Zayed Centre for Research, 20 Guildford St., Room 3.7, London, WC1N 1DZ, UK; Cardiovascular Imaging, UCL Institute of Cardiovascular Science, London, UK
15
Richter-Pechanski P, Geis NA, Kiriakou C, Schwab DM, Dieterich C. Automatic extraction of 12 cardiovascular concepts from German discharge letters using pre-trained language models. Digit Health 2021; 7:20552076211057662. [PMID: 34868618 PMCID: PMC8637713 DOI: 10.1177/20552076211057662] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2021] [Accepted: 10/15/2021] [Indexed: 11/17/2022]
Abstract
Objective A vast amount of medical data is still stored in unstructured text documents. We present an automated method of information extraction from unstructured German routine clinical data in the cardiology domain, enabling its use in state-of-the-art data-driven deep learning projects. Methods We evaluated pre-trained language models for extracting a set of 12 cardiovascular concepts from German discharge letters. We compared three bidirectional encoder representations from transformers (BERT) models pre-trained on different corpora and fine-tuned them on the task of cardiovascular concept extraction using 204 discharge letters manually annotated by cardiologists at the University Hospital Heidelberg. We compared our results with traditional machine learning methods based on a long short-term memory network and a conditional random field. Results Our best-performing model, based on a publicly available German pre-trained BERT model, achieved a token-wise micro-average F1-score of 86% and outperformed the baseline by at least 6%. Moreover, this approach achieved the best trade-off between precision (positive predictive value) and recall (sensitivity). Conclusion Our results show the applicability of state-of-the-art deep learning methods using pre-trained language models to cardiovascular concept extraction with limited training data. This minimizes annotation effort, which is currently the bottleneck of any data-driven deep learning project in the clinical domain for German and many other European languages.
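The token-wise micro-averaged F1-score reported above pools true positives, false positives, and false negatives over all concept labels before computing a single F1, so frequent concepts weigh more than rare ones. A minimal Python sketch under stated assumptions: gold and predicted labels are already aligned token by token, non-concept tokens carry the label `O`, a token given the wrong concept label counts as both a false positive and a false negative, and the concept labels in the example are hypothetical.

```python
from typing import Sequence

OUTSIDE = "O"  # label for tokens outside any concept (assumed convention)


def token_micro_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Token-wise micro-averaged F1 over all non-OUTSIDE labels.

    Counts are pooled across concept types before F1 is computed.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and predicted sequences must align token by token")
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        if p != OUTSIDE and p == g:
            tp += 1
        else:
            if p != OUTSIDE:
                fp += 1  # predicted a concept the gold standard does not have here
            if g != OUTSIDE:
                fn += 1  # missed or mislabelled a gold concept token
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    gold = ["O", "DRUG", "DRUG", "O", "DIAG"]  # hypothetical labels
    pred = ["O", "DRUG", "O", "DIAG", "DIAG"]
    print(round(token_micro_f1(gold, pred), 3))  # 0.667
```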
Affiliation(s)
- Phillip Richter-Pechanski
- Section of Bioinformatics and Systems Cardiology, Klaus Tschira Institute for Integrative Computational Cardiology, Heidelberg, Germany; Department of Internal Medicine III, University Hospital Heidelberg, Heidelberg, Germany; German Center for Cardiovascular Research (DZHK) - Partner Site Heidelberg/Mannheim, Mannheim, Germany; Informatics for Life, Heidelberg, Germany
- Nicolas A Geis
- Department of Internal Medicine III, University Hospital Heidelberg, Heidelberg, Germany; Informatics for Life, Heidelberg, Germany
- Christina Kiriakou
- Department of Internal Medicine III, University Hospital Heidelberg, Heidelberg, Germany
- Dominic M Schwab
- Department of Internal Medicine III, University Hospital Heidelberg, Heidelberg, Germany
- Christoph Dieterich
- Section of Bioinformatics and Systems Cardiology, Klaus Tschira Institute for Integrative Computational Cardiology, Heidelberg, Germany; Department of Internal Medicine III, University Hospital Heidelberg, Heidelberg, Germany; German Center for Cardiovascular Research (DZHK) - Partner Site Heidelberg/Mannheim, Mannheim, Germany; Informatics for Life, Heidelberg, Germany
16
Noyd DH, Berkman A, Howell C, Power S, Kreissman SG, Landstrom AP, Khouri M, Oeffinger KC, Kibbe WA. Leveraging Clinical Informatics Tools to Extract Cumulative Anthracycline Exposure, Measure Cardiovascular Outcomes, and Assess Guideline Adherence for Children With Cancer. JCO Clin Cancer Inform 2021; 5:1062-1075. [PMID: 34714665 PMCID: PMC9848538 DOI: 10.1200/cci.21.00099] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/21/2023]
Abstract
PURPOSE Cardiovascular disease is a significant cause of late morbidity and mortality in survivors of childhood cancer. Clinical informatics tools could enhance provider adherence to echocardiogram guidelines for early detection of late-onset cardiomyopathy. METHODS Cancer registry data were linked to electronic health record data. Structured query language facilitated the construction of anthracycline-exposed cohorts at a single institution. Primary outcomes included the data quality from automatic anthracycline extraction, sensitivity of International Classification of Disease coding for heart failure, and adherence to echocardiogram guideline recommendations. RESULTS The final analytic cohort included 385 pediatric oncology patients diagnosed between July 1, 2013, and December 31, 2018, among whom 194 were classified as no anthracycline exposure, 143 had low anthracycline exposure (< 250 mg/m2), and 48 had high anthracycline exposure (≥ 250 mg/m2). Manual review of anthracycline exposure was highly concordant (95%) with the automatic extraction. Among the unexposed group, 15% had an anthracycline administered at an outside institution that was not captured by the structured query language extraction. Manual review of echocardiogram parameters and clinic notes yielded a sensitivity of 75%, specificity of 98%, and positive predictive value of 68% for International Classification of Disease coding of heart failure. For patients with anthracycline exposure, 78.5% (n = 62) were adherent to guideline recommendations for echocardiogram surveillance. Provider adherence was significantly associated with race and ethnicity (P = .047), and 50% of patients with Spanish as their primary language were adherent compared with 90% of patients with English as their primary language (P = .003).
CONCLUSION Extraction of treatment exposures from the electronic health record through clinical informatics and integration with cancer registry data represents a feasible approach to assess cardiovascular disease outcomes and adherence to guideline recommendations for survivors.
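The three exposure groups are defined by a single cut-off on cumulative anthracycline dose. A minimal Python sketch of that binning follows; it assumes each patient's doses have already been summed in mg/m2 on a common scale, a step the abstract does not describe, and the names are hypothetical. As the Results note, a recorded dose of zero may still miss treatment given at an outside institution.

```python
from enum import Enum

HIGH_EXPOSURE_CUTOFF = 250.0  # mg/m2; cut-off between low and high exposure reported above


class Exposure(Enum):
    NONE = "no anthracycline exposure"
    LOW = "low (< 250 mg/m2)"
    HIGH = "high (>= 250 mg/m2)"


def classify_exposure(cumulative_dose_mg_m2: float) -> Exposure:
    """Assign a cumulative anthracycline dose (mg/m2) to one of the three study groups."""
    if cumulative_dose_mg_m2 < 0:
        raise ValueError("cumulative dose cannot be negative")
    if cumulative_dose_mg_m2 == 0:
        return Exposure.NONE
    if cumulative_dose_mg_m2 < HIGH_EXPOSURE_CUTOFF:
        return Exposure.LOW
    return Exposure.HIGH


if __name__ == "__main__":
    for dose in (0, 180, 250, 300):  # hypothetical cumulative doses
        print(dose, classify_exposure(dose).value)
```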
Affiliation(s)
- David H. Noyd
- Department of Pediatrics, The University of Oklahoma Health Sciences Center, Oklahoma City, OK; Department of Pediatrics, Duke University Medical Center, Durham, NC; David H. Noyd, MD, MPH, 1200 Children's Ave, A2-14702, Oklahoma City, OK 73104
- Amy Berkman
- Department of Pediatrics, Duke University Medical Center, Durham, NC
- Susan G. Kreissman
- Department of Pediatrics, The University of Oklahoma Health Sciences Center, Oklahoma City, OK
- Andrew P. Landstrom
- Division of Cardiology and Department of Cell Biology, Department of Pediatrics, Duke University Medical Center, Durham, NC
- Michel Khouri
- Department of Medicine, Duke University Medical Center, Durham, NC
- Kevin C. Oeffinger
- Duke Cancer Institute, Durham, NC; Department of Medicine, Duke University Medical Center, Durham, NC
- Warren A. Kibbe
- Duke Cancer Institute, Durham, NC; Department of Biostatistics and Bioinformatics, Duke University, Durham, NC
17
Reading Turchioe M, Volodarskiy A, Pathak J, Wright DN, Tcheng JE, Slotwiner D. Systematic review of current natural language processing methods and applications in cardiology. Heart 2021; 108:909-916. [PMID: 34711662 DOI: 10.1136/heartjnl-2021-319769] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 05/28/2021] [Accepted: 09/29/2021] [Indexed: 01/16/2023]
Abstract
Natural language processing (NLP) is a set of automated methods to organise and evaluate the information contained in unstructured clinical notes, which are a rich source of real-world data from clinical care that may be used to improve outcomes and understanding of disease in cardiology. The purpose of this systematic review is to provide an understanding of NLP, review how it has been used to date within cardiology and illustrate the opportunities that this approach provides for both research and clinical care. We systematically searched six scholarly databases (ACM Digital Library, arXiv, Embase, IEEE Xplore, PubMed and Scopus) for studies published in 2015-2020 describing the development or application of NLP methods for clinical text focused on cardiac disease. Studies that were not published in English, lacked a description of NLP methods or were not cardiac-focused, as well as duplicates, were excluded. Two independent reviewers extracted general study information, clinical details and NLP details and appraised quality using a checklist of quality indicators for NLP studies. We identified 37 studies developing and applying NLP in heart failure, imaging, coronary artery disease, electrophysiology, general cardiology and valvular heart disease. Most studies used NLP to identify patients with a specific diagnosis and extract disease severity using rule-based NLP methods. Some used NLP algorithms to predict clinical outcomes. A major limitation is the inability to aggregate findings across studies due to vastly different NLP methods, evaluation and reporting. This review reveals numerous opportunities for future NLP work in cardiology with more diverse patient samples, cardiac diseases, datasets, methods and applications.
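Most of the reviewed studies extracted disease severity with rule-based NLP. The Python sketch below shows the general idea with one hypothetical rule (a severity word immediately preceding "aortic stenosis"); it is not drawn from any reviewed study, and it ignores negation scope, hedging, and historical mentions that production systems must handle.

```python
import re
from typing import Optional

# Hypothetical severity scale, ordered from least to most severe.
SEVERITY_RANK = {"mild": 1, "moderate": 2, "severe": 3}

# One illustrative rule: "no", "mild", "moderate", or "severe" directly before the finding.
FINDING_RE = re.compile(r"\b(no|mild|moderate|severe)\s+aortic\s+stenosis\b", re.IGNORECASE)


def max_aortic_stenosis_severity(report: str) -> Optional[str]:
    """Return the most severe grade mentioned, "none" if only negated, or None if absent."""
    grades = [m.group(1).lower() for m in FINDING_RE.finditer(report)]
    positive = [g for g in grades if g in SEVERITY_RANK]
    if positive:
        return max(positive, key=SEVERITY_RANK.__getitem__)
    return "none" if grades else None


if __name__ == "__main__":
    print(max_aortic_stenosis_severity("Mild-to-moderate aortic stenosis."))  # moderate
```

Because the rule only looks at the word directly before the finding, a "mild-to-moderate" mention is graded as moderate; a real system would need an explicit decision on such intermediate grades.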
Affiliation(s)
- Meghan Reading Turchioe
- Department of Population Health Sciences, Division of Health Informatics, Weill Cornell Medicine, New York, New York, USA
- Alexander Volodarskiy
- Department of Medicine, Division of Cardiology, NewYork-Presbyterian Hospital, New York, New York, USA
- Jyotishman Pathak
- Department of Population Health Sciences, Division of Health Informatics, Weill Cornell Medicine, New York, New York, USA
- Drew N Wright
- Samuel J. Wood Library & C.V. Starr Biomedical Information Center, Weill Cornell Medical College, New York, New York, USA
- James Enlou Tcheng
- Department of Medicine, Duke University School of Medicine, Durham, North Carolina, USA
- David Slotwiner
- Department of Population Health Sciences, Division of Health Informatics, Weill Cornell Medicine, New York, New York, USA; Department of Medicine, Division of Cardiology, NewYork-Presbyterian Hospital, New York, New York, USA
18
Epstein RH, Jean YK, Dudaryk R, Freundlich RE, Walco JP, Mueller DA, Banks SE. Natural Language Mapping of Electrocardiogram Interpretations to a Standardized Ontology. Methods Inf Med 2021; 60:104-109. [PMID: 34610644 PMCID: PMC8595771 DOI: 10.1055/s-0041-1736312] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/20/2022]
Abstract
BACKGROUND Interpretations of the electrocardiogram (ECG) are often prepared using software outside the electronic health record (EHR) and imported via an interface as a narrative note. Thus, natural language processing is required to create a computable representation of the findings. Challenges include misspellings, nonstandard abbreviations, jargon, and equivocation in diagnostic interpretations. OBJECTIVES Our objective was to develop an algorithm to reliably and efficiently extract such information and map it to the standardized ECG ontology developed jointly by the American Heart Association, the American College of Cardiology Foundation, and the Heart Rhythm Society. The algorithm was to be designed to be easily modifiable for use with EHRs and ECG reporting systems other than the ones studied. METHODS An algorithm using natural language processing techniques was developed in structured query language to extract and map quantitative and diagnostic information from ECG narrative reports to the cardiology societies' standardized ECG ontology. The algorithm was developed using a training dataset of 43,861 ECG reports and applied to a test dataset of 46,873 reports. RESULTS Accuracy, precision, recall, and the F1-measure were all 100% in the test dataset for the extraction of quantitative data (e.g., PR and QTc interval, atrial and ventricular heart rate). Performance for matches in each diagnostic category in the standardized ECG ontology was above 99% in the test dataset. The processing speed was approximately 20,000 reports per minute. We externally validated the algorithm on data from another institution that used a different ECG reporting system and found similar performance. CONCLUSION The developed algorithm had high performance for creating a computable representation of ECG interpretations. Software and lookup tables are provided that can easily be modified for local customization and for use with other EHR and ECG reporting systems.
This algorithm has utility for research and in clinical decision-support where incorporation of ECG findings is desired.
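The published algorithm was written in structured query language with lookup tables. The Python sketch below illustrates the same lookup-table idea and is not the authors' code: it lowercases the report, matches whole phrases against a small hypothetical table of narrative variants, and extracts PR and QTc values with a regular expression. The table entries are illustrative placeholders rather than the actual AHA/ACCF/HRS ontology, and handling of misspellings and equivocation is omitted.

```python
import re

# Hypothetical lookup table: lowercase narrative variants -> standardized statement.
LOOKUP = {
    "normal sinus rhythm": "Normal sinus rhythm",
    "nsr": "Normal sinus rhythm",
    "atrial fibrillation": "Atrial fibrillation",
    "afib": "Atrial fibrillation",
}

# Canonical names for the quantitative fields handled here.
INTERVAL_NAMES = {"pr": "PR", "qtc": "QTc"}

# Matches e.g. "PR 164", "QTc: 452", or "qtc=452".
INTERVAL_RE = re.compile(r"\b(PR|QTc)\s*[:=]?\s*(\d{2,3})\b", re.IGNORECASE)


def map_ecg_report(text: str) -> dict:
    """Map a narrative ECG interpretation to standardized statements and intervals (ms)."""
    lowered = text.lower()
    statements = sorted({
        standard
        for variant, standard in LOOKUP.items()
        # Whole-phrase match so that, e.g., "nsr" does not fire inside another word.
        if re.search(rf"(?<![\w-]){re.escape(variant)}(?![\w-])", lowered)
    })
    intervals = {
        INTERVAL_NAMES[name.lower()]: int(value)
        for name, value in INTERVAL_RE.findall(text)
    }
    return {"statements": statements, "intervals_ms": intervals}


if __name__ == "__main__":
    print(map_ecg_report("NSR. PR 164 ms. QTc: 452"))
```

Keeping the variants in a data table rather than in code mirrors the abstract's point that local customization should only require editing the lookup tables.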
Affiliation(s)
- Richard H. Epstein
- Department of Anesthesiology, Perioperative Medicine and Pain Management, University of Miami Miller School of Medicine, Miami, Florida, United States
- Yuel-Kai Jean
- Department of Anesthesiology, Perioperative Medicine and Pain Management, University of Miami Miller School of Medicine, Miami, Florida, United States
- Roman Dudaryk
- Department of Anesthesiology, Perioperative Medicine and Pain Management, University of Miami Miller School of Medicine, Miami, Florida, United States
- Robert E. Freundlich
- Anesthesiology Critical Care Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, United States
- Jeremy P. Walco
- Anesthesiology Critical Care Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, United States
- Dorothee A. Mueller
- Anesthesiology Critical Care Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, United States
- Shawn E. Banks
- Department of Anesthesiology, Perioperative Medicine and Pain Management, University of Miami Miller School of Medicine, Miami, Florida, United States
19
Large-scale identification of aortic stenosis and its severity using natural language processing on electronic health records. Cardiovasc Digit Health J 2021; 2:156-163. [PMID: 35265904 PMCID: PMC8890044 DOI: 10.1016/j.cvdhj.2021.03.003] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 02/02/2023]
20
Caufield JH, Sigdel D, Fu J, Choi H, Guevara-Gonzalez V, Wang D, Ping P. Cardiovascular Informatics: building a bridge to data harmony. Cardiovasc Res 2021; 118:732-745. [PMID: 33751044 DOI: 10.1093/cvr/cvab067] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/01/2020] [Accepted: 03/03/2021] [Indexed: 12/11/2022]
Abstract
The search for new strategies for better understanding cardiovascular disease is a constant one, spanning multitudinous types of observations and studies. A comprehensive characterization of each disease state and its biomolecular underpinnings relies upon insights gleaned from extensive information collection of various types of data. Researchers and clinicians in cardiovascular biomedicine repeatedly face questions regarding which types of data may best answer their questions, how to integrate information from multiple datasets of various types, and how to adapt emerging advances in machine learning and/or artificial intelligence to their needs in data processing. Frequently lauded as a field with great practical and translational potential, the interface between biomedical informatics and cardiovascular medicine is challenged with staggeringly massive datasets. Successful application of computational approaches to decode these complex and gigantic amounts of information becomes an essential step toward realizing the desired benefits. In this review, we examine recent efforts to adapt informatics strategies to cardiovascular biomedical research: automated information extraction and unification of multifaceted -omics data. We discuss how and why this interdisciplinary space of Cardiovascular Informatics is particularly relevant to and supportive of current experimental and clinical research. We describe in detail how open data sources and methods can drive discovery while demanding few initial resources, an advantage afforded by widespread availability of cloud computing-driven platforms. Subsequently, we provide examples of how interoperable computational systems facilitate exploration of data from multiple sources, including both consistently-formatted structured data and unstructured data. Taken together, these approaches for achieving data harmony enable molecular phenotyping of cardiovascular (CV) diseases and unification of cardiovascular knowledge.
Affiliation(s)
- J Harry Caufield
  - NHLBI Integrated Cardiovascular Data Science Training Program at University of California, Los Angeles (UCLA), Los Angeles, CA, 90095, USA; Departments of Physiology at UCLA School of Medicine, Los Angeles, CA, 90095, USA
- Dibakar Sigdel
  - NHLBI Integrated Cardiovascular Data Science Training Program at University of California, Los Angeles (UCLA), Los Angeles, CA, 90095, USA; Departments of Physiology at UCLA School of Medicine, Los Angeles, CA, 90095, USA
- John Fu
  - NHLBI Integrated Cardiovascular Data Science Training Program at University of California, Los Angeles (UCLA), Los Angeles, CA, 90095, USA
- Howard Choi
  - NHLBI Integrated Cardiovascular Data Science Training Program at University of California, Los Angeles (UCLA), Los Angeles, CA, 90095, USA
- Vladimir Guevara-Gonzalez
  - NHLBI Integrated Cardiovascular Data Science Training Program at University of California, Los Angeles (UCLA), Los Angeles, CA, 90095, USA
- Ding Wang
  - Departments of Physiology at UCLA School of Medicine, Los Angeles, CA, 90095, USA
- Peipei Ping
  - NHLBI Integrated Cardiovascular Data Science Training Program at University of California, Los Angeles (UCLA), Los Angeles, CA, 90095, USA; Departments of Physiology at UCLA School of Medicine, Los Angeles, CA, 90095, USA; Department of Medicine (Cardiology) at UCLA School of Medicine, Los Angeles, CA, 90095, USA; Bioinformatics and Medical Informatics, Los Angeles, CA, 90095, USA; Scalable Analytics Institute (ScAi) at UCLA School of Engineering, Los Angeles, CA, 90095, USA
21
Laique SN, Hayat U, Sarvepalli S, Vaughn B, Ibrahim M, McMichael J, Qaiser KN, Burke C, Bhatt A, Rhodes C, Rizk MK. Application of optical character recognition with natural language processing for large-scale quality metric data extraction in colonoscopy reports. Gastrointest Endosc 2021; 93:750-757. [PMID: 32891620 PMCID: PMC8794764 DOI: 10.1016/j.gie.2020.08.038] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 09/25/2018] [Accepted: 08/27/2020] [Indexed: 12/11/2022]
Abstract
BACKGROUND AND AIMS Colonoscopy is commonly performed for colorectal cancer screening in the United States. Reports are often generated in a non-standardized format and are not always integrated into electronic health records. Thus, this information is not readily available for streamlining quality management, participating in endoscopy registries, or reporting patient- and center-specific risk factors predictive of outcomes. We aim to demonstrate the use of a new hybrid approach that applies natural language processing to charts first processed with optical character recognition (OCR/NLP hybrid) to obtain relevant clinical information from scanned colonoscopy and pathology reports, a technology co-developed by Cleveland Clinic and eHealth Technologies (West Henrietta, NY, USA). METHODS This was a retrospective study conducted at Cleveland Clinic, Cleveland, Ohio, and the University of Minnesota, Minneapolis, Minnesota. A randomly sampled list of outpatient screening colonoscopy procedures and pathology reports was selected. Desired variables were then collected. Two researchers first manually reviewed the reports for the desired variables. Then, the OCR/NLP algorithm was used to obtain the same variables from 3 electronic health records in use at our institution: Epic (Verona, Wisc, USA), ProVation (Minneapolis, Minn, USA) used for endoscopy reporting, and Sunquest PowerPath (Tucson, Ariz, USA) used for pathology reporting. RESULTS Compared with manual data extraction, the accuracy of the hybrid OCR/NLP approach was 95.8% for detecting polyps, 98.5% for adenomas, 99.3% for sessile serrated polyps, 98% for advanced adenomas, 98.4% for inadequate bowel preparation, and 99% for failed cecal intubation. Comparison of the dataset collected via NLP alone with that collected using the hybrid OCR/NLP approach showed that the accuracy for almost all variables was >99%. 
CONCLUSIONS Our study is the first to validate the use of a unique hybrid OCR/NLP technology to extract desired variables from scanned procedure and pathology reports contained in image format with an accuracy >95%.
Affiliation(s)
- Sobia Nasir Laique
  - Division of Gastroenterology and Hepatology, Mayo Clinic, Phoenix, Arizona
- Umar Hayat
  - Division of Gastroenterology, University of Minnesota, Minneapolis, Minnesota
- Shashank Sarvepalli
  - Department of Hospital Medicine, Cleveland Clinic, Cleveland, Ohio; Department of Bioinformatics, Vanderbilt University, Nashville, Tennessee
- Byron Vaughn
  - Division of Gastroenterology, University of Minnesota, Minneapolis, Minnesota
- Mounir Ibrahim
  - Digestive Disease Institute, Cleveland Clinic, Cleveland, Ohio
- John McMichael
  - Digestive Disease Institute, Cleveland Clinic, Cleveland, Ohio
- Carol Burke
  - Digestive Disease Institute, Cleveland Clinic, Cleveland, Ohio
- Amit Bhatt
  - Digestive Disease Institute, Cleveland Clinic, Cleveland, Ohio
- Colin Rhodes
  - eHealth Technologies, West Henrietta, New York, USA
- Maged K. Rizk
  - Digestive Disease Institute, Cleveland Clinic, Cleveland, Ohio
22
Lee S, Li B, Martin EA, D'Souza AG, Jiang J, Doktorchik C, Southern DA, Lee J, Wiebe N, Quan H, Eastwood CA. CREATE: A New Data Resource to Support Cardiac Precision Health. CJC Open 2020; 3:639-645. [PMID: 34036259 PMCID: PMC8134941 DOI: 10.1016/j.cjco.2020.12.019] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 07/27/2020] [Accepted: 12/08/2020] [Indexed: 11/27/2022]
Abstract
Background Precision medicine and learning health system initiatives require databases with rich and accurately captured data on patient characteristics. We introduce the Clinical Registry, AdminisTrative Data and Electronic Medical Records (CREATE) database, which includes linked data from 4 population databases: Alberta Provincial Project for Outcome Assessment in Coronary Heart Disease (APPROACH; a national clinical registry), Sunrise Clinical Manager (SCM) electronic medical record (city-wide), the Discharge Abstract Database (DAD), and the National Ambulatory Care Reporting System (NACRS). The intent of this work is to introduce a cardiovascular-specific database for pursuing precision health activities using big data analytics. Methods We used deterministic data linkage to link SCM electronic medical record data to APPROACH clinical registry data using patient identifier variables. The APPROACH-SCM data set was subsequently linked to DAD and NACRS to obtain inpatient and outpatient cohort data. Where applicable, we further validated the quality of the linkage in these databases by comparison with the Alberta Health Insurance Care Plan registry database. Results We achieved 99.96% linkage across these 4 databases. Currently, there are 30,984 patients with 35,753 catheterizations in the CREATE database. The inpatient cohort contained 65.75% (20,373/30,984) of the patient sample, whereas the outpatient cohort contained 29.78% (9226/30,984). The infrastructure and the process to update and expand the database have been established. Conclusions CREATE is intended to serve as a database for supporting big data analytics activities surrounding cardiac precision health. The CREATE database will be managed by the Centre for Health Informatics at the University of Calgary and housed in a secure high-performance computing environment.
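Deterministic linkage, as described in the Methods, joins records only when identifier values agree exactly. Below is a minimal Python sketch. The following are illustrative assumptions, not details of the CREATE study:

- the `patient_id` field name;
- the normalization step;
- the definition of the linkage rate as the share of registry records with at least one match.

```python
def normalize_id(raw: str) -> str:
    """Strip punctuation and spacing so formatting differences do not block exact matches."""
    return "".join(ch for ch in raw if ch.isalnum()).upper()


def deterministic_link(registry: list[dict], emr: list[dict], key: str = "patient_id"):
    """Exact-match join of registry rows to EMR rows on a normalized identifier.

    Returns (linked_rows, unlinked_registry_rows, linkage_rate).
    """
    # Index EMR rows by normalized identifier; one patient may have several rows.
    emr_index: dict[str, list[dict]] = {}
    for row in emr:
        emr_index.setdefault(normalize_id(row[key]), []).append(row)

    linked, unlinked = [], []
    for row in registry:
        matches = emr_index.get(normalize_id(row[key]), [])
        if not matches:
            unlinked.append(row)
        for match in matches:
            # Prefix EMR fields so they cannot overwrite registry fields with the same name.
            linked.append({**row, **{f"emr_{k}": v for k, v in match.items() if k != key}})

    # Hypothetical definition: fraction of registry rows that found at least one match.
    rate = (len(registry) - len(unlinked)) / len(registry) if registry else 0.0
    return linked, unlinked, rate


registry = [{"patient_id": "123-456", "cath_date": "2015-01-02"},
            {"patient_id": "999 001", "cath_date": "2016-03-04"}]
emr = [{"patient_id": "123456", "lvef": 55}]
linked, unlinked, rate = deterministic_link(registry, emr)
print(linked, unlinked, rate)
```

A deterministic join like this never links records whose identifiers differ, so the linkage rate depends heavily on identifier quality. That dependence is why the abstract reports a separate validation against a provincial registry.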
Affiliation(s)
- Seungwon Lee
  - Centre for Health Informatics, University of Calgary, Calgary, Alberta, Canada; Department of Community Health Sciences, University of Calgary, Calgary, Alberta, Canada; Alberta Health Services, Calgary, Alberta, Canada; Data Intelligence for Health Lab, University of Calgary, Calgary, Alberta, Canada
- Bing Li
  - Centre for Health Informatics, University of Calgary, Calgary, Alberta, Canada; Alberta Health Services, Calgary, Alberta, Canada
- Elliot A Martin
  - Centre for Health Informatics, University of Calgary, Calgary, Alberta, Canada; Alberta Health Services, Calgary, Alberta, Canada
- Adam G D'Souza
  - Centre for Health Informatics, University of Calgary, Calgary, Alberta, Canada; Alberta Health Services, Calgary, Alberta, Canada
- Jason Jiang
  - Centre for Health Informatics, University of Calgary, Calgary, Alberta, Canada; Alberta Health Services, Calgary, Alberta, Canada
- Chelsea Doktorchik
  - Centre for Health Informatics, University of Calgary, Calgary, Alberta, Canada; Department of Community Health Sciences, University of Calgary, Calgary, Alberta, Canada
- Danielle A Southern
  - Centre for Health Informatics, University of Calgary, Calgary, Alberta, Canada; Department of Community Health Sciences, University of Calgary, Calgary, Alberta, Canada
- Joon Lee
  - Centre for Health Informatics, University of Calgary, Calgary, Alberta, Canada; Department of Community Health Sciences, University of Calgary, Calgary, Alberta, Canada; Data Intelligence for Health Lab, University of Calgary, Calgary, Alberta, Canada; Department of Cardiac Sciences, University of Calgary, Calgary, Alberta, Canada
- Natalie Wiebe
  - Centre for Health Informatics, University of Calgary, Calgary, Alberta, Canada; Department of Community Health Sciences, University of Calgary, Calgary, Alberta, Canada
- Hude Quan
  - Centre for Health Informatics, University of Calgary, Calgary, Alberta, Canada; Department of Community Health Sciences, University of Calgary, Calgary, Alberta, Canada
- Cathy A Eastwood
  - Centre for Health Informatics, University of Calgary, Calgary, Alberta, Canada; Department of Community Health Sciences, University of Calgary, Calgary, Alberta, Canada
23
Gaffar S, Gearhart AS, Chang AC. The Next Frontier in Pediatric Cardiology: Artificial Intelligence. Pediatr Clin North Am 2020; 67:995-1009. [PMID: 32888694 DOI: 10.1016/j.pcl.2020.06.010] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 12/19/2022]
Abstract
Artificial intelligence (AI) in the last decade centered primarily on digitizing and incorporating the large volumes of patient data from electronic health records. AI is now poised to take the next step in health care integration, with precision medicine, imaging support, and development of individual health trends with the popularization of wearable devices. Future clinical pediatric cardiologists will use AI as an adjunct in delivering optimum patient care, with the help of accurate predictive risk calculators, continual health monitoring from wearables, and precision medicine. Physicians must also protect their patients' health information from monetization or exploitation.
Affiliation(s)
- Sharib Gaffar
  - UC Irvine Pediatrics Residency Program, CHOC Children's Hospital of Orange County, 757 Westwood Plaza, Ste 5235, Los Angeles, CA 90095-8358, USA
- Addison S Gearhart
  - Boston Children's Hospital Heart Center, 300 Longwood Avenue, Boston, MA 02115, USA
- Anthony C Chang
  - The Sharon Disney Lund Medical Intelligence and Innovation Institute (MI3), Children's Hospital of Orange County, 1120 W La Veta Ave, STE 860, Orange, CA 92868, USA.
24
Tisdale RL, Haddad F, Kohsaka S, Heidenreich PA. Trends in Left Ventricular Ejection Fraction for Patients With a New Diagnosis of Heart Failure. Circ Heart Fail 2020; 13:e006743. [PMID: 32867526 DOI: 10.1161/circheartfailure.119.006743] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/16/2022]
Abstract
BACKGROUND The left ventricular ejection fraction (LVEF) guides treatment of heart failure, yet these data have not been systematically collected in large data sets. We sought to characterize the epidemiology of incident heart failure using the initial LVEF. METHODS We identified 219,537 patients in the Veterans Affairs system between 2011 and 2017 who had an LVEF documented within a window spanning 365 days before to 30 days after the heart failure diagnosis date. LVEF was obtained by natural language processing of imaging reports and provider notes. In multivariate analysis, we assessed characteristics associated with having an initial LVEF <40%. RESULTS Most patients were male and White; a plurality were in the 60- to 69-year age decile. A majority of patients had ischemic heart disease and a high burden of comorbidities. Over time, presentation with an LVEF <40% became slightly less common, with a nadir in 2015. Presentation with an initial LVEF <40% was more common among younger patients, men, Black and Hispanic patients, and patients with an inpatient presentation, lower systolic blood pressure, lower pulse pressure, or higher heart rate. Ischemic heart disease, alcohol use disorder, peripheral arterial disease, and ventricular arrhythmias were associated with an initial LVEF <40%, while most other comorbid conditions (eg, atrial fibrillation, chronic obstructive pulmonary disease, malignancy) were more strongly associated with an initial LVEF >50%. CONCLUSIONS For patients with heart failure, particularly at the extremes of age, an initial preserved LVEF is common. In addition to clinical characteristics, race and ethnicity were associated with presentation: Black and Hispanic patients were more likely to present with a reduced LVEF. Further studies are needed to determine whether these differences are due to patient or health system factors such as access to care.
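The cohort definition above depends on a date window around the diagnosis. Below is a minimal Python sketch of that selection. It makes two hypothetical assumptions, and the study's actual rules may differ:

- the "initial" LVEF is the earliest measurement inside the window;
- values are bucketed with the abstract's <40% and >50% cut points, and the band between them is labeled only for illustration.

```python
from datetime import date, timedelta
from typing import Optional


def initial_lvef(dx_date: date, measurements: list[tuple[date, float]],
                 days_before: int = 365, days_after: int = 30) -> Optional[float]:
    """Earliest LVEF documented from `days_before` days before to `days_after` days after diagnosis."""
    start = dx_date - timedelta(days=days_before)
    end = dx_date + timedelta(days=days_after)
    in_window = [(d, v) for d, v in measurements if start <= d <= end]
    if not in_window:
        return None  # no qualifying LVEF: the patient would not enter the cohort
    # Assumption: "initial" means the earliest dated measurement in the window.
    return min(in_window, key=lambda item: item[0])[1]


def lvef_group(lvef: float) -> str:
    """Bucket using the cut points named in the abstract; the middle band is inclusive."""
    if lvef < 40:
        return "<40%"
    if lvef > 50:
        return ">50%"
    return "40-50%"


dx = date(2015, 6, 1)
history = [(date(2013, 1, 1), 60.0),   # outside the window (too early)
           (date(2014, 9, 1), 35.0),
           (date(2015, 6, 20), 45.0),
           (date(2015, 8, 1), 55.0)]   # outside the window (too late)
first = initial_lvef(dx, history)
print(first, lvef_group(first))
```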
Affiliation(s)
- Rebecca L Tisdale
  - Department of Medicine, Stanford University School of Medicine, CA (R.L.T., F.H., P.A.H.); Veterans Affairs Palo Alto Health Care System, Stanford, CA (R.L.T., P.A.H.)
- François Haddad
  - Department of Medicine, Stanford University School of Medicine, CA (R.L.T., F.H., P.A.H.)
- Shun Kohsaka
  - Keio University School of Medicine, Tokyo, Japan (S.K.)
- Paul A Heidenreich
  - Department of Medicine, Stanford University School of Medicine, CA (R.L.T., F.H., P.A.H.); Veterans Affairs Palo Alto Health Care System, Stanford, CA (R.L.T., P.A.H.)
25
Abstract
Because pediatric cardiology is both a perceptual and a cognitive subspecialty, it demands a complex decision-making model, which makes artificial intelligence a particularly attractive technology with great potential. The prototypical artificial intelligence system would autonomously input patient data into a collaborative database that stores, syncs, interprets, and ultimately classifies the patient's profile into specific disease phenotypes. The system would then compare the profile against a large aggregate of shared peer health data and outcomes, the current medical literature, and ongoing trials to offer morbidity and mortality prediction, drug therapy options targeted to each patient's genetic profile, tailored surgical plans, and recommendations for the timing of sequential imaging. This review offers a primer on artificial intelligence and pediatric cardiology by briefly discussing the history of artificial intelligence in medicine, modern and future applications in adult and pediatric cardiology across selected concentrations, and current barriers to implementation of these technologies.
26
Adekkanattu P, Jiang G, Luo Y, Kingsbury PR, Xu Z, Rasmussen LV, Pacheco JA, Kiefer RC, Stone DJ, Brandt PS, Yao L, Zhong Y, Deng Y, Wang F, Ancker JS, Campion TR, Pathak J. Evaluating the Portability of an NLP System for Processing Echocardiograms: A Retrospective, Multi-site Observational Study. AMIA ... Annual Symposium Proceedings. AMIA Symposium 2020; 2019:190-199. [PMID: 32308812 PMCID: PMC7153064] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
While natural language processing (NLP) of unstructured clinical narratives holds potential for patient care and clinical research, portability of NLP approaches across multiple sites remains a major challenge. This study investigated the portability of an NLP system, initially developed at the Department of Veterans Affairs (VA), for extracting 27 key cardiac concepts from free-text or semi-structured echocardiograms at three academic medical centers: Weill Cornell Medicine, Mayo Clinic, and Northwestern Medicine. The NLP system showed high precision and recall for four target concepts (aortic valve regurgitation, left atrium size at end systole, mitral valve regurgitation, tricuspid valve regurgitation) across all sites. However, we found moderate or poor results for the remaining concepts, and the NLP system's performance varied between individual sites.
Affiliation(s)
- Yuan Luo
  - Northwestern University, Chicago, IL
- Liang Yao
  - Northwestern University, Chicago, IL
- Yu Deng
  - Northwestern University, Chicago, IL
- Fei Wang
  - Weill Cornell Medicine, New York, NY
27
Cai T, Zhang L, Yang N, Kumamaru KK, Rybicki FJ, Cai T, Liao KP. EXTraction of EMR numerical data: an efficient and generalizable tool to EXTEND clinical research. BMC Med Inform Decis Mak 2019; 19:226. [PMID: 31730484 PMCID: PMC6858776 DOI: 10.1186/s12911-019-0970-1] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 05/25/2019] [Accepted: 11/06/2019] [Indexed: 11/12/2022]
Abstract
Background Electronic medical records (EMR) contain numerical data important for clinical outcomes research, such as vital signs and cardiac ejection fractions (EF), which tend to be embedded in narrative clinical notes. In current practice, these data are often manually extracted for use in research studies. However, because of the large volume of notes in datasets, manual extraction of numerical data often becomes infeasible. The objective of this study is to develop and validate a natural language processing (NLP) tool that can efficiently extract numerical clinical data from narrative notes. Results To validate the accuracy of the tool EXTraction of EMR Numerical Data (EXTEND), we developed a reference standard by manually extracting vital signs from 285 notes, EF values from 300 notes, and glycated hemoglobin (HbA1C) and serum creatinine from 890 notes. For each parameter of interest, we calculated the sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and F1 score of EXTEND using two metrics: (1) completion of data extraction and (2) accuracy of data extraction compared with the actual values in the note, verified by chart review. At the note level, extraction by EXTEND was considered correct only if it accurately detected and extracted all values of interest in a note. Using manually annotated labels as the gold standard, the note-level accuracy of EXTEND in capturing the numerical vital sign values, EF, HbA1C, and creatinine ranged from 0.88 to 0.95 for sensitivity, 0.95 to 1.0 for specificity, 0.95 to 1.0 for PPV, 0.89 to 0.99 for NPV, and 0.92 to 0.96 for F1 score. At the actual value level, the sensitivity, PPV, and F1 score of EXTEND ranged from 0.91 to 0.95, 0.95 to 1.0, and 0.95 to 0.96, respectively. Conclusions EXTEND is an efficient, flexible tool that uses knowledge-based rules to extract clinical numerical parameters with high accuracy. 
By increasing dictionary terms and developing new rules, the usage of EXTEND can easily be expanded to extract additional numerical data important in clinical outcomes research.
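The note-level rule described above counts a note as correct only if all of its values of interest are extracted. That rule can be turned into the reported metrics, and the Python sketch below shows one plausible tallying scheme. The scheme rests on two assumptions, and EXTEND may not have been evaluated this way:

- an incomplete or wrong extraction from a note that has reference values counts as a false negative;
- any extraction from a note without reference values counts as a false positive.

```python
import math


def _ratio(num: float, den: float) -> float:
    """Safe division: NaN when the denominator is zero."""
    return num / den if den else math.nan


def tally_notes(notes):
    """notes: iterable of (reference_values, extracted_values) pairs, each a set."""
    tp = fp = tn = fn = 0
    for reference, extracted in notes:
        if reference:
            # Assumed rule: correct only if exactly the reference values were extracted.
            if extracted == reference:
                tp += 1
            else:
                fn += 1
        elif extracted:
            fp += 1  # assumed: anything extracted from a note with no values is spurious
        else:
            tn += 1
    return tp, fp, tn, fn


def metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    sensitivity = _ratio(tp, tp + fn)
    ppv = _ratio(tp, tp + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": _ratio(tn, tn + fp),
        "ppv": ppv,
        "npv": _ratio(tn, tn + fn),
        "f1": _ratio(2 * ppv * sensitivity, ppv + sensitivity),  # harmonic mean of PPV and sensitivity
    }


notes = [
    ({("EF", 55.0)}, {("EF", 55.0)}),                                    # all values found: TP
    ({("EF", 40.0)}, set()),                                             # value missed: FN
    (set(), set()),                                                      # nothing to find: TN
    (set(), {("EF", 30.0)}),                                             # spurious value: FP
    ({("SBP", 120.0), ("DBP", 80.0)}, {("SBP", 120.0), ("DBP", 80.0)}),  # TP
]
counts = tally_notes(notes)
print(counts, metrics(*counts))
```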
Affiliation(s)
- Tianrun Cai
  - Division of Rheumatology, Immunology, and Allergy, Brigham and Women's Hospital, 6016BB, 60 Fenwood Road, Boston, MA 02115, USA; Harvard Medical School, Boston, MA, USA
- Nicole Yang
  - Division of Rheumatology, Immunology, and Allergy, Brigham and Women's Hospital, 6016BB, 60 Fenwood Road, Boston, MA 02115, USA
- Kanako K Kumamaru
  - Department of Radiology, School of Medicine, Juntendo University, Tokyo, Japan
- Frank J Rybicki
  - Department of Radiology, University of Ottawa, Ottawa, Canada
- Tianxi Cai
  - Harvard Medical School, Boston, MA, USA; Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Katherine P Liao
  - Division of Rheumatology, Immunology, and Allergy, Brigham and Women's Hospital, 6016BB, 60 Fenwood Road, Boston, MA 02115, USA; Harvard Medical School, Boston, MA, USA; VA Boston Healthcare System, Boston, MA, USA
28
Becker A. Artificial intelligence in medicine: What is it doing for us today? Health Policy and Technology 2019. [DOI: 10.1016/j.hlpt.2019.03.004] [Citation(s) in RCA: 46] [Impact Index Per Article: 9.2] [Indexed: 02/06/2023]
29
Sheikhalishahi S, Miotto R, Dudley JT, Lavelli A, Rinaldi F, Osmani V. Natural Language Processing of Clinical Notes on Chronic Diseases: Systematic Review. JMIR Med Inform 2019; 7:e12239. [PMID: 31066697 PMCID: PMC6528438 DOI: 10.2196/12239] [Citation(s) in RCA: 230] [Impact Index Per Article: 46.0] [Received: 09/17/2018] [Revised: 03/04/2019] [Accepted: 03/24/2019] [Indexed: 01/08/2023]
Abstract
BACKGROUND Novel approaches that complement and go beyond evidence-based medicine are required in the domain of chronic diseases, given the growing incidence of such conditions in the worldwide population. A promising avenue is the secondary use of electronic health records (EHRs), where patient data are analyzed to conduct clinical and translational research. Machine learning methods for processing EHRs are improving the understanding of patient clinical trajectories and chronic disease risk prediction, creating a unique opportunity to derive previously unknown clinical insights. However, a wealth of clinical histories remains locked behind clinical narratives in free-form text. Consequently, unlocking the full potential of EHR data is contingent on the development of natural language processing (NLP) methods to automatically transform clinical text into structured clinical data that can guide clinical decisions and potentially delay or prevent disease onset. OBJECTIVE The goal of the research was to provide a comprehensive overview of the development and uptake of NLP methods applied to free-text clinical notes related to chronic diseases, including the investigation of challenges faced by NLP methodologies in understanding clinical narratives. METHODS Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were followed, and searches were conducted in 5 databases using "clinical notes," "natural language processing," and "chronic disease" and their variations as keywords to maximize coverage of the articles. RESULTS Of the 2652 articles considered, 106 met the inclusion criteria. Review of the included papers identified 43 chronic diseases, which were then further classified into 10 disease categories using the International Classification of Diseases, 10th Revision. The largest number of studies focused on diseases of the circulatory system (n=38), while endocrine and metabolic diseases were the focus of the fewest (n=14). 
This difference was attributed to the structure of clinical records for metabolic diseases, which typically contain much more structured data than medical records for diseases of the circulatory system. The latter rely more on unstructured data and consequently have received more attention from NLP research. The review showed a significant increase in the use of machine learning methods compared with rule-based approaches; however, deep learning methods remain emergent (n=3). Consequently, the majority of works focus on classification of disease phenotype, with only a handful of papers addressing extraction of comorbidities from free text or integration of clinical notes with structured data. There is notable use of relatively simple methods, such as shallow classifiers (alone or combined with rule-based methods), because their predictions are interpretable; interpretability still represents a significant issue for more complex methods. Finally, scarcity of publicly available data may also have contributed to insufficient development of more advanced methods, such as extraction of word embeddings from clinical notes. CONCLUSIONS Efforts are still required to improve (1) progression of clinical NLP methods from extraction toward understanding; (2) recognition of relations among entities rather than entities in isolation; (3) temporal extraction to understand past, current, and future clinical events; (4) exploitation of alternative sources of clinical knowledge; and (5) availability of large-scale, de-identified clinical corpora.
Affiliation(s)
- Seyedmostafa Sheikhalishahi
  - eHealth Research Group, Fondazione Bruno Kessler Research Institute, Trento, Italy
  - Department of Information Engineering and Computer Science, University of Trento, Trento, Italy
- Riccardo Miotto
  - Institute for Next Generation Healthcare, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Joel T Dudley
  - Institute for Next Generation Healthcare, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Alberto Lavelli
  - NLP Research Group, Fondazione Bruno Kessler Research Institute, Trento, Italy
- Fabio Rinaldi
  - Institute of Computational Linguistics, University of Zurich, Zurich, Switzerland
- Venet Osmani
  - eHealth Research Group, Fondazione Bruno Kessler Research Institute, Trento, Italy
30
Chang AC. Artificial intelligence in pediatric cardiology and cardiac surgery: Irrational hype or paradigm shift? Ann Pediatr Cardiol 2019; 12:191-194. [PMID: 31516273 PMCID: PMC6716326 DOI: 10.4103/apc.apc_55_19] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 01/26/2023]
Affiliation(s)
- Anthony C Chang
  - Medical Director, The Sharon Disney Lund Medical Intelligence and Innovation Institute (MI3), Children's Hospital of Orange County, Founder AIMed, Orange, CA, USA
31
Wagholikar KB, Fischer CM, Goodson A, Herrick CD, Rees M, Toscano E, MacRae CA, Scirica BM, Desai AS, Murphy SN. Extraction of Ejection Fraction from Echocardiography Notes for Constructing a Cohort of Patients having Heart Failure with reduced Ejection Fraction (HFrEF). J Med Syst 2018; 42:209. [PMID: 30255347 PMCID: PMC6153777 DOI: 10.1007/s10916-018-1066-7] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 06/17/2018] [Accepted: 09/09/2018] [Indexed: 12/19/2022]
Abstract
Left ventricular ejection fraction (LVEF) is an important prognostic indicator of cardiovascular outcomes. It is used clinically to determine the indication for several therapeutic interventions. LVEF is most commonly derived from standardized echocardiographic views using in-line tools and some manual assessment by cardiologists. LVEF is typically documented in free-text reports, and variation in LVEF documentation poses a challenge for the extraction and use of LVEF in computer-based clinical workflows. To address this problem, we developed a computerized algorithm to extract LVEF from echocardiography reports to identify patients having heart failure with reduced ejection fraction (HFrEF) for therapeutic intervention at a large healthcare system. We processed echocardiogram reports for 57,158 patients with a coded diagnosis of heart failure who visited the healthcare system over a two-year period. Our algorithm identified a total of 3910 patients with reduced ejection fraction. Of the 46,634 echocardiography reports processed, 97% included a mention of LVEF. Of these reports, 85% contained numerical ejection fraction values, 9% contained ranges, and the remaining 6% contained qualitative descriptions. Overall, 18% of extracted numerical LVEFs were ≤ 40%. Furthermore, manual validation of a sample of 339 reports yielded an accuracy of 1.0. Our study demonstrates that a regular expression-based approach can accurately extract LVEF from echocardiograms and is useful for identifying heart failure patients with reduced ejection fraction.
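The abstract reports that a regular expression-based approach handled three kinds of LVEF mention: numeric values, ranges, and qualitative descriptions. The Python sketch below illustrates that three-way split under hypothetical assumptions, none of them taken from the study:

- the patterns;
- the qualitative vocabulary;
- the rule that a range counts as reduced only when its upper bound is ≤ 40%.

```python
import re
from typing import Optional

# Hypothetical pattern: an EF keyword, a short non-numeric gap, then a value or a range ("55-60%", "60 to 65%").
NUMERIC_RE = re.compile(
    r"\b(?:LVEF|EF|ejection fraction)\b\D{0,30}?"
    r"(\d{1,2}(?:\.\d+)?)\s*%?"
    r"(?:\s*(?:-|to)\s*(\d{1,2}(?:\.\d+)?))?\s*%?",
    re.IGNORECASE,
)
# Hypothetical qualitative vocabulary, consulted only when no number is found.
QUALITATIVE_RE = re.compile(
    r"\b(?:LVEF|EF|ejection fraction|systolic function)\b[^.]{0,40}?"
    r"\b(hyperdynamic|normal|mildly reduced|moderately reduced|severely reduced)\b",
    re.IGNORECASE,
)


def extract_lvef(report: str) -> Optional[dict]:
    """Classify the first LVEF mention as numeric, range, or qualitative."""
    match = NUMERIC_RE.search(report)
    if match:
        low = float(match.group(1))
        if match.group(2):
            return {"kind": "range", "low": low, "high": float(match.group(2))}
        return {"kind": "numeric", "value": low}
    match = QUALITATIVE_RE.search(report)
    if match:
        return {"kind": "qualitative", "value": match.group(1).lower()}
    return None


def is_reduced(result: Optional[dict], threshold: float = 40.0) -> Optional[bool]:
    """Flag LVEF <= threshold; None when no LVEF mention was found."""
    if result is None:
        return None
    if result["kind"] == "numeric":
        return result["value"] <= threshold
    if result["kind"] == "range":
        return result["high"] <= threshold  # assumption: the whole range must be reduced
    return result["value"] in {"moderately reduced", "severely reduced"}  # assumption


for text in ["The LVEF is 35%.", "Ejection fraction 55-60%.",
             "Left ventricular systolic function is severely reduced."]:
    found = extract_lvef(text)
    print(text, "->", found, is_reduced(found))
```

The numeric pattern is tried before the qualitative one. A report that states both a number and a description is therefore classified by its number, which matches the abstract's observation that most reports contained numerical values.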
Affiliation(s)
- Kavishwar B Wagholikar
  - Harvard Medical School, Boston, MA, USA; Massachusetts General Hospital, Boston, MA, USA
- Calum A MacRae
  - Harvard Medical School, Boston, MA, USA; Brigham and Women's Hospital, Boston, MA, USA
- Benjamin M Scirica
  - Harvard Medical School, Boston, MA, USA; Brigham and Women's Hospital, Boston, MA, USA
- Akshay S Desai
  - Harvard Medical School, Boston, MA, USA; Brigham and Women's Hospital, Boston, MA, USA
- Shawn N Murphy
  - Harvard Medical School, Boston, MA, USA; Massachusetts General Hospital, Boston, MA, USA; Partners Healthcare, Boston, MA, USA
| |
Collapse
|
32
Johnson SB, Adekkanattu P, Campion TR, Flory J, Pathak J, Patterson OV, DuVall SL, Major V, Aphinyanaphongs Y. From Sour Grapes to Low-Hanging Fruit: A Case Study Demonstrating a Practical Strategy for Natural Language Processing Portability. AMIA Jt Summits Transl Sci Proc 2018; 2017:104-112. [PMID: 29888051 PMCID: PMC5961788] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
Natural Language Processing (NLP) holds potential for patient care and clinical research, but a gap exists between promise and reality. While some studies have demonstrated the portability of NLP systems across multiple sites, challenges remain. Strategies to mitigate these challenges can either tackle complex NLP problems with advanced methods (hard-to-reach fruit) or focus on simple NLP problems with practical methods (low-hanging fruit). This paper investigates a practical strategy for NLP portability, using the extraction of left ventricular ejection fraction (LVEF) as a use case. We used a tool developed at the Department of Veterans Affairs (VA) to extract LVEF values from free-text echocardiogram reports in the MIMIC-III database. The approach showed an accuracy of 98.4%, a sensitivity of 99.4%, a positive predictive value of 98.7%, and an F-score of 99.0%. This experience, in which a simple NLP solution proved highly portable with excellent performance, illustrates that simple NLP applications may be easier to disseminate and adapt, and in the short term may prove more useful, than complex applications.
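As a quick consistency check on the figures in this abstract: the F-score is the harmonic mean of positive predictive value (precision) and sensitivity (recall), F = 2 · PPV · Se / (PPV + Se). The small helper below (the name `f_score` is ours) reproduces the reported 99.0% from the reported PPV and sensitivity. Accuracy cannot be recomputed this way, because the abstract gives no confusion-matrix counts.

```python
def f_score(ppv: float, sensitivity: float) -> float:
    """Harmonic mean of positive predictive value (precision) and sensitivity (recall)."""
    return 2 * ppv * sensitivity / (ppv + sensitivity)


# Reported values, expressed as fractions: PPV 98.7%, sensitivity 99.4%.
print(f"{f_score(0.987, 0.994):.1%}")  # 99.0%
```

The harmonic mean sits closer to the smaller of the two inputs, so a high F-score here means both PPV and sensitivity are high, not just their average.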
Affiliation(s)
- Stephen B Johnson
- Healthcare Policy and Research, Weill Cornell Medicine, New York, New York
- Prakash Adekkanattu
- Information Technologies & Services, Weill Cornell Medicine, New York, New York
- Thomas R Campion
- Healthcare Policy and Research, Weill Cornell Medicine, New York, New York
- Information Technologies & Services, Weill Cornell Medicine, New York, New York
- James Flory
- Healthcare Policy and Research, Weill Cornell Medicine, New York, New York
- Jyotishman Pathak
- Healthcare Policy and Research, Weill Cornell Medicine, New York, New York
- Olga V Patterson
- VA Salt Lake City Health Care System
- University of Utah, Salt Lake City, UT
- Scott L DuVall
- VA Salt Lake City Health Care System
- University of Utah, Salt Lake City, UT
- Vincent Major
- Center for Health Informatics and Bioinformatics, NYU Langone Medical Center, New York, New York
- Yindalon Aphinyanaphongs
- Center for Health Informatics and Bioinformatics, NYU Langone Medical Center, New York, New York
33
Eisman AS, Weiner RB, Chen ES, Stey PC, Wadhera RK, Kithcart AP, Sarkar IN. An Automated System for Categorizing Transthoracic Echocardiography Indications According to the Echocardiography Appropriate Use Criteria. AMIA Annu Symp Proc 2018; 2017:670-678. [PMID: 29854132 PMCID: PMC5977700] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
The Echocardiography Appropriate Use Criteria (EAUC) are a set of indications for transthoracic echocardiography (TTE) developed to guide physician decision making around ordering TTE. In this study, an automated rule-based method for processing the "indications" listed within TTE reports and classifying them into one of the major EAUC categories was developed and validated against a clinician-annotated reference standard. The system performed comparably to trained physicians, allowing the automated classification of more than 30,000 TTE indications from a public database in less than ten minutes. The most common indication for TTE was Valvular assessment, closely followed by General. Hypertension/Heart Failure/Cardiomyopathy, Acute, and Cardiac Structure assessment each contributed more than ten percent within this patient population. These results suggest potential for automated approaches to tracking appropriate use of TTE and could guide the development of systems that prospectively identify when TTE use is recommended.
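A rule-based categorizer of this general kind can be sketched as an ordered list of keyword rules where the first match wins. The category names below come from the abstract; the keyword patterns, their priority order, the `categorize` helper, and the "Unclassified" fallback are hypothetical and do not reproduce the published rules.

```python
import re

# Hypothetical keyword rules, checked in priority order (first match wins).
# Category names follow the EAUC groups listed in the abstract; the keywords do not.
RULES = [
    ("Acute", r"\b(?:hypotension|shock|acute|myocardial infarction|pulmonary embolism)\b"),
    ("Valvular", r"\b(?:murmur|stenosis|regurgitation|valve|valvular)\b"),
    ("Hypertension/Heart Failure/Cardiomyopathy",
     r"\b(?:hypertension|heart failure|chf|cardiomyopathy)\b"),
    ("Cardiac Structure", r"\b(?:mass|thrombus|pericardial|aneurysm)\b"),
    ("General", r"\b(?:dyspnea|shortness of breath|palpitations)\b"),
]
COMPILED = [(category, re.compile(pattern, re.I)) for category, pattern in RULES]


def categorize(indication: str) -> str:
    """Return the first EAUC category whose keywords appear in the indication text."""
    for category, pattern in COMPILED:
        if pattern.search(indication):
            return category
    return "Unclassified"  # hypothetical fallback for indications no rule covers


indications = ["Aortic stenosis, follow-up", "CHF exacerbation", "Hypotension with new murmur"]
print([categorize(text) for text in indications])
```

Rule order encodes priority: the last example contains both an acute and a valvular keyword and is assigned to Acute only because that rule is listed first. A production system would need explicit conflict handling and validation against annotated indications, as the study did.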
Affiliation(s)
- Aaron S Eisman
- Center for Biomedical Informatics, Brown University, Providence, RI
- Elizabeth S Chen
- Center for Biomedical Informatics, Brown University, Providence, RI
- Paul C Stey
- Center for Biomedical Informatics, Brown University, Providence, RI
34
Renganathan V. Text Mining in Biomedical Domain with Emphasis on Document Clustering. Healthc Inform Res 2017; 23:141-146. [PMID: 28875048 PMCID: PMC5572517 DOI: 10.4258/hir.2017.23.3.141] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 03/21/2017] [Revised: 07/16/2017] [Accepted: 07/17/2017] [Indexed: 12/19/2022]
Abstract
Objectives With the exponential increase in the number of articles published every year in the biomedical domain, there is a need for automated systems that extract previously unknown information from published articles. Text mining techniques enable the extraction of such knowledge from unstructured documents. Methods This paper reviews text mining processes in detail and the software tools available to carry out text mining. It also reviews the roles and applications of text mining in the biomedical domain. Results Text mining processes, such as search and retrieval of documents, pre-processing of documents, natural language processing, methods for text clustering, and methods for text classification, are described in detail. Conclusions Text mining techniques can facilitate mining vast amounts of knowledge on a given topic from published biomedical research articles and drawing meaningful conclusions that would not otherwise be possible.
35
Patterson OV, Freiberg MS, Skanderson M, Fodeh SJ, Brandt CA, DuVall SL. Unlocking echocardiogram measurements for heart disease research through natural language processing. BMC Cardiovasc Disord 2017; 17:151. [PMID: 28606104 PMCID: PMC5469017 DOI: 10.1186/s12872-017-0580-8] [Citation(s) in RCA: 55] [Impact Index Per Article: 7.9] [Received: 11/28/2016] [Accepted: 05/25/2017] [Indexed: 11/10/2022]
Abstract
BACKGROUND To investigate the mechanisms of cardiovascular disease in HIV-infected and uninfected patients, a large longitudinal multi-center study requires an analysis of echocardiogram reports. IMPLEMENTATION A natural language processing system using dictionary lookup, rules, and patterns was developed to extract heart function measurements that are typically recorded in echocardiogram reports as measurement-value pairs. Curated semantic bootstrapping was used to create a custom dictionary that extends existing terminologies with terms that actually appear in the medical record. A novel disambiguation method based on semantic constraints was created to identify and discard erroneous alternative definitions of the measurement terms. The system was built on a scalable framework, making it suitable for processing large datasets. RESULTS The system was developed for and validated on notes from three sources: general clinic notes, echocardiogram reports, and radiology reports. Averaged across all extracted values, the system achieved F-scores of 0.872, 0.844, and 0.877 and precision of 0.936, 0.982, and 0.969 for the three datasets, respectively. Left ventricular ejection fraction (LVEF) was the most frequently extracted measurement. The precision of LVEF extraction ranged from 0.968 to 1.0 across document types. CONCLUSIONS This system illustrates the feasibility and effectiveness of large-scale information extraction from clinical data. New clinical questions in the domain of heart failure can be addressed through retrospective clinical data analysis, because key heart function measurements can be successfully extracted using natural language processing.
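The dictionary-lookup-plus-pattern idea in this abstract can be illustrated by pairing dictionary terms with the number and unit that follow them. The dictionary entries, plausible ranges, and the `extract_measurements` helper are hypothetical; the unit-and-range check only loosely echoes the paper's semantic-constraint disambiguation and is not that method.

```python
import re

# Hypothetical dictionary: surface term -> (canonical name, expected unit, plausible range).
DICTIONARY = {
    "lvef": ("LVEF", "%", (5.0, 90.0)),
    "ejection fraction": ("LVEF", "%", (5.0, 90.0)),
    "ivsd": ("IVSd", "cm", (0.3, 3.0)),
    "septal thickness": ("IVSd", "cm", (0.3, 3.0)),
    "lvidd": ("LVIDd", "cm", (2.0, 9.0)),
}
# Longest terms first so multi-word entries win over shorter overlapping ones.
_TERMS = "|".join(sorted(map(re.escape, DICTIONARY), key=len, reverse=True))
# Term, optional linking word or punctuation, number, optional unit.
_PAIR = re.compile(
    rf"\b({_TERMS})\b\s*(?:is|was|of|[:=])?\s*(\d+(?:\.\d+)?)\s*(%|cm|mm)?", re.I
)


def extract_measurements(text: str) -> list:
    """Return (canonical name, value, unit) tuples that pass the unit and range checks."""
    results = []
    for m in _PAIR.finditer(text):
        name, unit, (low, high) = DICTIONARY[m.group(1).lower()]
        value = float(m.group(2))
        found_unit = (m.group(3) or "").lower()
        if found_unit == "mm" and unit == "cm":
            value, found_unit = value / 10, "cm"  # normalize millimetres to centimetres
        if found_unit and found_unit != unit:
            continue  # unit contradicts the dictionary entry: discard the reading
        if low <= value <= high:  # implausible values are discarded as well
            results.append((name, value, unit))
    return results


print(extract_measurements("IVSd 1.1 cm, LVIDd 48 mm, LVEF 55%."))
```

Mapping several surface forms ("LVEF", "ejection fraction") to one canonical name is what lets a curated dictionary absorb the terms that actually appear in notes; the constraint check then rejects pairs such as "ejection fraction 2.5 cm" that cannot belong to the measurement.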
Affiliation(s)
- Olga V Patterson
- Department of Veterans Affairs Salt Lake City Health Care System, 500 Foothill Drive Bldg. Mail Code 182, Salt Lake City, 84148, UT, USA; School of Medicine, University of Utah, 295 Chipeta Way, Salt Lake City, 84132, UT, USA
- Matthew S Freiberg
- VA Tennessee Valley Health Care System, Nashville, TN, USA; Vanderbilt University Medical Center, Cardiovascular Medicine Division, Nashville, TN, USA
- Samah J Fodeh
- Center for Medical Informatics, School of Medicine, Yale University, West Haven, CT, USA
- Cynthia A Brandt
- Connecticut VA Healthcare System, West Haven, CT, USA; Center for Medical Informatics, School of Medicine, Yale University, West Haven, CT, USA
- Scott L DuVall
- Department of Veterans Affairs Salt Lake City Health Care System, 500 Foothill Drive Bldg. Mail Code 182, Salt Lake City, 84148, UT, USA; School of Medicine, University of Utah, 295 Chipeta Way, Salt Lake City, 84132, UT, USA
36
An index-based algorithm for fast on-line query processing of latent semantic analysis. PLoS One 2017; 12:e0177523. [PMID: 28520747 PMCID: PMC5433746 DOI: 10.1371/journal.pone.0177523] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 12/30/2016] [Accepted: 04/29/2017] [Indexed: 11/23/2022]
Abstract
Latent Semantic Analysis (LSA) is widely used for finding documents whose semantics are similar to a keyword query. Although LSA yields promising results, existing LSA algorithms involve many unnecessary operations in similarity computation and candidate checking during on-line query processing, which is expensive in time and cannot respond efficiently to queries, especially when the dataset becomes large. In this paper, we study the efficiency of on-line query processing for LSA, with the goal of efficiently finding documents similar to a given query. We rewrite the similarity equation of LSA in terms of an intermediate value called partial similarity, which is stored in a purpose-built index called the partial index. To reduce the search space, we give an approximate form of the similarity equation and then develop an efficient algorithm for building the partial index, which skips partial similarities lower than a given threshold θ. Based on the partial index, we develop an efficient algorithm called ILSA to support fast on-line query processing. The given query is transformed into a pseudo document vector, and the similarities between the query and candidate documents are computed by accumulating the partial similarities obtained from the index nodes corresponding to non-zero entries in the pseudo document vector. Compared to the LSA algorithm, ILSA reduces the time cost of on-line query processing by pruning unpromising candidate documents and skipping operations that contribute little to similarity scores. Extensive experiments comparing ILSA with LSA demonstrate the efficiency and effectiveness of the proposed algorithm.
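One way to read the partial-similarity idea, sketched under stated assumptions (this is our reconstruction, not the ILSA algorithm from the paper): folding a sparse query term vector into the LSA latent space (scaling U_k^T q by the inverse singular values) is linear in the query's term weights. The dot product between the folded query and a length-normalized document vector d therefore splits into per-term contributions p(t, d) = (u_t / sigma) · d / ||d||, where u_t is row t of U_k. These can be precomputed per term, entries with |p(t, d)| < θ can be dropped, and a query then visits only the index entries of its nonzero terms. The latent matrices are passed in precomputed so no SVD library is needed; `build_partial_index` and `query` are hypothetical names.

```python
import math
from collections import defaultdict


def build_partial_index(term_rows, sigma, doc_vecs, theta):
    """Precompute p(t, d) = (u_t / sigma) . d / ||d|| for every term t and document d.

    term_rows[t] is row t of U_k, sigma holds the k singular values, and doc_vecs[d]
    is document d in the latent space. Entries with |p| < theta are skipped, which is
    where both the approximation and the saving come from.
    """
    norms = [math.sqrt(sum(x * x for x in vec)) for vec in doc_vecs]
    index = defaultdict(list)
    for t, row in enumerate(term_rows):
        scaled = [u / s for u, s in zip(row, sigma)]  # inverse singular values applied to u_t
        for d, vec in enumerate(doc_vecs):
            p = sum(a * b for a, b in zip(scaled, vec)) / norms[d]
            if abs(p) >= theta:
                index[t].append((d, p))
    return index


def query(index, q_weights, top_k=3):
    """Rank documents for a query given as {term id: weight} over its nonzero terms.

    Scores equal q_hat . d / ||d||, i.e. cosine similarity times the constant ||q_hat||,
    so the ranking matches exact LSA cosine ranking when theta == 0.
    """
    scores = defaultdict(float)
    for t, w in q_weights.items():
        for d, p in index.get(t, ()):  # visit only index nodes of nonzero query terms
            scores[d] += w * p
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


if __name__ == "__main__":
    U = [[0.8, 0.1], [0.3, 0.9], [0.5, -0.4]]   # toy 3-term, 2-factor U_k
    S = [2.0, 1.0]                              # toy singular values
    D = [[0.9, 0.1], [0.2, 0.95], [0.6, -0.5]]  # toy latent document vectors
    idx = build_partial_index(U, S, D, theta=0.1)
    print(query(idx, {0: 1.0, 2: 0.5}))
```

Dividing by ||q_hat|| is omitted because it is identical for every document and does not change the ranking; raising θ shrinks the index and the work per query at the cost of approximate scores.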
37
Claveau V, Silva Oliveira LE, Bouzillé G, Cuggia M, Cabral Moro CM, Grabar N. Numerical Eligibility Criteria in Clinical Protocols: Annotation, Automatic Detection and Interpretation. Artif Intell Med 2017. [DOI: 10.1007/978-3-319-59758-4_22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
38
Guralnick RP, Zermoglio PF, Wieczorek J, LaFrance R, Bloom D, Russell L. The importance of digitized biocollections as a source of trait data and a new VertNet resource. Database (Oxford) 2016; 2016:baw158. [PMID: 28025346 PMCID: PMC5199146 DOI: 10.1093/database/baw158] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 06/18/2016] [Revised: 11/06/2016] [Accepted: 11/06/2016] [Indexed: 02/02/2023]
Abstract
For vast areas of the globe and large parts of the tree of life, the data needed to inform trait diversity are incomplete. Such trait data, when fully assembled, however, form the link between the evolutionary history of organisms, their assembly into communities, and the nature and functioning of ecosystems. Recent efforts to close data gaps have focused on collating trait-by-species databases, which only provide species-level, aggregated value ranges for traits of interest and often lack the direct observations on which those ranges are based. Perhaps under-appreciated is that digitized biocollection records collectively contain a vast trove of trait data measured directly from individuals, but this content remains hidden and highly heterogeneous, impeding discoverability and use. We developed and deployed a suite of openly accessible software tools in order to collate a full set of trait descriptions and extract two key traits, body length and mass, from >18 million specimen records in VertNet, a global biodiversity data publisher and aggregator. We tested the success rate of these tools against hand-checked validation data sets and characterized the quality and quantity of the extracted data. A post-processing toolkit was developed to standardize and harmonize data sets, and to integrate this improved content into VertNet for broadest reuse. The result of this work was to add more than 1.5 million harmonized measurements on vertebrate body mass and length directly to specimen records. Rates of false positives and negatives for extracted data were extremely low. We also created new tools for filtering, querying, and assembling this research-ready vertebrate trait content for view and download. Our work has yielded a novel database and platform for harmonized trait content that will grow as tools introduced here become part of publication workflows. We close by noting how this effort extends to new communities already developing similar digitized content.
Database URL: http://portal.vertnet.org/search?advanced=1
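The extract-then-harmonize step for the two traits can be sketched as pattern matching on a free-text specimen field followed by conversion to millimetres and grams. The patterns, the `harmonize` helper, and the output keys are hypothetical and far narrower than the VertNet tooling; the conversion factors are the standard exact definitions (1 in = 25.4 mm, 1 lb = 453.59237 g, 1 oz = 28.349523125 g).

```python
import re

# Hypothetical patterns for two traits; real specimen records use far more notations.
_LENGTH = re.compile(
    r"\b(?:total length|body length|length|TL)\s*[:=]?\s*(\d+(?:\.\d+)?)\s*(mm|cm|m|in)\b", re.I
)
_MASS = re.compile(
    r"\b(?:body mass|mass|weight|wt)\s*[:=]?\s*(\d+(?:\.\d+)?)\s*(mg|g|kg|lb|oz)\b", re.I
)

# Conversion factors to the harmonized units (millimetres and grams).
TO_MM = {"mm": 1.0, "cm": 10.0, "m": 1000.0, "in": 25.4}
TO_G = {"mg": 0.001, "g": 1.0, "kg": 1000.0, "lb": 453.59237, "oz": 28.349523125}


def harmonize(remarks: str) -> dict:
    """Return body length in mm and body mass in g when found; missing traits are omitted."""
    out = {}
    m = _LENGTH.search(remarks)
    if m:
        out["body_length_mm"] = float(m.group(1)) * TO_MM[m.group(2).lower()]
    m = _MASS.search(remarks)
    if m:
        out["body_mass_g"] = float(m.group(1)) * TO_G[m.group(2).lower()]
    return out


print(harmonize("TL=12 cm, wt 1.2 kg"))
```

Keeping the raw match alongside the converted value (omitted here for brevity) makes false positives auditable, which matters when validating against hand-checked records as the authors describe.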
Affiliation(s)
- Robert P Guralnick
- University of Florida Museum of Natural History, University of Florida at Gainesville, Gainesville, FL, USA
- Paula F Zermoglio
- Departamento de Ecología, Genética y Evolución, Instituto IEGEBA (CONICET-UBA), Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Buenos Aires, Argentina; Institut de Recherche sur la Biologie de l'Insecte, UMR 7261 CNRS, Université François Rabelais, Tours, France
- John Wieczorek
- Museum of Vertebrate Zoology, University of California, Berkeley, CA, USA
- Raphael LaFrance
- University of Florida Museum of Natural History, University of Florida at Gainesville, Gainesville, FL, USA
- David Bloom
- University of Florida Museum of Natural History, University of Florida at Gainesville, Gainesville, FL, USA
- Laura Russell
- University of Florida Museum of Natural History, University of Florida at Gainesville, Gainesville, FL, USA; Biodiversity Institute, University of Kansas, Lawrence, KS, USA