1
Seinen TM, Kors JA, van Mulligen EM, Fridgeirsson EA, Verhamme KM, Rijnbeek PR. Using clinical text to refine unspecific condition codes in Dutch general practitioner EHR data. Int J Med Inform 2024; 189:105506. [PMID: 38820647 DOI: 10.1016/j.ijmedinf.2024.105506] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2023] [Revised: 05/22/2024] [Accepted: 05/27/2024] [Indexed: 06/02/2024]
Abstract
OBJECTIVE
Observational studies using electronic health record (EHR) databases often face challenges due to unspecific clinical codes that can obscure detailed medical information, hindering precise data analysis. In this study, we aimed to assess the feasibility of refining these unspecific condition codes into more specific codes in a Dutch general practitioner (GP) EHR database by leveraging the available clinical free text.
METHODS
We utilized three approaches for text classification (search queries, semi-supervised learning, and supervised learning) to improve the specificity of ten unspecific International Classification of Primary Care (ICPC-1) codes. Two text representations and three machine learning algorithms were evaluated for the (semi-)supervised models. Additionally, we measured the improvement achieved by the refinement process on all code occurrences in the database.
RESULTS
The classification models performed well for most codes. In general, no single classification approach consistently outperformed the others. However, there were variations in the relative performance of the classification approaches within each code and in the use of different text representations and machine learning algorithms. Class imbalance and limited training data affected the performance of the (semi-)supervised models, yet the simple search queries remained particularly effective. Ultimately, the developed models improved the specificity of over half of all the unspecific code occurrences in the database.
CONCLUSIONS
Our findings show the feasibility of using information from clinical text to improve the specificity of unspecific condition codes in observational healthcare databases, even with a limited range of machine-learning techniques and modest annotated training sets. Future work could investigate transfer learning, integration of structured data, alternative semi-supervised methods, and validation of models across healthcare settings. The improved level of detail enriches the interpretation of medical information and can benefit observational research and patient care.
Affiliation(s)
- Tom M Seinen
  - Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, the Netherlands
- Jan A Kors
  - Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, the Netherlands
- Erik M van Mulligen
  - Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, the Netherlands
- Egill A Fridgeirsson
  - Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, the Netherlands
- Katia MC Verhamme
  - Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, the Netherlands
- Peter R Rijnbeek
  - Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, the Netherlands
2
Wattanachayakul P, Yanpiset P, Suenghataiphorn T, Srikulmontri T, Danpanichkul P, Rujirachun P, Polpichai N, Saowapa S, Casipit BA, Suparan K, Amanullah A. Impact of COVID-19 infection among patients hospitalized for conventional pacemaker implantation: Analysis of the Nationwide Inpatient Sample (NIS) 2020. J Arrhythm 2024; 40:905-912. [PMID: 39139863 PMCID: PMC11317689 DOI: 10.1002/joa3.13089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2024] [Revised: 05/06/2024] [Accepted: 05/21/2024] [Indexed: 08/15/2024] Open
Abstract
Introduction
The cardiac pacemaker is indicated for treating various types of bradyarrhythmia, providing lifelong cardiovascular benefits. Recent data showed that COVID-19 has impacted procedure numbers and led to adverse long-term outcomes in patients with cardiac pacemakers. However, the impact of COVID-19 infection on the in-hospital outcome of patients undergoing conventional pacemaker implantation remains unclear.
Method
Patients aged above 18 years who were hospitalized for conventional pacemaker implantation in the Nationwide Inpatient Sample (NIS) 2020 were identified using relevant ICD-10 CM and PCS codes. Multivariable logistic and linear regression models were used to analyze pre-specified outcomes, with the primary outcome being in-patient mortality and secondary outcomes including system-based and procedure-related complications.
Results
Of 108 020 patients hospitalized for conventional pacemaker implantation, 0.71% (765 out of 108 020) had a concurrent diagnosis of COVID-19 infection. Individuals with COVID-19 infection exhibited a lower mean age (73.7 years vs. 75.9 years, p = .027) and a lower female proportion (39.87% vs. 47.60%, p = .062) than those without COVID-19. In the multivariable logistic and linear regression models, adjusted for patient and hospital factors, COVID-19 infection was associated with higher in-hospital mortality (aOR 4.67; 95% CI 2.02 to 10.27, p < .001), extended length of stay (5.23 days vs. 1.04 days, p < .001), and various in-hospital complications, including sepsis, acute respiratory failure, post-procedural pneumothorax, and venous thromboembolism.
Conclusion
Our study suggests that COVID-19 infection is associated with higher in-hospital mortality, extended hospital stays, and increased adverse in-hospital outcomes in patients undergoing conventional pacemaker implantation.
Affiliation(s)
- Phuuwadith Wattanachayakul
  - Department of Medicine, Jefferson Einstein Hospital, Philadelphia, Pennsylvania, USA
  - Sidney Kimmel Medical College, Thomas Jefferson University, Philadelphia, Pennsylvania, USA
- Panat Yanpiset
  - Faculty of Medicine, Chiang Mai University, Chiang Mai, Thailand
- Pojsakorn Danpanichkul
  - Immunology Unit, Department of Microbiology, Faculty of Medicine, Chiang Mai University, Chiang Mai, Thailand
- Bruce A. Casipit
  - Department of Medicine, Jefferson Einstein Hospital, Philadelphia, Pennsylvania, USA
  - Sidney Kimmel Medical College, Thomas Jefferson University, Philadelphia, Pennsylvania, USA
- Kanokphong Suparan
  - Immunology Unit, Department of Microbiology, Faculty of Medicine, Chiang Mai University, Chiang Mai, Thailand
- Aman Amanullah
  - Sidney Kimmel Medical College, Thomas Jefferson University, Philadelphia, Pennsylvania, USA
  - Division of Cardiovascular Disease, Jefferson Einstein Hospital, Philadelphia, Pennsylvania, USA
3
Tavabi N, Singh M, Pruneski J, Kiapour AM. Systematic evaluation of common natural language processing techniques to codify clinical notes. PLoS One 2024; 19:e0298892. [PMID: 38451905 PMCID: PMC10919678 DOI: 10.1371/journal.pone.0298892] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2023] [Accepted: 01/31/2024] [Indexed: 03/09/2024] Open
Abstract
Proper codification of medical diagnoses and procedures is essential for optimized health care management, quality improvement, research, and reimbursement tasks within large healthcare systems. Assignment of diagnostic or procedure codes is a tedious manual process, often prone to human error. Natural Language Processing (NLP) has been suggested to facilitate this manual codification process. Yet, little is known about best practices for using NLP in such applications. With Large Language Models (LLMs) becoming more ubiquitous in daily life, it is critical to remember that not every task requires that level of resource and effort. Here we comprehensively assessed the performance of common NLP techniques to predict current procedural terminology (CPT) codes from operative notes. CPT codes are commonly used to track surgical procedures and interventions and are the primary means for reimbursement. Our analysis of the 100 most common musculoskeletal CPT codes suggests that traditional approaches can significantly outperform more resource-intensive approaches like BERT (P-value = 4.4e-17), with an average AUROC of 0.96 and accuracy of 0.97, in addition to providing interpretability, which can be very helpful and even crucial in the clinical domain. We also proposed a complexity measure to quantify the complexity of a classification task and examined how this measure could influence the effect of dataset size on model performance. Finally, we provide preliminary evidence that NLP can help minimize codification errors, including mislabeling due to human error.
Affiliation(s)
- Nazgol Tavabi
  - Boston Children’s Hospital, Boston, MA, United States of America
  - Harvard Medical School, Boston, MA, United States of America
- Mallika Singh
  - Boston Children’s Hospital, Boston, MA, United States of America
- James Pruneski
  - Boston Children’s Hospital, Boston, MA, United States of America
  - Harvard Medical School, Boston, MA, United States of America
- Ata M. Kiapour
  - Boston Children’s Hospital, Boston, MA, United States of America
  - Harvard Medical School, Boston, MA, United States of America
4
Boonstra MJ, Weissenbacher D, Moore JH, Gonzalez-Hernandez G, Asselbergs FW. Artificial intelligence: revolutionizing cardiology with large language models. Eur Heart J 2024; 45:332-345. [PMID: 38170821 PMCID: PMC10834163 DOI: 10.1093/eurheartj/ehad838] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2023] [Revised: 12/01/2023] [Accepted: 12/05/2023] [Indexed: 01/05/2024] Open
Abstract
Natural language processing techniques are having an increasing impact on clinical care from the patient, clinician, administrator, and research perspectives. Applications include automated generation of clinical notes and discharge letters, medical term coding for billing, medical chatbots for both patients and clinicians, data enrichment in the identification of disease symptoms or diagnoses, cohort selection for clinical trials, and auditing. This review presents an overview of the history of natural language processing techniques with a brief technical background. It then discusses implementation strategies for natural language processing tools, focusing specifically on large language models, and concludes with future opportunities for applying such techniques in cardiology.
Affiliation(s)
- Machteld J Boonstra
  - Department of Cardiology, Amsterdam Cardiovascular Sciences, Amsterdam University Medical Centre, University of Amsterdam, Amsterdam, Netherlands
- Davy Weissenbacher
  - Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, CA, USA
- Jason H Moore
  - Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, CA, USA
- Folkert W Asselbergs
  - Department of Cardiology, Amsterdam Cardiovascular Sciences, Amsterdam University Medical Centre, University of Amsterdam, Amsterdam, Netherlands
  - Institute of Health Informatics, University College London, London, UK
  - The National Institute for Health Research University College London Hospitals Biomedical Research Centre, University College London, London, UK
5
Desai R, Katukuri N, Goguri SR, Kothawala A, Alle NR, Bellamkonda MK, Dey D, Ganesan S, Biswas M, Sarkar K, Prattipati P, Chauhan S. Prediabetes: An overlooked risk factor for major adverse cardiac and cerebrovascular events in atrial fibrillation patients. World J Diabetes 2024; 15:24-33. [PMID: 38313858 PMCID: PMC10835500 DOI: 10.4239/wjd.v15.i1.24] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2023] [Revised: 10/22/2023] [Accepted: 12/15/2023] [Indexed: 01/12/2024] Open
Abstract
BACKGROUND
Prediabetes is a well-established risk factor for major adverse cardiac and cerebrovascular events (MACCE). However, the relationship between prediabetes and MACCE in atrial fibrillation (AF) patients has not been extensively studied. Therefore, this study aimed to establish a link between prediabetes and MACCE in AF patients.
AIM
To investigate a link between prediabetes and MACCE in AF patients.
METHODS
We used the National Inpatient Sample (2019) and relevant ICD-10 CM codes to identify hospitalizations with AF and categorized them into groups with and without prediabetes, excluding diabetics. The primary outcome was MACCE (all-cause inpatient mortality, cardiac arrest including ventricular fibrillation, and stroke) in AF-related hospitalizations.
RESULTS
Of the 2965875 AF-related hospitalizations for MACCE, 47505 (1.6%) were among patients with prediabetes. The prediabetes cohort was relatively younger (median 75 vs 78 years), and often consisted of males (56.3% vs 51.4%), blacks (9.8% vs 7.9%), Hispanics (7.3% vs 4.3%), and Asians (4.7% vs 1.6%) than the non-prediabetic cohort (P < 0.001). The prediabetes group had significantly higher rates of hypertension, hyperlipidemia, smoking, obesity, drug abuse, prior myocardial infarction, peripheral vascular disease, and hyperthyroidism (all P < 0.05). The prediabetes cohort was often discharged routinely (51.1% vs 41.1%), but more frequently required home health care (23.6% vs 21.0%) and had higher costs. After adjusting for baseline characteristics or comorbidities, the prediabetes cohort with AF admissions showed a higher rate and significantly higher odds of MACCE compared to the non-prediabetic cohort [18.6% vs 14.7%, odds ratio (OR) 1.34, 95% confidence interval 1.26-1.42, P < 0.001]. On subgroup analyses, males had a stronger association (aOR 1.43) compared to females (aOR 1.22), whereas on the race-wise comparison, Hispanics (aOR 1.43) and Asians (aOR 1.36) had a stronger association with MACCE with prediabetes vs whites (aOR 1.33) and blacks (aOR 1.21).
CONCLUSION
This population-based study found a significant association between prediabetes and MACCE in AF patients. Therefore, there is a need for further research to actively screen and manage prediabetes in AF to prevent MACCE.
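As a rough plausibility check on the figures in this abstract, the sketch below computes an unadjusted odds ratio from the two reported MACCE rates (18.6% vs 14.7%). The function name `odds_ratio` is illustrative and not from the paper. The published OR of 1.34 comes from the authors' adjusted model, so the crude value is only expected to land in the same neighbourhood, not to match exactly.

```python
def odds_ratio(p_exposed: float, p_unexposed: float) -> float:
    """Unadjusted odds ratio from two event proportions (each strictly between 0 and 1)."""
    odds_exposed = p_exposed / (1.0 - p_exposed)
    odds_unexposed = p_unexposed / (1.0 - p_unexposed)
    return odds_exposed / odds_unexposed


# MACCE rates reported in the abstract: prediabetes vs non-prediabetes cohort.
crude_or = odds_ratio(0.186, 0.147)
print(f"crude OR = {crude_or:.2f}")  # compare with the reported adjusted OR of 1.34
```

The crude estimate ignores every covariate the authors adjusted for, so it should be read only as a sanity check on the arithmetic.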
Affiliation(s)
- Rupak Desai
  - Independent Researcher, Atlanta, GA 30079, United States
- Nishanth Katukuri
  - Department of Internal Medicine, Mayo Clinic, Rochester, MN 55902, United States
- Sumaja Reddy Goguri
  - Department of Medicine, Chalmeda Anand Rao Institute of Medical Sciences, Telangana 505001, India
- Azra Kothawala
  - Department of Medicine, Jawaharlal Nehru Medical College, Belgaum 590010, India
- Naga Ruthvika Alle
  - Department of Medicine, Narayana Medical College, Andhra Pradesh, Nellore 524003, India
- Meena Kumari Bellamkonda
  - Department of Medicine, Dr Pinnamaneni Siddhartha Institute of Medical Sciences and Research Foundation, Vijaywada 521286, India
- Debankur Dey
  - Department of Medicine, Medical College Kolkata, Kolkata 700073, India
- Sharmila Ganesan
  - Department of Medicine, P.E.S. Institute of Medical Sciences and Research, Andhra Pradesh 517425, India
- Minakshi Biswas
  - Department of Medicine, Shaheed Ziaur Rahman Medical College, Bogra 5800, Bangladesh
- Kuheli Sarkar
  - Department of Medicine, College of Medicine and J.N.M Hospital, Kalyani 741235, India
- Pramoda Prattipati
  - Department of Medicine, Jawaharlal Nehru Medical College India, Karnataka, Belagavi 590010, India
- Shaylika Chauhan
  - Department of Internal Medicine, Geisinger Health System, Wilkes-Barre, PA 18702, United States
6
Hobensack M, Song J, Oh S, Evans L, Davoudi A, Bowles KH, McDonald MV, Barrón Y, Sridharan S, Wallace AS, Topaz M. Social Risk Factors are Associated with Risk for Hospitalization in Home Health Care: A Natural Language Processing Study. J Am Med Dir Assoc 2023; 24:1874-1880.e4. [PMID: 37553081 PMCID: PMC10839109 DOI: 10.1016/j.jamda.2023.06.031] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/04/2023] [Revised: 06/23/2023] [Accepted: 06/25/2023] [Indexed: 08/10/2023]
Abstract
OBJECTIVE
This study aimed to develop a natural language processing (NLP) system that identified social risk factors in home health care (HHC) clinical notes and to examine the association between social risk factors and hospitalization or an emergency department (ED) visit.
DESIGN
Retrospective cohort study.
SETTING AND PARTICIPANTS
We used standardized assessments and clinical notes from one HHC agency located in the northeastern United States. This included 86,866 episodes of care for 65,593 unique patients. Patients received HHC services between 2015 and 2017.
METHODS
Guided by HHC experts, we created a vocabulary of social risk factors that influence hospitalization or ED visit risk in the HHC setting. We then developed an NLP system to automatically identify social risk factors documented in clinical notes. We used an adjusted logistic regression model to examine the association between the NLP-based social risk factors and hospitalization or an ED visit.
RESULTS
On the basis of expert consensus, the following social risk factors emerged: Social Environment, Physical Environment, Education and Literacy, Food Insecurity, Access to Care, and Housing and Economic Circumstances. Our NLP system performed "very good" with an F score of 0.91. Approximately 4% of clinical notes (33% of episodes of care) documented a social risk factor. The most frequently documented social risk factors were Physical Environment and Social Environment. Except for Housing and Economic Circumstances, all NLP-based social risk factors were associated with higher odds of hospitalization and ED visits.
CONCLUSIONS AND IMPLICATIONS
HHC clinicians assess and document social risk factors associated with hospitalizations and ED visits in their clinical notes. Future studies can explore the social risk factors documented in HHC to improve communication across the health care system and to predict patients at risk for being hospitalized or visiting the ED.
Affiliation(s)
- Jiyoun Song
  - Columbia University School of Nursing, New York City, NY, USA
- Sungho Oh
  - University of Pennsylvania School of Nursing, Philadelphia, PA, USA
- Lauren Evans
  - Center for Home Care Policy & Research, VNS Health, New York, NY, USA
- Anahita Davoudi
  - Center for Home Care Policy & Research, VNS Health, New York, NY, USA
- Kathryn H Bowles
  - Center for Home Care Policy & Research, VNS Health, New York, NY, USA; Department of Biobehavioral Health Sciences, NewCourtland Center for Transitions and Health, University of Pennsylvania School of Nursing, Philadelphia, PA, USA
- Yolanda Barrón
  - Center for Home Care Policy & Research, VNS Health, New York, NY, USA
- Sridevi Sridharan
  - Center for Home Care Policy & Research, VNS Health, New York, NY, USA
- Andrea S Wallace
  - The University of Utah College of Nursing, Salt Lake City, UT, USA
- Maxim Topaz
  - Columbia University School of Nursing, New York City, NY, USA; Center for Home Care Policy & Research, VNS Health, New York, NY, USA; Data Science Institute, Columbia University, New York City, NY, USA
7
Chen PF, He TL, Lin SC, Chu YC, Kuo CT, Lai F, Wang SM, Zhu WX, Chen KC, Kuo LC, Hung FM, Lin YC, Tsai IC, Chiu CH, Chang SC, Yang CY. Training a Deep Contextualized Language Model for International Classification of Diseases, 10th Revision Classification via Federated Learning: Model Development and Validation Study. JMIR Med Inform 2022; 10:e41342. [DOI: 10.2196/41342] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2022] [Revised: 10/03/2022] [Accepted: 10/08/2022] [Indexed: 11/12/2022] Open
Abstract
Background
The automatic coding of clinical text documents by using the International Classification of Diseases, 10th Revision (ICD-10) can be performed for statistical analyses and reimbursements. With the development of natural language processing models, new transformer architectures with attention mechanisms have outperformed previous models. Although multicenter training may increase a model’s performance and external validity, the privacy of clinical documents should be protected. We used federated learning to train a model with multicenter data, without sharing data per se.
Objective
This study aims to train a classification model via federated learning for ICD-10 multilabel classification.
Methods
Text data from discharge notes in electronic medical records were collected from the following three medical centers: Far Eastern Memorial Hospital, National Taiwan University Hospital, and Taipei Veterans General Hospital. After comparing the performance of different variants of bidirectional encoder representations from transformers (BERT), PubMedBERT was chosen for the word embeddings. With regard to preprocessing, the nonalphanumeric characters were retained because the model’s performance decreased after the removal of these characters. To explain the outputs of our model, we added a label attention mechanism to the model architecture. The model was trained with data from each of the three hospitals separately and via federated learning. The models trained via federated learning and the models trained with local data were compared on a testing set that was composed of data from the three hospitals. The micro F1 score was used to evaluate model performance across all 3 centers.
Results
The F1 scores of PubMedBERT, RoBERTa (Robustly Optimized BERT Pretraining Approach), ClinicalBERT, and BioBERT (BERT for Biomedical Text Mining) were 0.735, 0.692, 0.711, and 0.721, respectively. The F1 score of the model that retained nonalphanumeric characters was 0.8120, whereas the F1 score after removing these characters was 0.7875—a decrease of 0.0245 (3.11%). The F1 scores on the testing set were 0.6142, 0.4472, 0.5353, and 0.2522 for the federated learning, Far Eastern Memorial Hospital, National Taiwan University Hospital, and Taipei Veterans General Hospital models, respectively. The explainable predictions were displayed with highlighted input words via the label attention architecture.
Conclusions
Federated learning was used to train the ICD-10 classification model on multicenter clinical text while protecting data privacy. The model’s performance was better than that of models that were trained locally.
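The abstract evaluates its multilabel ICD-10 classifier with the micro F1 score. A minimal sketch of that metric follows, assuming each document's labels are given as a set of code strings. The function name `micro_f1` and the toy label sets are illustrative assumptions and do not come from the study's data or code.

```python
from typing import Sequence, Set


def micro_f1(true_labels: Sequence[Set[str]], pred_labels: Sequence[Set[str]]) -> float:
    """Micro-averaged F1: pool true/false positives and false negatives over all documents and labels."""
    tp = fp = fn = 0
    for truth, pred in zip(true_labels, pred_labels):
        tp += len(truth & pred)   # codes predicted and actually assigned
        fp += len(pred - truth)   # codes predicted but not assigned
        fn += len(truth - pred)   # codes assigned but missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy example with three discharge notes (hypothetical label assignments).
truth = [{"I10", "E11"}, {"K21"}, {"I10"}]
pred = [{"I10"}, {"K21", "E11"}, {"I10"}]
score = micro_f1(truth, pred)
print(score)  # 3 TP, 1 FP, 1 FN -> 6 / 8 = 0.75
```

Because counts are pooled before the ratio is taken, frequent codes dominate the micro average, which is one reason it is a common choice for heavily imbalanced code sets.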
8
Sammani A, Jansen M, de Vries NM, de Jonge N, Baas AF, te Riele ASJM, Asselbergs FW, Oerlemans MIFJ. Automatic Identification of Patients With Unexplained Left Ventricular Hypertrophy in Electronic Health Record Data to Improve Targeted Treatment and Family Screening. Front Cardiovasc Med 2022; 9:768847. [PMID: 35498038 PMCID: PMC9051030 DOI: 10.3389/fcvm.2022.768847] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 09/01/2021] [Accepted: 02/18/2022] [Indexed: 11/29/2022] Open
Abstract
Background Unexplained Left Ventricular Hypertrophy (ULVH) may be caused by genetic and non-genetic etiologies (e.g., sarcomere variants, cardiac amyloid, or Anderson-Fabry's disease). Identification of ULVH patients allows for early targeted treatment and family screening. Aim To automatically identify patients with ULVH in electronic health record (EHR) data using two computer methods: text-mining and machine learning (ML). Methods Adults with echocardiographic measurement of interventricular septum thickness (IVSt) were included. A text-mining algorithm was developed to identify patients with ULVH. An ML algorithm including a variety of clinical, ECG and echocardiographic data was trained and tested in an 80/20% split. Clinical diagnosis of ULVH was considered the gold standard. Misclassifications were reviewed by an experienced cardiologist. Sensitivity, specificity, positive, and negative likelihood ratios (LHR+ and LHR–) of both text-mining and ML were reported. Results In total, 26,954 subjects (median age 61 years, 55% male) were included. ULVH was diagnosed in 204/26,954 (0.8%) patients, of which 56 had amyloidosis and two Anderson-Fabry Disease. Text-mining flagged 8,192 patients with possible ULVH, of whom 159 were true positives (sensitivity, specificity, LHR+, and LHR– of 0.78, 0.67, 2.36, and 0.33). Machine learning resulted in a sensitivity, specificity, LHR+, and LHR– of 0.32, 0.99, 32, and 0.68, respectively. Pivotal variables included IVSt, systolic blood pressure, and age. Conclusions Automatic identification of patients with ULVH is possible with both Text-mining and ML. Text-mining may be a comprehensive scaffold but can be less specific than machine learning. Deployment of either method depends on existing infrastructures and clinical applications.
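The likelihood ratios quoted in this abstract follow from the reported sensitivity and specificity through the standard definitions LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity. The sketch below recomputes them from the rounded values in the abstract. The function name `likelihood_ratios` is illustrative, and because the inputs are already rounded, the last digit can differ slightly from the published figures.

```python
from typing import Tuple


def likelihood_ratios(sensitivity: float, specificity: float) -> Tuple[float, float]:
    """Return (LR+, LR-) from sensitivity and specificity (specificity must lie strictly between 0 and 1)."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


# Rounded sensitivity/specificity pairs as reported in the abstract.
tm_pos, tm_neg = likelihood_ratios(0.78, 0.67)  # text-mining
ml_pos, ml_neg = likelihood_ratios(0.32, 0.99)  # machine learning

print(f"text-mining:      LR+ = {tm_pos:.2f}, LR- = {tm_neg:.2f}")
print(f"machine learning: LR+ = {ml_pos:.0f}, LR- = {ml_neg:.2f}")
```

The contrast between the two methods is visible directly in the formula: the machine learning model's very high specificity makes the denominator of LR+ tiny, which is why its LR+ is so much larger despite the low sensitivity.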
Affiliation(s)
- Arjan Sammani
  - Department of Cardiology, University Medical Center Utrecht, Utrecht University, Utrecht, Netherlands
- Mark Jansen
  - Department of Genetics, University Medical Center Utrecht, Utrecht University, Utrecht, Netherlands
- Nynke M. de Vries
  - Department of Cardiology, University Medical Center Utrecht, Utrecht University, Utrecht, Netherlands
- Nicolaas de Jonge
  - Department of Cardiology, University Medical Center Utrecht, Utrecht University, Utrecht, Netherlands
- Annette F. Baas
  - Department of Genetics, University Medical Center Utrecht, Utrecht University, Utrecht, Netherlands
- Folkert W. Asselbergs
  - Department of Cardiology, University Medical Center Utrecht, Utrecht University, Utrecht, Netherlands
  - Institute of Cardiovascular Science, Faculty of Population Health Sciences, University College London, London, United Kingdom
- Marish I. F. J. Oerlemans
  - Department of Cardiology, University Medical Center Utrecht, Utrecht University, Utrecht, Netherlands
  - *Correspondence: Marish I. F. J. Oerlemans
9
Blanco A, Remmer S, Pérez A, Dalianis H, Casillas A. Implementation of specialised attention mechanisms: ICD-10 classification of Gastrointestinal discharge summaries in English, Spanish and Swedish. J Biomed Inform 2022; 130:104050. [DOI: 10.1016/j.jbi.2022.104050] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/27/2021] [Revised: 01/31/2022] [Accepted: 03/07/2022] [Indexed: 11/30/2022]
10
Siegersma KR, Evers M, Bots SH, Groepenhoff F, Appelman Y, Hofstra L, Tulevski II, Somsen GA, den Ruijter HM, Spruit M, Onland-Moret NC. Development of a Pipeline for Adverse Drug Reaction Identification in Clinical Notes: Word Embedding Models and String Matching. JMIR Med Inform 2022; 10:e31063. [PMID: 35076407 PMCID: PMC8826143 DOI: 10.2196/31063] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/10/2021] [Revised: 11/02/2021] [Accepted: 11/14/2021] [Indexed: 12/02/2022] Open
Abstract
Background
Knowledge about adverse drug reactions (ADRs) in the population is limited because of underreporting, which hampers surveillance and assessment of drug safety. Therefore, gathering accurate information that can be retrieved from clinical notes about the incidence of ADRs is of great relevance. However, manual labeling of these notes is time-consuming, and automatization can improve the use of free-text clinical notes for the identification of ADRs. Furthermore, tools for language processing in languages other than English are not widely available.
Objective
The aim of this study is to design and evaluate a method for automatic extraction of medication and Adverse Drug Reaction Identification in Clinical Notes (ADRIN).
Methods
Dutch free-text clinical notes (N=277,398) and medication registrations (N=499,435) from the Cardiology Centers of the Netherlands database were used. All clinical notes were used to develop word embedding models. Vector representations of word embedding models and string matching with a medical dictionary (Medical Dictionary for Regulatory Activities [MedDRA]) were used for identification of ADRs and medication in a test set of clinical notes that were manually labeled. Several settings, including search area and punctuation, could be adjusted in the prototype to evaluate the optimal version of the prototype.
Results
The ADRIN method was evaluated using a test set of 988 clinical notes written on the stop date of a drug. Multiple versions of the prototype were evaluated for a variety of tasks. Binary classification of ADR presence achieved the highest accuracy of 0.84. Reduced search area and inclusion of punctuation improved performance, whereas incorporation of the MedDRA did not improve the performance of the pipeline.
Conclusions
The ADRIN method and prototype are effective in recognizing ADRs in Dutch clinical notes from cardiac diagnostic screening centers. Surprisingly, incorporation of the MedDRA did not result in improved identification on top of word embedding models. The implementation of the ADRIN tool may help increase the identification of ADRs, resulting in better care and saving substantial health care costs.
Collapse
Affiliation(s)
- Klaske R Siegersma
  - Laboratory of Experimental Cardiology, University Medical Center Utrecht, Utrecht University, Utrecht, Netherlands
  - Department of Cardiology, Amsterdam University Medical Centers, VU University Medical Center, Amsterdam, Netherlands
- Maxime Evers
  - Laboratory of Experimental Cardiology, University Medical Center Utrecht, Utrecht University, Utrecht, Netherlands
- Sophie H Bots
  - Laboratory of Experimental Cardiology, University Medical Center Utrecht, Utrecht University, Utrecht, Netherlands
- Floor Groepenhoff
  - Laboratory of Experimental Cardiology, University Medical Center Utrecht, Utrecht University, Utrecht, Netherlands
  - Central Diagnostic Laboratory, University Medical Center Utrecht, Utrecht University, Utrecht, Netherlands
- Yolande Appelman
  - Department of Cardiology, Amsterdam University Medical Centers, VU University Medical Center, Amsterdam, Netherlands
- Leonard Hofstra
  - Department of Cardiology, Amsterdam University Medical Centers, VU University Medical Center, Amsterdam, Netherlands
  - Cardiology Centers of the Netherlands, Utrecht, Netherlands
- Hester M den Ruijter
  - Laboratory of Experimental Cardiology, University Medical Center Utrecht, Utrecht University, Utrecht, Netherlands
- Marco Spruit
  - Department of Public Health and Primary Care, Leiden University Medical Center, Leiden University, Leiden, Netherlands
  - Leiden Institute of Advanced Computer Science, Leiden University, Leiden, Netherlands
- N Charlotte Onland-Moret
  - Department of Epidemiology, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, Netherlands
|
11
|
Asselbergs FW, Fraser AG. Artificial intelligence in cardiology: the debate continues. EUROPEAN HEART JOURNAL. DIGITAL HEALTH 2021; 2:721-726. [PMID: 36713089 PMCID: PMC9708032 DOI: 10.1093/ehjdh/ztab090] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 09/28/2021] [Accepted: 10/12/2021] [Indexed: 02/01/2023]
Abstract
In 1955, when John McCarthy and his colleagues proposed their first study of artificial intelligence, they suggested that 'every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it'. Whether that might ever be possible would depend on how we define intelligence, but what is indisputable is that new methods are needed to analyse and interpret the copious information provided by digital medical images, genomic databases, and biobanks. Technological advances have enabled applications of artificial intelligence (AI) including machine learning (ML) to be implemented into clinical practice, and their related scientific literature is exploding. Advocates argue enthusiastically that AI will transform many aspects of clinical cardiovascular medicine, while sceptics stress the importance of caution and the need for more evidence. This report summarizes the main opposing arguments that were presented in a debate at the 2021 Congress of the European Society of Cardiology. Artificial intelligence is an advanced analytical technique that should be considered when conventional statistical methods are insufficient, but testing a hypothesis or solving a clinical problem, not finding another application for AI, remains the most important objective. Artificial intelligence and ML methods should be transparent and interpretable, if they are to be approved by regulators and trusted to provide support for clinical decisions. Physicians need to understand AI methods and collaborate with engineers. Few applications have yet been shown to have a positive impact on clinical outcomes, so investment in research is essential.
Affiliation(s)
- Folkert W Asselbergs
  - Division Heart and Lungs, Department of Cardiology, University Medical Center Utrecht, Heidelberglaan 100, 3584 CX Utrecht, Netherlands
  - Institute of Health Informatics and Institute of Cardiovascular Science, University College London, 222 Euston Rd, London NW1 2DA, UK
  - NIHR BRC Clinical Research Informatics Unit, University College London Hospital, London, UK
- Alan G Fraser
  - School of Medicine, Cardiff University, University Hospital of Wales, Heath Park, Cardiff CF14 4XW, UK
  - Cardiovascular Imaging and Dynamics, Katholieke Universiteit Leuven, UZ Gasthuisberg, Herestraat 49, 3000 Leuven, Belgium
  - Corresponding author. Tel: +44 (0)29 2184 5366, Fax: +44 (0)29 2184 4473
|
12
|
Automatic Prediction of Recurrence of Major Cardiovascular Events: A Text Mining Study Using Chest X-Ray Reports. JOURNAL OF HEALTHCARE ENGINEERING 2021; 2021:6663884. [PMID: 34306597 PMCID: PMC8285182 DOI: 10.1155/2021/6663884] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/14/2020] [Revised: 05/29/2021] [Accepted: 06/29/2021] [Indexed: 11/17/2022]
Abstract
Methods We used EHR data of patients included in the Second Manifestations of ARTerial disease (SMART) study. We propose a deep learning-based multimodal architecture for our text mining pipeline that integrates neural text representation with preprocessed clinical predictors for the prediction of recurrence of major cardiovascular events in cardiovascular patients. Text preprocessing, including cleaning and stemming, was first applied to filter out the unwanted texts from X-ray radiology reports. Thereafter, text representation methods were used to numerically represent unstructured radiology reports with vectors. Subsequently, these text representation methods were added to prediction models to assess their clinical relevance. In this step, we applied logistic regression, support vector machine (SVM), multilayer perceptron neural network, convolutional neural network, long short-term memory (LSTM), and bidirectional LSTM deep neural network (BiLSTM). Results We performed various experiments to evaluate the added value of the text in the prediction of major cardiovascular events. The two main scenarios were the integration of radiology reports (1) with classical clinical predictors and (2) with only age and sex in the case of unavailable clinical predictors. In total, data of 5603 patients were used with 5-fold cross-validation to train the models. In the first scenario, the multimodal BiLSTM (MI-BiLSTM) model achieved an area under the curve (AUC) of 84.7%, misclassification rate of 14.3%, and F1 score of 83.8%. In this scenario, the SVM model, trained on clinical variables and bag-of-words representation, achieved the lowest misclassification rate of 12.2%. In the case of unavailable clinical predictors, the MI-BiLSTM model trained on radiology reports and demographic (age and sex) variables reached an AUC, F1 score, and misclassification rate of 74.5%, 70.8%, and 20.4%, respectively. 
Conclusions Using the case study of routine care chest X-ray radiology reports, we demonstrated the clinical relevance of integrating text features and classical predictors in our text mining pipeline for cardiovascular risk prediction. The MI-BiLSTM model with word embedding representation appeared to have a desirable performance when trained on text data integrated with the clinical variables from the SMART study. Our results mined from chest X-ray reports showed that models using text data in addition to laboratory values outperform those using only known clinical predictors.
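The abstract above describes combining a text representation of radiology reports with structured predictors and reporting misclassification rate and F1 score. The following is a minimal sketch of those two ideas using only the Python standard library. It is not the study's pipeline: the function names, the whitespace tokenizer, the example reports, and the example predictor values are all assumptions, and no model is trained here.

```python
from collections import Counter


def bag_of_words(reports):
    """Build a sorted vocabulary and one term-count vector per report."""
    tokenized = [report.lower().split() for report in reports]
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[term] for term in vocabulary])
    return vocabulary, vectors


def combine_features(text_vectors, clinical_rows):
    """Concatenate each text vector with that patient's structured predictors."""
    return [text + list(clinical) for text, clinical in zip(text_vectors, clinical_rows)]


def misclassification_rate(y_true, y_pred):
    """Fraction of predictions that differ from the true label."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example data (assumption): two reports plus (age, sex) per patient.
reports = ["normal heart size", "enlarged heart"]
clinical = [(63, 1), (71, 0)]
vocabulary, text_vectors = bag_of_words(reports)
features = combine_features(text_vectors, clinical)
print(vocabulary)
print(features)
```

The combined rows could then be passed to any classifier; the abstract's second scenario, with only age and sex available, corresponds to the two trailing columns in this toy layout.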
|