1
Sung SF, Hu YH, Chen CY. Disambiguating Clinical Abbreviations by One-to-All Classification: Algorithm Development and Validation Study. JMIR Med Inform 2024; 12:e56955. [PMID: 39352715 PMCID: PMC11460304 DOI: 10.2196/56955] [Received: 01/31/2024] [Revised: 08/29/2024] [Accepted: 09/01/2024] [Indexed: 10/10/2024]
Abstract
Background Electronic medical records store extensive patient data and serve as a comprehensive repository, including textual medical records like surgical and imaging reports. Their utility in clinical decision support systems is substantial, but the widespread use of ambiguous and unstandardized abbreviations in clinical documents poses challenges for natural language processing in clinical decision support systems. Efficient abbreviation disambiguation methods are needed for effective information extraction.
Objective This study aims to enhance the one-to-all (OTA) framework for clinical abbreviation expansion, which uses a single model to predict multiple abbreviation meanings. The objective is to improve OTA by developing context-candidate pairs and optimizing word embeddings in Bidirectional Encoder Representations From Transformers (BERT), evaluating the model's efficacy in expanding clinical abbreviations using real data.
Methods Three datasets were used: Medical Subject Headings Word Sense Disambiguation, University of Minnesota, and Chia-Yi Christian Hospital from Ditmanson Medical Foundation Chia-Yi Christian Hospital. Texts containing polysemous abbreviations were preprocessed and formatted for BERT. The study involved fine-tuning pretrained models, ClinicalBERT and BlueBERT, generating dataset pairs for training and testing based on Huang et al's method.
Results BlueBERT achieved macro- and microaccuracies of 95.41% and 95.16%, respectively, on the Medical Subject Headings Word Sense Disambiguation dataset. It improved macroaccuracy by 0.54%-1.53% compared to two baselines, long short-term memory and deepBioWSD with random embedding. On the University of Minnesota dataset, BlueBERT recorded macro- and microaccuracies of 98.40% and 98.22%, respectively. Against the baselines of Word2Vec + support vector machine and BioWordVec + support vector machine, BlueBERT demonstrated a macroaccuracy improvement of 2.61%-4.13%.
Conclusions This research preliminarily validated the effectiveness of the OTA method for abbreviation disambiguation in medical texts, demonstrating the potential to enhance both clinical staff efficiency and research effectiveness.
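The abstract reports both macro- and microaccuracy but does not define them. The following is a minimal sketch, assuming the convention common in word sense disambiguation work: macroaccuracy is the unweighted mean of per-abbreviation accuracies, and microaccuracy is the pooled accuracy over all instances. The function name `macro_micro_accuracy` and the toy records are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def macro_micro_accuracy(records):
    """Return (macro, micro) accuracy.

    records: iterable of (abbreviation, gold_sense, predicted_sense).
    Assumed definitions (not stated in the abstract):
      macro = mean over abbreviations of each abbreviation's accuracy
      micro = correct predictions / all predictions
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for abbr, gold, pred in records:
        total[abbr] += 1
        correct[abbr] += int(gold == pred)
    if not total:
        raise ValueError("records must not be empty")
    macro = sum(correct[a] / total[a] for a in total) / len(total)
    micro = sum(correct.values()) / sum(total.values())
    return macro, micro


# Made-up toy data: "RA" has 4 instances (3 correct), "MS" has 2 (1 correct).
toy = [
    ("RA", "rheumatoid arthritis", "rheumatoid arthritis"),
    ("RA", "right atrium", "right atrium"),
    ("RA", "room air", "right atrium"),
    ("RA", "room air", "room air"),
    ("MS", "multiple sclerosis", "mitral stenosis"),
    ("MS", "multiple sclerosis", "multiple sclerosis"),
]
macro, micro = macro_micro_accuracy(toy)
print(f"macro={macro:.4f} micro={micro:.4f}")
```

Under these assumed definitions the two numbers diverge whenever abbreviations have different instance counts, which is one plausible reason for reporting both.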
Affiliation(s)
- Sheng-Feng Sung
  - Division of Neurology, Department of Internal Medicine, Ditmanson Medical Foundation Chia-Yi Christian Hospital, Chiayi City, Taiwan
  - Department of Nursing, Fooyin University, Kaohsiung, Taiwan
- Ya-Han Hu
  - Department of Information Management, National Central University, 300 Zhongda Rd, Zhongli District, Taoyuan City, 32001, Taiwan, 886 34227151 ext 66560
- Chong-Yan Chen
  - Department of Information Management, National Central University, 300 Zhongda Rd, Zhongli District, Taoyuan City, 32001, Taiwan, 886 34227151 ext 66560
2
Gao J, Chen G, O’Rourke AP, Caskey J, Carey KA, Oguss M, Stey A, Dligach D, Miller T, Mayampurath A, Churpek MM, Afshar M. Automated stratification of trauma injury severity across multiple body regions using multi-modal, multi-class machine learning models. J Am Med Inform Assoc 2024; 31:1291-1302. [PMID: 38587875 PMCID: PMC11105131 DOI: 10.1093/jamia/ocae071] [Received: 01/09/2024] [Revised: 02/29/2024] [Accepted: 03/21/2024] [Indexed: 04/09/2024]
Abstract
OBJECTIVE The timely stratification of trauma injury severity can enhance the quality of trauma care but it requires intense manual annotation from certified trauma coders. The objective of this study is to develop machine learning models for the stratification of trauma injury severity across various body regions using clinical text and structured electronic health records (EHRs) data.
MATERIALS AND METHODS Our study utilized clinical documents and structured EHR variables linked with the trauma registry data to create 2 machine learning models with different approaches to representing text. The first one fuses concept unique identifiers (CUIs) extracted from free text with structured EHR variables, while the second one integrates free text with structured EHR variables. Temporal validation was undertaken to ensure the models' temporal generalizability. Additionally, analyses to assess the variable importance were conducted.
RESULTS Both models demonstrated impressive performance in categorizing leg injuries, achieving high accuracy with macro-F1 scores of over 0.8. Additionally, they showed considerable accuracy, with macro-F1 scores exceeding or near 0.7, in assessing injuries in the areas of the chest and head. We showed in our variable importance analysis that the most important features in the model have strong face validity in determining clinically relevant trauma injuries.
DISCUSSION The CUI-based model achieves comparable performance, if not higher, compared to the free-text-based model, with reduced complexity. Furthermore, integrating structured EHR data improves performance, particularly when the text modalities are insufficiently indicative.
CONCLUSIONS Our multi-modal, multiclass models can provide accurate stratification of trauma injury severity and clinically relevant interpretations.
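The results above are reported as macro-F1 scores. As a minimal sketch of the standard definition (the unweighted mean of per-class F1 scores), the example below computes macro-F1 for a multi-class severity label. The function name `macro_f1`, the severity class names, and the toy labels are hypothetical illustrations and do not come from the study.

```python
def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over every class seen in gold or pred."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        raise ValueError("inputs must not be empty")
    pairs = list(zip(gold, pred))
    scores = []
    for label in sorted(set(gold) | set(pred)):
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); taken as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Made-up severity labels for six hypothetical encounters.
gold = ["minor", "minor", "moderate", "severe", "severe", "severe"]
pred = ["minor", "moderate", "moderate", "severe", "severe", "minor"]
score = macro_f1(gold, pred)
print(f"macro-F1 = {score:.4f}")
```

Because every class contributes equally regardless of how often it occurs, macro-F1 penalizes poor performance on rare classes more than a pooled metric would.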
Affiliation(s)
- Jifan Gao
  - Department of Biostatistics and Medical Informatics, University of Wisconsin–Madison, Madison, WI 53726, United States
- Guanhua Chen
  - Department of Biostatistics and Medical Informatics, University of Wisconsin–Madison, Madison, WI 53726, United States
- Ann P O’Rourke
  - Department of Surgery, University of Wisconsin–Madison, Madison, WI 53792, United States
- John Caskey
  - Department of Medicine, University of Wisconsin–Madison, Madison, WI 53705, United States
- Kyle A Carey
  - Department of Medicine, University of Wisconsin–Madison, Madison, WI 53705, United States
- Madeline Oguss
  - Department of Medicine, University of Wisconsin–Madison, Madison, WI 53705, United States
- Anne Stey
  - Department of Surgery, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, United States
  - Center of Health Services and Outcomes Research, Institute for Public Health and Medicine, Chicago, IL 60611, United States
- Dmitriy Dligach
  - Department of Computer Science, Loyola University Chicago, Chicago, IL 60660, United States
- Timothy Miller
  - Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA 02115, United States
  - Department of Pediatrics, Harvard Medical School, Boston, MA 02115, United States
- Anoop Mayampurath
  - Department of Biostatistics and Medical Informatics, University of Wisconsin–Madison, Madison, WI 53726, United States
  - Department of Medicine, University of Wisconsin–Madison, Madison, WI 53705, United States
- Matthew M Churpek
  - Department of Biostatistics and Medical Informatics, University of Wisconsin–Madison, Madison, WI 53726, United States
  - Department of Medicine, University of Wisconsin–Madison, Madison, WI 53705, United States
- Majid Afshar
  - Department of Biostatistics and Medical Informatics, University of Wisconsin–Madison, Madison, WI 53726, United States
  - Department of Medicine, University of Wisconsin–Madison, Madison, WI 53705, United States
3
Gao Y, Mahajan D, Uzuner Ö, Yetisgen M. Clinical natural language processing for secondary uses. J Biomed Inform 2024; 150:104596. [PMID: 38278312 PMCID: PMC11212507 DOI: 10.1016/j.jbi.2024.104596] [Received: 01/16/2024] [Accepted: 01/17/2024] [Indexed: 01/28/2024]
Affiliation(s)
- Yanjun Gao
  - Department of Medicine, University of Wisconsin Madison, Madison, WI, United States; IBM T.J. Watson Research Center, Yorktown Heights, NY, United States; Department of Information Sciences and Technology, George Mason University, Fairfax, VA, United States; Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, United States.
- Diwakar Mahajan
  - Department of Medicine, University of Wisconsin Madison, Madison, WI, United States; IBM T.J. Watson Research Center, Yorktown Heights, NY, United States; Department of Information Sciences and Technology, George Mason University, Fairfax, VA, United States; Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, United States. https://twitter.com/diwakarmahajan
- Özlem Uzuner
  - Department of Medicine, University of Wisconsin Madison, Madison, WI, United States; IBM T.J. Watson Research Center, Yorktown Heights, NY, United States; Department of Information Sciences and Technology, George Mason University, Fairfax, VA, United States; Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, United States. https://twitter.com/ozlem_uzuner_gm
- Meliha Yetisgen
  - Department of Medicine, University of Wisconsin Madison, Madison, WI, United States; IBM T.J. Watson Research Center, Yorktown Heights, NY, United States; Department of Information Sciences and Technology, George Mason University, Fairfax, VA, United States; Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, United States.