1
Redd D, Workman TE, Shao Y, Cheng Y, Tekle S, Garvin JH, Brandt CA, Zeng-Treitler Q. Patient Dietary Supplements Use: Do Results from Natural Language Processing of Clinical Notes Agree with Survey Data? Med Sci (Basel) 2023; 11:37. [PMID: 37367736 DOI: 10.3390/medsci11020037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2023] [Revised: 04/18/2023] [Accepted: 05/06/2023] [Indexed: 06/28/2023] [Open access]
Abstract
There is widespread use of dietary supplements, some prescribed but many taken without a physician's guidance. There are many potential interactions between supplements and both over-the-counter and prescription medications in ways that are unknown to patients. Structured medical records do not adequately document supplement use; however, unstructured clinical notes often contain extra information on supplements. We studied a group of 377 patients from three healthcare facilities and developed a natural language processing (NLP) tool to detect supplement use. Using surveys of these patients, we investigated the correlation between self-reported supplement use and NLP extractions from the clinical notes. Our model achieved an F1 score of 0.914 for detecting all supplements. Individual supplement detection had a variable correlation with survey responses, ranging from an F1 of 0.83 for calcium to an F1 of 0.39 for folic acid. Our study demonstrated good NLP performance while also finding that self-reported supplement use is not always consistent with the documented use in clinical records.
Affiliation(s)
- Douglas Redd
  - Center for Data Science and Outcome Research, Washington DC VA Medical Center, Washington, DC 20422, USA
  - Department of Clinical Research and Leadership, George Washington University, Washington, DC 20037, USA
- Terri Elizabeth Workman
  - Department of Clinical Research and Leadership, George Washington University, Washington, DC 20037, USA
  - VA Salt Lake City Health Care System, Salt Lake City, UT 84148, USA
- Yijun Shao
  - Department of Clinical Research and Leadership, George Washington University, Washington, DC 20037, USA
  - VA Salt Lake City Health Care System, Salt Lake City, UT 84148, USA
- Yan Cheng
  - Department of Clinical Research and Leadership, George Washington University, Washington, DC 20037, USA
  - VA Salt Lake City Health Care System, Salt Lake City, UT 84148, USA
- Senait Tekle
  - Department of Clinical Research and Leadership, George Washington University, Washington, DC 20037, USA
- Jennifer H Garvin
  - VA Salt Lake City Health Care System, Salt Lake City, UT 84148, USA
  - Department of Biomedical Informatics, University of Utah School of Medicine, University of Utah, Salt Lake City, UT 84112, USA
- Cynthia A Brandt
  - VA Connecticut Healthcare System, West Haven, CT 06516, USA
  - Department of Emergency Medicine, Yale School of Medicine, Yale University, New Haven, CT 06510, USA
- Qing Zeng-Treitler
  - Center for Data Science and Outcome Research, Washington DC VA Medical Center, Washington, DC 20422, USA
  - Department of Clinical Research and Leadership, George Washington University, Washington, DC 20037, USA
2
Lynch KE, Alba PR, Patterson OV, Viernes B, Coronado G, DuVall SL. The Utility of Clinical Notes for Sexual Minority Health Research. Am J Prev Med 2020; 59:755-763. [PMID: 33011005 DOI: 10.1016/j.amepre.2020.05.026] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 01/20/2020] [Revised: 05/19/2020] [Accepted: 05/26/2020] [Indexed: 02/05/2023]
Abstract
INTRODUCTION: Despite improvements in electronic medical record capability to collect data on sexual orientation, not all healthcare systems have adopted this practice. This can limit the usability of systemwide electronic medical record data for sexual minority research. One viable resource might be the documentation of sexual orientation within clinical notes. The authors developed an approach to identify sexual orientation documentation and subsequently derived a cohort of sexual minority patients using clinical notes from the Veterans Health Administration electronic medical record.
METHODS: A hybrid natural language processing approach was developed and used to identify and categorize instances of terms and phrases related to sexual orientation in Veterans Health Administration clinical notes from 2000 to 2019. System performance was assessed with positive predictive value and sensitivity. Data were analyzed in 2019.
RESULTS: A total of 2,413,584 sexual minority terms/phrases were found within clinical notes, of which 439,039 (18%) were found to be related to patient sexual orientation with a positive predictive value of 85.9%. Documentation of sexual orientation was found for 115,312 patients. When compared with 2,262 patients with a record of administrative coding for homosexuality, the system found mentions of sexual orientation for 1,808 patients (79.9% sensitivity).
CONCLUSIONS: When systemwide structured data are unavailable or inconsistent, deriving a cohort of sexual minority patients in electronic medical records for research is possible and permits longitudinal analysis across multiple clinical domains. Although limitations and challenges to the approach were identified, this study makes an important step forward for the Veterans Health Administration sexual minority research, and the methodology can be applied in other healthcare organizations.
Affiliation(s)
- Kristine E Lynch
  - VA Informatics and Computing Infrastructure, VA Salt Lake City Health Care System, Salt Lake City, Utah; Division of Epidemiology, Department of Internal Medicine, University of Utah, Salt Lake City, Utah
- Patrick R Alba
  - VA Informatics and Computing Infrastructure, VA Salt Lake City Health Care System, Salt Lake City, Utah; Division of Epidemiology, Department of Internal Medicine, University of Utah, Salt Lake City, Utah
- Olga V Patterson
  - VA Informatics and Computing Infrastructure, VA Salt Lake City Health Care System, Salt Lake City, Utah; Division of Epidemiology, Department of Internal Medicine, University of Utah, Salt Lake City, Utah
- Benjamin Viernes
  - VA Informatics and Computing Infrastructure, VA Salt Lake City Health Care System, Salt Lake City, Utah; Division of Epidemiology, Department of Internal Medicine, University of Utah, Salt Lake City, Utah
- Gregorio Coronado
  - VA Informatics and Computing Infrastructure, VA Salt Lake City Health Care System, Salt Lake City, Utah; Division of Epidemiology, Department of Internal Medicine, University of Utah, Salt Lake City, Utah
- Scott L DuVall
  - VA Informatics and Computing Infrastructure, VA Salt Lake City Health Care System, Salt Lake City, Utah; Division of Epidemiology, Department of Internal Medicine, University of Utah, Salt Lake City, Utah
3
Venkataraman GR, Pineda AL, Bear Don’t Walk IV OJ, Zehnder AM, Ayyar S, Page RL, Bustamante CD, Rivas MA. FasTag: Automatic text classification of unstructured medical narratives. PLoS One 2020; 15:e0234647. [PMID: 32569327 PMCID: PMC7307763 DOI: 10.1371/journal.pone.0234647] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 03/09/2020] [Accepted: 05/30/2020] [Indexed: 02/07/2023] [Open access]
Abstract
Unstructured clinical narratives are continuously being recorded as part of delivery of care in electronic health records, and dedicated tagging staff spend considerable effort manually assigning clinical codes for billing purposes. Despite these efforts, however, label availability and accuracy are both suboptimal. In this retrospective study, we aimed to automate the assignment of top-level International Classification of Diseases version 9 (ICD-9) codes to clinical records from human and veterinary data stores using minimal manual labor and feature curation. Automating top-level annotations could in turn enable rapid cohort identification, especially in a veterinary setting. To this end, we trained long short-term memory (LSTM) recurrent neural networks (RNNs) on 52,722 human and 89,591 veterinary records. We investigated the accuracy of both separate-domain and combined-domain models and probed model portability. We established relevant baseline classification performances by training Decision Trees (DT) and Random Forests (RF). We also investigated whether transforming the data using MetaMap Lite, a clinical natural language processing tool, affected classification performance. We showed that the LSTM-RNNs accurately classify veterinary and human text narratives into top-level categories with an average weighted macro F1 score of 0.74 and 0.68 respectively. In the "neoplasia" category, the model trained on veterinary data had a high validation accuracy in veterinary data and moderate accuracy in human data, with F1 scores of 0.91 and 0.70 respectively. Our LSTM method scored slightly higher than that of the DT and RF models. The use of LSTM-RNN models represents a scalable structure that could prove useful in cohort identification for comparative oncology studies. Digitization of human and veterinary health information will continue to be a reality, particularly in the form of unstructured narratives. Our approach is a step forward for these two domains to learn from and inform one another.
Affiliation(s)
- Guhan Ram Venkataraman
  - Department of Biomedical Data Science, School of Medicine, Stanford University, Stanford, CA, United States of America
- Arturo Lopez Pineda
  - Department of Biomedical Data Science, School of Medicine, Stanford University, Stanford, CA, United States of America
- Oliver J. Bear Don’t Walk IV
  - Department of Biomedical Informatics, Vagelos College of Physicians and Surgeons, Columbia University, New York, NY, United States of America
- Sandeep Ayyar
  - Department of Biomedical Data Science, School of Medicine, Stanford University, Stanford, CA, United States of America
- Rodney L. Page
  - Department of Clinical Sciences, College of Veterinary Medicine and Biomedical Sciences, Colorado State University, Fort Collins, CO, United States of America
- Carlos D. Bustamante
  - Department of Biomedical Data Science, School of Medicine, Stanford University, Stanford, CA, United States of America
  - Chan Zuckerberg Biohub, San Francisco, CA, United States of America
- Manuel A. Rivas
  - Department of Biomedical Data Science, School of Medicine, Stanford University, Stanford, CA, United States of America
4
Walsh JA, Shao Y, Leng J, He T, Teng CC, Redd D, Treitler Zeng Q, Burningham Z, Clegg DO, Sauer BC. Identifying Axial Spondyloarthritis in Electronic Medical Records of US Veterans. Arthritis Care Res (Hoboken) 2017; 69:1414-1420. [PMID: 27813310 DOI: 10.1002/acr.23140] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 05/18/2016] [Revised: 10/19/2016] [Accepted: 11/01/2016] [Indexed: 11/09/2022]
Abstract
OBJECTIVE: Large database research in axial spondyloarthritis (SpA) is limited by a lack of methods for identifying most types of axial SpA. Our objective was to develop methods for identifying axial SpA concepts in the free text of documents from electronic medical records.
METHODS: Veterans with documents in the national Veterans Health Administration Corporate Data Warehouse between January 1, 2005 and June 30, 2015 were included. Methods were developed for exploring, selecting, and extracting meaningful terms that were likely to represent axial SpA concepts. With annotation, clinical experts reviewed sections of text containing the meaningful terms (snippets) and classified the snippets according to whether or not they represented the intended axial SpA concept. With natural language processing (NLP) tools, computers were trained to replicate the clinical experts' snippet classifications.
RESULTS: Three axial SpA concepts were selected by clinical experts, including sacroiliitis, terms including the prefix spond*, and HLA-B27 positivity (HLA-B27+). With supervised machine learning on annotated snippets, NLP models were developed with accuracies of 91.1% for sacroiliitis, 93.5% for spond*, and 97.2% for HLA-B27+. With independent validation, the accuracies were 92.0% for sacroiliitis, 91.0% for spond*, and 99.0% for HLA-B27+.
CONCLUSION: We developed feasible and accurate methods for identifying axial SpA concepts in the free text of clinical notes. Additional research is required to determine combinations of concepts that will accurately identify axial SpA phenotypes. These novel methods will facilitate previously impractical observational research in axial SpA and may be applied to research with other diseases.
Affiliation(s)
- Jessica A Walsh
  - George E. Wahlen Veterans Affairs Medical Center and University of Utah, Salt Lake City
- Yijun Shao
  - George E. Wahlen Veterans Affairs Medical Center, Salt Lake City, Utah, and George Washington University, Washington, DC
- Jianwei Leng
  - George E. Wahlen Veterans Affairs Medical Center and University of Utah, Salt Lake City
- Tao He
  - George E. Wahlen Veterans Affairs Medical Center and University of Utah, Salt Lake City
- Chia-Chen Teng
  - George E. Wahlen Veterans Affairs Medical Center and University of Utah, Salt Lake City
- Doug Redd
  - George E. Wahlen Veterans Affairs Medical Center, Salt Lake City, Utah, and George Washington University, Washington, DC
- Qing Treitler Zeng
  - George E. Wahlen Veterans Affairs Medical Center, Salt Lake City, Utah, and George Washington University, Washington, DC
- Daniel O Clegg
  - George E. Wahlen Veterans Affairs Medical Center and University of Utah, Salt Lake City
- Brian C Sauer
  - George E. Wahlen Veterans Affairs Medical Center and University of Utah, Salt Lake City