1
Maghsoudi A, Sada YH, Nowakowski S, Guffey D, Zhu H, Yarlagadda SR, Li A, Razjouyan J. A Multi-Institutional Natural Language Processing Pipeline to Extract Performance Status From Electronic Health Records. Cancer Control 2024; 31:10732748241279518. [PMID: 39222957 PMCID: PMC11369884 DOI: 10.1177/10732748241279518] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2024] [Revised: 08/07/2024] [Accepted: 08/08/2024] [Indexed: 09/04/2024]
Abstract
PURPOSE: Performance status (PS), an essential indicator of patients' functional abilities, is often documented in the clinical notes of patients with cancer. Using natural language processing (NLP) to extract PS from electronic medical records (EMRs) has shown promise for enhancing clinical decision-making, patient monitoring, and research studies. We designed and validated a multi-institutional NLP pipeline to automatically extract performance status from free-text patient notes. PATIENTS AND METHODS: We collected data from 19,481 patients in the Harris Health System (HHS) and 333,862 patients in the Veterans Affairs Corporate Data Warehouse (VA-CDW), and randomly selected 400 patients from each data source to train and validate (50%) and test (50%) the proposed pipeline. We designed the NLP pipeline using an expert-derived rule-based approach combined with extensive post-processing. To demonstrate the pipeline's application, we tested compliance with the PS documentation suggested by the American Society of Clinical Oncology (ASCO) Quality Metric and investigated potential disparities in PS reporting for stage IV non-small cell lung cancer (NSCLC). We used logistic regression, considering patients in terms of race/ethnicity, spoken language, marital status, and gender. RESULTS: On the test set, the pipeline achieved 92% accuracy on the HHS cohort and 98.5% accuracy on VA data. For stage IV NSCLC patients, the proposed pipeline achieved an accuracy of 98.5%. Furthermore, our analysis revealed a PS documentation rate of over 85% among NSCLC patients, surpassing the ASCO Quality Metrics. No disparities were observed in the documentation of PS. CONCLUSION: Our proposed NLP pipeline shows promising results in extracting PS from free-text notes from various health institutions. It may be used in longitudinal cancer data registries.
Affiliation(s)
- Arash Maghsoudi
  - Center for Innovations in Quality, Effectiveness, and Safety, Michael E. DeBakey VA Medical Center, Houston, TX, USA
  - Department of Medicine, Baylor College of Medicine, Houston, TX, USA
- Yvonne H. Sada
  - Center for Innovations in Quality, Effectiveness, and Safety, Michael E. DeBakey VA Medical Center, Houston, TX, USA
  - Department of Medicine, Baylor College of Medicine, Houston, TX, USA
- Sara Nowakowski
  - Center for Innovations in Quality, Effectiveness, and Safety, Michael E. DeBakey VA Medical Center, Houston, TX, USA
  - Department of Medicine, Baylor College of Medicine, Houston, TX, USA
- Danielle Guffey
  - Section of Hematology-Oncology, Baylor College of Medicine, Houston, TX, USA
- Huili Zhu
  - Department of Medicine, Baylor College of Medicine, Houston, TX, USA
- Ang Li
  - Section of Hematology-Oncology, Baylor College of Medicine, Houston, TX, USA
- Javad Razjouyan
  - Center for Innovations in Quality, Effectiveness, and Safety, Michael E. DeBakey VA Medical Center, Houston, TX, USA
  - Department of Medicine, Baylor College of Medicine, Houston, TX, USA
  - Big Data Scientist Training Enhancement Program (BD-STEP), VA Office of Research and Development, Washington, DC, USA
2
Javed A, Rizzo DM, Lee BS, Gramling R. SOMTimeS: self organizing maps for time series clustering and its application to serious illness conversations. Data Min Knowl Discov 2023; 38:813-839. [PMID: 38711534 PMCID: PMC11069464 DOI: 10.1007/s10618-023-00979-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2023] [Accepted: 08/22/2023] [Indexed: 05/08/2024]
Abstract
There is demand for scalable algorithms capable of clustering and analyzing large time series data. The Kohonen self-organizing map (SOM) is an unsupervised artificial neural network for clustering, visualizing, and reducing the dimensionality of complex data. Like all clustering methods, it requires a measure of similarity between input data (in this work, time series). Dynamic time warping (DTW) is one such measure, and a top performer that accommodates distortions when aligning time series. Despite its popularity in clustering, DTW is limited in practice because its runtime complexity is quadratic in the length of the time series. To address this, we present a new self-organizing map for clustering TIME Series, called SOMTimeS, which uses DTW as the distance measure. The method has similar accuracy to other DTW-based clustering algorithms, yet scales better and runs faster. The computational performance stems from pruning unnecessary DTW computations during the SOM's training phase. For comparison, we implement a similar pruning strategy for K-means, and call the latter K-TimeS. SOMTimeS and K-TimeS pruned 43% and 50% of the total DTW computations, respectively. Pruning effectiveness, accuracy, execution time, and scalability are evaluated using 112 benchmark time series datasets from the UC Riverside classification archive. The evaluation shows that, for similar accuracy, SOMTimeS and K-TimeS achieve a 1.8× speed-up on average, with rates varying between 1× and 18× depending on the dataset. We also apply SOMTimeS to a healthcare study of patient-clinician serious illness conversations to demonstrate the algorithm's utility with complex, temporally sequenced natural language. Supplementary Information: The online version contains supplementary material available at 10.1007/s10618-023-00979-9.
Affiliation(s)
- Ali Javed
  - Department of Medicine, Stanford University, 300 Pasteur Dr, Stanford, CA 94305 USA
  - Department of Computer Science, University of Vermont, Burlington, VT USA
- Donna M. Rizzo
  - Department of Civil and Environmental Engineering, University of Vermont, Burlington, VT USA
  - Department of Computer Science, University of Vermont, Burlington, VT USA
- Byung Suk Lee
  - Department of Computer Science, University of Vermont, Burlington, VT USA
- Robert Gramling
  - Department of Family Medicine, University of Vermont, Burlington, VT USA
3
Sarmet M, Kabani A, Coelho L, Dos Reis SS, Zeredo JL, Mehta AK. The use of natural language processing in palliative care research: A scoping review. Palliat Med 2023; 37:275-290. [PMID: 36495082 DOI: 10.1177/02692163221141969] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Indexed: 12/14/2022]
Abstract
BACKGROUND: Natural language processing has been increasingly used in palliative care research over the last 5 years for its versatility and accuracy. AIM: To evaluate and characterize natural language processing use in palliative care research, including the most commonly used natural language processing software and computational methods, data sources, trends in natural language processing use over time, and palliative care topics addressed. DESIGN: A scoping review was conducted using the framework by Arksey and O'Malley and the updated recommendations proposed by Levac et al. SOURCES: PubMed, Web of Science, Embase, Scopus, and IEEE Xplore databases were searched for palliative care studies that utilized natural language processing tools. Data on study characteristics and the natural language processing instruments used were collected, and relevant palliative care topics were identified. RESULTS: A total of 197 relevant references were identified. Of these, 82 were included after full-text review. Studies were published in 48 different journals from 2007 to 2022. The average sample size was 21,541 (median 435). Thirty-two different natural language processing software tools and 33 machine-learning methods were identified. Nine main sources for data processing and 15 main palliative care topics were identified across the included studies. The most frequent topic was mortality and prognosis prediction. We also identified a trend in which natural language processing was frequently used to analyze clinical serious illness conversations extracted from audio recordings. CONCLUSIONS: We found 82 papers on palliative care using natural language processing methods for a wide range of topics and sources of data that could expand the use of this methodology. We encourage researchers to consider incorporating this cutting-edge research methodology in future studies to improve published palliative care data.
Affiliation(s)
- Max Sarmet
  - Tertiary Referral Center of Neuromuscular Diseases, Hospital de Apoio de Brasília, Brazil
  - Graduate Department of Health Science and Technology, University of Brasília, Brazil
- Aamna Kabani
  - Johns Hopkins University, School of Medicine, USA
- Luis Coelho
  - Center of Innovation in Engineering and Industrial Technology, Polytechnic of Porto - School of Engineering (ISEP), Portugal
- Sara Seabra Dos Reis
  - Center of Innovation in Engineering and Industrial Technology, Polytechnic of Porto - School of Engineering (ISEP), Portugal
- Jorge L Zeredo
  - Graduate Department of Health Science and Technology, University of Brasília, Brazil
- Ambereen K Mehta
  - Palliative Care Program, Division of General Internal Medicine, Johns Hopkins Bayview Medical Center, Johns Hopkins University, School of Medicine, USA
4
Tarbi EC, Blanch-Hartigan D, van Vliet LM, Gramling R, Tulsky JA, Sanders JJ. Toward a basic science of communication in serious illness. Patient Education and Counseling 2022; 105:1963-1969. [PMID: 35410737 DOI: 10.1016/j.pec.2022.03.019] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 12/11/2021] [Revised: 03/09/2022] [Accepted: 03/24/2022] [Indexed: 06/14/2023]
Abstract
High-quality communication can mitigate suffering during serious illness. Innovations in theory and technology present the opportunity to advance serious illness communication research, moving beyond inquiry that links broad communication constructs to health outcomes toward operationalizing and understanding the impact of discrete communication functions on human experience. Given the high stakes of communication during serious illness, we see a critical need to develop a basic science approach to serious illness communication research. Such an approach seeks to link "what actually happens during a conversation" - the lexical and non-lexical communication content elements, as well as contextual factors - with the emotional and cognitive experiences of patients, caregivers, and clinicians. This paper defines and justifies a basic science approach to serious illness communication research and outlines investigative and methodological opportunities in this area. A systematic understanding of the building blocks of serious illness communication can help identify evidence-informed communication strategies that promote positive patient outcomes, shape more targeted communication skills training for clinicians, and lead to more tailored and meaningful serious illness care.
Affiliation(s)
- Elise C Tarbi
  - Dana-Farber Cancer Institute, Department of Psychosocial Oncology and Palliative Care, Boston, USA
- Robert Gramling
  - University of Vermont, Department of Family Medicine, Burlington, USA
- James A Tulsky
  - Dana-Farber Cancer Institute, Department of Psychosocial Oncology and Palliative Care, Boston, USA; Brigham and Women's Hospital, Division of Palliative Medicine, Department of Medicine, Boston, USA
- Justin J Sanders
  - McGill University, Division of Palliative Care, Department of Family Medicine, Montreal, Canada
5
Han PKJ. Medical uncertainty: putting flesh on the bones. Patient Education and Counseling 2021; 104:2603-2605. [PMID: 34666906 PMCID: PMC8520411 DOI: 10.1016/j.pec.2021.09.001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 05/04/2023]
Affiliation(s)
- Paul K J Han
  - Behavioral Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute, Bethesda, MD, USA