Lee DY, Kim C, Lee S, Son SJ, Cho SM, Cho YH, Lim J, Park RW. Psychosis Relapse Prediction Leveraging Electronic Health Records Data and Natural Language Processing Enrichment Methods. Front Psychiatry 2022; 13:844442. [PMID: 35479497 PMCID: PMC9037331 DOI: 10.3389/fpsyt.2022.844442]
[Received: 12/28/2021] [Accepted: 03/09/2022] [Indexed: 12/30/2022]
Abstract
BACKGROUND
Identifying patients at a high risk of psychosis relapse is crucial for early interventions. A relevant psychiatric clinical context is often recorded in clinical notes; however, the utilization of unstructured data remains limited. This study aimed to develop psychosis-relapse prediction models using various types of clinical notes and structured data.
METHODS
Clinical data were extracted from the electronic health records of the Ajou University Medical Center in South Korea. The study population included patients with psychotic disorders, and the outcome was psychosis relapse within 1 year. We first developed an initial prediction model using only structured data. We then developed three natural language processing (NLP)-enriched models, each adding one type of clinical note (psychological tests, admission notes, or initial nursing assessments), and one complete model using all note types. Latent Dirichlet Allocation was used to cluster the clinical context into similar topics. All models applied the least absolute shrinkage and selection operator (LASSO) logistic regression algorithm. We also performed an external validation using another hospital database.
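The enrichment step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes scikit-learn, uses invented toy notes, labels, and structured features, and the helper name `topic_features` is hypothetical. Only the choice of 10 topics and the L1 (LASSO) penalty follow the abstract.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data for illustration only; not data from the study.
notes = [
    "patient reports auditory hallucinations and poor sleep",
    "stable mood, adherent to medication, family support present",
    "paranoid ideation, stopped medication two weeks ago",
    "good insight, regular outpatient follow-up, sleeping well",
    "disorganized speech, hallucinations, refused medication",
    "calm and cooperative, medication adherent, supportive family",
]
relapse = np.array([1, 0, 1, 0, 1, 0])  # 1 = relapse within 1 year
# Two made-up structured predictors (e.g. age, prior admissions).
structured = np.array(
    [[34, 2], [51, 0], [28, 3], [45, 0], [39, 4], [60, 1]], dtype=float
)


def topic_features(texts, n_topics=10, seed=0):
    """Return one row of LDA topic proportions per note (hypothetical helper)."""
    counts = CountVectorizer().fit_transform(texts)  # bag-of-words counts
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=seed)
    return lda.fit_transform(counts)  # each row sums to 1


# Enrich the structured predictors with the note-derived topic proportions.
topics = topic_features(notes, n_topics=10)
X = np.hstack([StandardScaler().fit_transform(structured), topics])

# L1-penalised (LASSO) logistic regression; C=1.0 is an arbitrary choice here.
model = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
model.fit(X, relapse)
risk = model.predict_proba(X)[:, 1]  # predicted relapse probability per patient
```

In a real analysis the topic model and the classifier would be fitted on training data only and evaluated on held-out patients; the toy example fits and predicts on the same six rows purely to show the data flow.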
RESULTS
A total of 330 patients were included, and 62 (18.8%) experienced psychosis relapse. Six predictors were used in the initial model, and 10 additional topics from Latent Dirichlet Allocation processing were added in the enriched models. The model derived from all notes showed the highest area under the receiver operating characteristic curve (AUROC = 0.946) in the internal validation, followed by the models based on the psychological test notes, admission notes, initial nursing assessments, and structured data only (0.902, 0.855, 0.798, and 0.784, respectively). The external validation was performed using only the initial nursing assessment notes, and the AUROC was 0.616.
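For readers unfamiliar with the metric, the AUROC can be read as the probability that a randomly chosen relapsing patient receives a higher predicted risk than a randomly chosen non-relapsing patient. The sketch below computes it by direct pairwise comparison; the function name `auroc` and the five example scores are illustrative assumptions, and only the 62-of-330 relapse count comes from the abstract.

```python
def auroc(labels, scores):
    """AUROC by pairwise comparison of positive and negative scores.

    A positive scored above a negative counts 1, a tie counts 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Made-up scores: 5 of the 6 positive/negative pairs are ordered correctly.
example = auroc([1, 0, 1, 0, 0], [0.9, 0.2, 0.4, 0.5, 0.1])
print(round(example, 3))  # 0.833

# Relapse rate reported in the abstract: 62 of 330 patients.
relapse_rate = round(100 * 62 / 330, 1)
print(relapse_rate)  # 18.8
```

On this scale 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which is how the internal (0.784 to 0.946) and external (0.616) values above are interpreted.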
CONCLUSIONS
We developed prediction models for psychosis relapse using the NLP-enrichment method. Models using clinical notes were more effective than models using only structured data, suggesting the importance of unstructured data in psychosis prediction.