Reference Citation Analysis

For: Fonferko-Shadrach B, Lacey AS, Roberts A, Akbari A, Thompson S, Ford DV, Lyons RA, Rees MI, Pickrell WO. Using natural language processing to extract structured epilepsy data from unstructured clinic letters: development and validation of the ExECT (extraction of epilepsy clinical text) system. BMJ Open 2019;9:e023232. [PMID: 30940752 PMCID: PMC6500195 DOI: 10.1136/bmjopen-2018-023232] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 04/03/2018] [Revised: 01/23/2019] [Accepted: 02/13/2019] [Indexed: 11/24/2022] Open

Cited by Other Article(s)

Fonferko-Shadrach B, Strafford H, Jones C, Khan RA, Brown S, Edwards J, Hawken J, Shrimpton LE, White CP, Powell R, Sawhney IMS, Pickrell WO, Lacey AS. Annotation of epilepsy clinic letters for natural language processing. J Biomed Semantics 2024;15:17. [PMID: 39277770 PMCID: PMC11402197 DOI: 10.1186/s13326-024-00316-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2024] [Accepted: 07/22/2024] [Indexed: 09/17/2024] Open

Abstract

BACKGROUND

Natural language processing (NLP) is increasingly being used to extract structured information from unstructured text to assist clinical decision-making and aid healthcare research. The availability of expert-annotated documents for the development and validation of NLP applications is limited. We created synthetic clinical documents to address this, and to validate the Extraction of Epilepsy Clinical Text version 2 (ExECTv2) NLP pipeline.

METHODS

We created 200 synthetic clinic letters based on hospital outpatient consultations with epilepsy specialists. The letters were double annotated by trained clinicians and researchers according to agreed guidelines. We used the annotation tool, Markup, with an epilepsy concept list based on the Unified Medical Language System ontology. All annotations were reviewed, and a gold standard set of annotations was agreed and used to validate the performance of ExECTv2.

RESULTS

The overall inter-annotator agreement (IAA) between the two sets of annotations produced a per item F1 score of 0.73. Validating ExECTv2 using the gold standard gave an overall F1 score of 0.87 per item, and 0.90 per letter.

CONCLUSION

The synthetic letters, annotations, and annotation guidelines have been made freely available. To our knowledge, this is the first publicly available set of annotated epilepsy clinic letters and guidelines that can be used by NLP researchers with minimal epilepsy knowledge. The IAA results show that clinical text annotation tasks are difficult and require a gold standard to be arranged by researcher consensus. ExECTv2, our automated epilepsy NLP pipeline, extracted detailed epilepsy information from unstructured epilepsy letters with more accuracy than human annotators, further confirming the utility of NLP for clinical and research applications.
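
As an illustration of the per-item agreement figure reported in this abstract, the following minimal sketch computes an F1 score between two annotators' sets of annotations. It is not the authors' evaluation code: the (letter, start, end, label) tuple schema, the exact-match criterion, and the example values are all assumptions made for this sketch.

```python
from typing import Set, Tuple

# Hypothetical annotation schema: (letter_id, start_offset, end_offset, label).
Annotation = Tuple[str, int, int, str]


def f1_agreement(a: Set[Annotation], b: Set[Annotation]) -> float:
    """F1 with annotator A as reference and annotator B as response.

    Two annotations agree only if they match exactly (an assumption;
    real evaluations often allow overlapping spans).
    """
    if not a and not b:
        return 1.0  # nothing to annotate, trivially in agreement
    matched = len(a & b)
    precision = matched / len(b) if b else 0.0
    recall = matched / len(a) if a else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up example data, for illustration only.
annotator_a = {
    ("letter1", 0, 8, "Diagnosis"),
    ("letter1", 20, 31, "Medication"),
    ("letter2", 5, 12, "SeizureFrequency"),
}
annotator_b = {
    ("letter1", 0, 8, "Diagnosis"),
    ("letter1", 20, 31, "Medication"),
    ("letter2", 40, 47, "Investigation"),
}

score = f1_agreement(annotator_a, annotator_b)
print(f"Per-item F1 agreement: {score:.2f}")
```

Because F1 is symmetric in precision and recall, swapping which annotator is treated as the reference gives the same score under exact matching.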

Fernandes M, Cardall A, Moura LM, McGraw C, Zafar SF, Westover MB. Extracting seizure control metrics from clinic notes of patients with epilepsy: A natural language processing approach. Epilepsy Res 2024;207:107451. [PMID: 39276641 DOI: 10.1016/j.eplepsyres.2024.107451] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2024] [Revised: 07/17/2024] [Accepted: 09/09/2024] [Indexed: 09/17/2024]

Abstract

OBJECTIVES

Monitoring seizure control metrics is key to clinical care of patients with epilepsy. Manually abstracting these metrics from unstructured text in electronic health records (EHR) is laborious. We aimed to abstract the date of last seizure and seizure frequency from clinical notes of patients with epilepsy using natural language processing (NLP).

METHODS

We extracted seizure control metrics from notes of patients seen in epilepsy clinics from two hospitals in Boston. Extraction was performed with the pretrained model RoBERTa_for_seizureFrequency_QA, for both date of last seizure and seizure frequency, combined with regular expressions. We designed the algorithm to categorize the timing of last seizure ("today", "1-6 days ago", "1-4 weeks ago", "more than 1-3 months ago", "more than 3-6 months ago", "more than 6-12 months ago", "more than 1-2 years ago", "more than 2 years ago") and seizure frequency ("innumerable", "multiple", "daily", "weekly", "monthly", "once per year", "less than once per year"). Our ground truth consisted of structured questionnaires filled out by physicians. Model performance was measured using the areas under the receiver operating characteristic curve (AUROC) and precision-recall curve (AUPRC) for categorical labels, and median absolute error (MAE) for ordinal labels, with 95% confidence intervals (CI) estimated via bootstrapping.

RESULTS

Our cohort included 1773 adult patients with a total of 5658 visits with reported seizure control metrics, seen in epilepsy clinics between December 2018 and May 2022. The cohort's average age was 42 years; the majority were female (57%), White (81%), and non-Hispanic (85%). The models achieved an MAE (95% CI) for date of last seizure of 4 (4.00-4.86) weeks, and for seizure frequency of 0.02 (0.02-0.02) seizures per day.

CONCLUSIONS

Our NLP approach demonstrates that the extraction of seizure control metrics from EHR is feasible, allowing for large-scale EHR research.
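
The evaluation described in this abstract (median absolute error with a bootstrapped 95% CI) can be sketched roughly as follows. This is an illustrative percentile bootstrap using only the Python standard library, not the authors' code; the function names, the number of resamples, and the example values (weeks since last seizure) are assumptions.

```python
import random
import statistics
from typing import Sequence, Tuple


def median_absolute_error(truth: Sequence[float], pred: Sequence[float]) -> float:
    """Median of the absolute differences between paired values."""
    return statistics.median([abs(t - p) for t, p in zip(truth, pred)])


def bootstrap_ci(
    truth: Sequence[float],
    pred: Sequence[float],
    n_resamples: int = 2000,  # assumed; the abstract does not state a number
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap CI for the median absolute error."""
    rng = random.Random(seed)
    n = len(truth)
    stats = []
    for _ in range(n_resamples):
        # Resample visit indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(
            median_absolute_error([truth[i] for i in idx], [pred[i] for i in idx])
        )
    stats.sort()
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Made-up example: weeks since last seizure (questionnaire vs. extracted).
truth_weeks = [0, 2, 4, 8, 12, 26, 52, 104, 3, 6]
predicted_weeks = [0, 3, 4, 12, 10, 20, 52, 90, 3, 8]

point_estimate = median_absolute_error(truth_weeks, predicted_weeks)
ci_low, ci_high = bootstrap_ci(truth_weeks, predicted_weeks)
print(f"MAE = {point_estimate} weeks (95% CI {ci_low}-{ci_high})")
```

Resampling at the visit level, as done here, ignores the fact that one patient may contribute several visits; resampling patients instead would be a reasonable alternative.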

Galer PD, Parthasarathy S, Xian J, McKee JL, Ruggiero SM, Ganesan S, Kaufman MC, Cohen SR, Haag S, Chen C, Ojemann WKS, Kim D, Wilmarth O, Vaidiswaran P, Sederman C, Ellis CA, Gonzalez AK, Boßelmann CM, Lal D, Sederman R, Lewis-Smith D, Litt B, Helbig I. Clinical signatures of genetic epilepsies precede diagnosis in electronic medical records of 32,000 individuals. Genet Med 2024;26:101211. [PMID: 39011766 DOI: 10.1016/j.gim.2024.101211] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2023] [Revised: 07/10/2024] [Accepted: 07/10/2024] [Indexed: 07/17/2024] Open

Affiliation(s)

Peter D Galer Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA; Department of Biomedical and Health Informatics (DBHi), Children's Hospital of Philadelphia, Philadelphia, PA; The Epilepsy NeuroGenetics Initiative (ENGIN), Children's Hospital of Philadelphia, Philadelphia, PA; University of Pennsylvania, Center for Neuroengineering and Therapeutics, Philadelphia, PA
Shridhar Parthasarathy Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA; Department of Biomedical and Health Informatics (DBHi), Children's Hospital of Philadelphia, Philadelphia, PA; The Epilepsy NeuroGenetics Initiative (ENGIN), Children's Hospital of Philadelphia, Philadelphia, PA
Julie Xian Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA; Department of Biomedical and Health Informatics (DBHi), Children's Hospital of Philadelphia, Philadelphia, PA; The Epilepsy NeuroGenetics Initiative (ENGIN), Children's Hospital of Philadelphia, Philadelphia, PA
Jillian L McKee Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA; The Epilepsy NeuroGenetics Initiative (ENGIN), Children's Hospital of Philadelphia, Philadelphia, PA; Department of Neurology, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA
Sarah M Ruggiero Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA; The Epilepsy NeuroGenetics Initiative (ENGIN), Children's Hospital of Philadelphia, Philadelphia, PA
Shiva Ganesan Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA; Department of Biomedical and Health Informatics (DBHi), Children's Hospital of Philadelphia, Philadelphia, PA; The Epilepsy NeuroGenetics Initiative (ENGIN), Children's Hospital of Philadelphia, Philadelphia, PA
Michael C Kaufman Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA; Department of Biomedical and Health Informatics (DBHi), Children's Hospital of Philadelphia, Philadelphia, PA; The Epilepsy NeuroGenetics Initiative (ENGIN), Children's Hospital of Philadelphia, Philadelphia, PA
Stacey R Cohen Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA; The Epilepsy NeuroGenetics Initiative (ENGIN), Children's Hospital of Philadelphia, Philadelphia, PA
Scott Haag Department of Biomedical and Health Informatics (DBHi), Children's Hospital of Philadelphia, Philadelphia, PA
Chen Chen Ambit Inc, Morristown, NJ
William K S Ojemann University of Pennsylvania, Center for Neuroengineering and Therapeutics, Philadelphia, PA
Dan Kim Ambit Inc, Morristown, NJ
Olivia Wilmarth Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA; The Epilepsy NeuroGenetics Initiative (ENGIN), Children's Hospital of Philadelphia, Philadelphia, PA
Priya Vaidiswaran Department of Biomedical and Health Informatics (DBHi), Children's Hospital of Philadelphia, Philadelphia, PA
Casey Sederman Department of Human Genetics, University of Utah, Salt Lake City, UT; Utah Center for Genetic Discovery, School of Medicine, University of Utah, Salt Lake City, UT
Colin A Ellis The Epilepsy NeuroGenetics Initiative (ENGIN), Children's Hospital of Philadelphia, Philadelphia, PA; Department of Neurology, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA
Alexander K Gonzalez Department of Biomedical and Health Informatics (DBHi), Children's Hospital of Philadelphia, Philadelphia, PA
Christian M Boßelmann Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH; Epilepsy Center, Neurological Institute, Cleveland Clinic, Cleveland, OH
Dennis Lal Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH; Epilepsy Center, Neurological Institute, Cleveland Clinic, Cleveland, OH; Cologne Center for Genomics (CCG), University of Cologne, Cologne, Germany
Rob Sederman Ambit Inc, Morristown, NJ
David Lewis-Smith Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA; Department of Biomedical and Health Informatics (DBHi), Children's Hospital of Philadelphia, Philadelphia, PA; Translational and Clinical Research Institute, Newcastle University, Newcastle-upon-Tyne, UK; Newcastle Upon Tyne Hospitals NHS Foundation Trust, Newcastle-upon-Tyne, UK; FutureNeuro SFI Research Centre, RCSI University of Medicine and Health Sciences, Dublin 2, Ireland
Brian Litt University of Pennsylvania, Center for Neuroengineering and Therapeutics, Philadelphia, PA; Department of Neurology, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA
Ingo Helbig Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA; Department of Biomedical and Health Informatics (DBHi), Children's Hospital of Philadelphia, Philadelphia, PA; The Epilepsy NeuroGenetics Initiative (ENGIN), Children's Hospital of Philadelphia, Philadelphia, PA; Department of Neurology, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA.

van Diessen E, van Amerongen RA, Zijlmans M, Otte WM. Potential merits and flaws of large language models in epilepsy care: A critical review. Epilepsia 2024;65:873-886. [PMID: 38305763 DOI: 10.1111/epi.17907] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2023] [Revised: 12/30/2023] [Accepted: 01/19/2024] [Indexed: 02/03/2024]

Tsai AY, Carter SR, Greene AC. Artificial intelligence in pediatric surgery. Semin Pediatr Surg 2024;33:151390. [PMID: 38242061 DOI: 10.1016/j.sempedsurg.2024.151390] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/21/2024]

Mora S, Turrisi R, Chiarella L, Consales A, Tassi L, Mai R, Nobili L, Barla A, Arnulfo G. NLP-based tools for localization of the epileptogenic zone in patients with drug-resistant focal epilepsy. Sci Rep 2024;14:2349. [PMID: 38287042 PMCID: PMC10825198 DOI: 10.1038/s41598-024-51846-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2023] [Accepted: 01/10/2024] [Indexed: 01/31/2024] Open

Msosa YJ, Grauslys A, Zhou Y, Wang T, Buchan I, Langan P, Foster S, Walker M, Pearson M, Folarin A, Roberts A, Maskell S, Dobson R, Kullu C, Kehoe D. Trustworthy Data and AI Environments for Clinical Prediction: Application to Crisis-Risk in People With Depression. IEEE J Biomed Health Inform 2023;27:5588-5598. [PMID: 37669205 DOI: 10.1109/jbhi.2023.3312011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/07/2023]

Abstract

Depression is a common mental health condition that often occurs in association with other chronic illnesses, and varies considerably in severity. Electronic Health Records (EHRs) contain rich information about a patient's medical history and can be used to train, test and maintain predictive models to support and improve patient care. This work evaluated the feasibility of implementing an environment for predicting mental health crisis among people living with depression based on both structured and unstructured EHRs. A large EHR from a mental health provider, Mersey Care, was pseudonymised and ingested into the Natural Language Processing (NLP) platform CogStack, allowing text content in binary clinical notes to be extracted. All unstructured clinical notes and summaries were semantically annotated by MedCAT and BioYODIE NLP services. Cases of crisis in patients with depression were then identified. Random forest models, gradient boosting trees, and Long Short-Term Memory (LSTM) networks, with varying feature arrangement, were trained to predict the occurrence of crisis. The results showed that all the prediction models can use a combination of structured and unstructured EHR information to predict crisis in patients with depression with good and useful accuracy. The LSTM network that was trained on a modified dataset with only 1000 most-important features from the random forest model with temporality showed the best performance with a mean AUC of 0.901 and a standard deviation of 0.006 using a training dataset and a mean AUC of 0.810 and 0.01 using a hold-out test dataset. Comparing the results from the technical evaluation with the views of psychiatrists shows that there are now opportunities to refine and integrate such prediction models into pragmatic point-of-care clinical decision support tools for supporting mental healthcare delivery.

Vulpius SA, Werge S, Jørgensen IF, Siggaard T, Hernansanz Biel J, Knudsen GM, Brunak S, Pinborg LH. Text mining of electronic health records can validate a register-based diagnosis of epilepsy and subgroup into focal and generalized epilepsy. Epilepsia 2023;64:2750-2760. [PMID: 37548470 DOI: 10.1111/epi.17734] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/24/2023] [Revised: 08/01/2023] [Accepted: 08/01/2023] [Indexed: 08/08/2023]

Abstract

OBJECTIVE

Combining population-based health registries and electronic health records offers the opportunity to create large, phenotypically detailed patient cohorts of high quality. In this study, we used text mining of clinical notes to confirm International Classification of Diseases, 10th Revision (ICD-10)-registered epilepsy diagnoses and classify patients according to focal and generalized epilepsy types.

METHODS

Using the Danish National Patient Registry, we identified patients who between 2006 and 2016 received an ICD-10 diagnosis of epilepsy. To validate the epilepsy diagnosis and stratify patients into focal and generalized epilepsy types, we constructed dictionaries for text mining-based extraction of clinical notes. Two physicians manually reviewed the clinical notes for a total of 527 patients and assigned epilepsy diagnoses, which were compared with the text-mined diagnoses.

RESULTS

We identified 23 632 patients with an ICD-10 diagnosis of epilepsy, of whom 50% were registered with an unspecified epilepsy diagnosis. In total, 11 211 patients were considered likely to have epilepsy by text mining, with an F1 measure ranging from 82% to 90%. Manual review of the electronic health records for 310 patients revealed a false discovery rate of 29%. This rate was decreased to 4% by the text mining algorithm. The weighted average F1 measure for text mining-assigned epilepsy types was 79% (82% for focal and 76% for generalized epilepsy). Text mining successfully assigned a focal or generalized epilepsy type to 92% of the text mining-eligible patients registered with unspecified epilepsy.

SIGNIFICANCE

Text mining of electronic health records can be used to establish a patient cohort with much higher likelihood of having a diagnosis of epilepsy and a focal or generalized epilepsy type compared to the cohort created from ICD-10 epilepsy codes alone. We believe the concept will be essential for future genome-wide and phenome-wide association studies and subsequently the development of precision medicine for epilepsy patients.

Bosch D, Kuppen MCP, Tascilar M, Smilde TJ, Mulders PFA, Uyl-de Groot CA, van Oort IM. Reliability and Efficiency of the CAPRI-3 Metastatic Prostate Cancer Registry Driven by Artificial Intelligence. Cancers (Basel) 2023;15:3808. [PMID: 37568624 PMCID: PMC10417512 DOI: 10.3390/cancers15153808] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2023] [Revised: 07/19/2023] [Accepted: 07/23/2023] [Indexed: 08/13/2023] Open

Xie K, Gallagher RS, Shinohara RT, Xie SX, Hill CE, Conrad EC, Davis KA, Roth D, Litt B, Ellis CA. Long-term epilepsy outcome dynamics revealed by natural language processing of clinic notes. Epilepsia 2023;64:1900-1909. [PMID: 37114472 PMCID: PMC10523917 DOI: 10.1111/epi.17633] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 12/16/2022] [Revised: 04/26/2023] [Accepted: 04/26/2023] [Indexed: 04/29/2023]

Abstract

OBJECTIVE

Electronic medical records allow for retrospective clinical research with large patient cohorts. However, epilepsy outcomes are often contained in free text notes that are difficult to mine. We recently developed and validated novel natural language processing (NLP) algorithms to automatically extract key epilepsy outcome measures from clinic notes. In this study, we assessed the feasibility of extracting these measures to study the natural history of epilepsy at our center.

METHODS

We applied our previously validated NLP algorithms to extract seizure freedom, seizure frequency, and date of most recent seizure from outpatient visits at our epilepsy center from 2010 to 2022. We examined the dynamics of seizure outcomes over time using Markov model-based probability and Kaplan-Meier analyses.

RESULTS

Performance of our algorithms on classifying seizure freedom was comparable to that of human reviewers (algorithm F1 = .88 vs. human annotator κ = .86). We extracted seizure outcome data from 55 630 clinic notes from 9510 unique patients written by 53 unique authors. Of these, 30% were classified as seizure-free since the last visit, 48% of non-seizure-free visits contained a quantifiable seizure frequency, and 47% of all visits contained the date of most recent seizure occurrence. Among patients with at least five visits, the probabilities of seizure freedom at the next visit ranged from 12% to 80% in patients having seizures or seizure-free at the prior three visits, respectively. Only 25% of patients who were seizure-free for 6 months remained seizure-free after 10 years.

SIGNIFICANCE

Our findings demonstrate that epilepsy outcome measures can be extracted accurately from unstructured clinical note text using NLP. At our tertiary center, the disease course often followed a remitting and relapsing pattern. This method represents a powerful new tool for clinical research with many potential uses and extensions to other clinical questions.
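
To illustrate the kind of Markov-style analysis mentioned in the methods of this abstract, the sketch below estimates visit-to-visit transition probabilities from per-patient sequences of seizure status. It is a simplified first-order model (conditioning on one prior visit, whereas the abstract describes conditioning on the prior three); the state names and example sequences are assumptions for illustration, not the authors' data or code.

```python
from collections import Counter
from typing import Dict, List, Tuple


def transition_probabilities(
    sequences: List[List[str]],
) -> Dict[Tuple[str, str], float]:
    """Estimate P(next state | previous state) from visit sequences."""
    pair_counts: Counter = Counter()
    prev_counts: Counter = Counter()
    for seq in sequences:
        # Walk consecutive visit pairs within one patient's history.
        for prev, nxt in zip(seq, seq[1:]):
            pair_counts[(prev, nxt)] += 1
            prev_counts[prev] += 1
    return {
        pair: count / prev_counts[pair[0]] for pair, count in pair_counts.items()
    }


# Made-up visit histories: one list per patient, one entry per visit.
visits = [
    ["free", "free", "seizures", "free"],
    ["seizures", "seizures", "free"],
]

probs = transition_probabilities(visits)
for (prev, nxt), p in sorted(probs.items()):
    print(f"P({nxt} | {prev}) = {p:.2f}")
```

Extending the state to a tuple of the last three visits would bring the sketch closer to the conditioning described in the results, at the cost of needing longer histories per patient.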

Affiliation(s)

Kevin Xie Department of Bioengineering, University of Pennsylvania, Philadelphia, PA, 19104, USA; Center for Neuroengineering and Therapeutics, University of Pennsylvania, Philadelphia, PA, 19104, USA
Ryan S. Gallagher Center for Neuroengineering and Therapeutics, University of Pennsylvania, Philadelphia, PA, 19104, USA; Department of Neurology, University of Pennsylvania, Philadelphia, PA, 19104, USA
Russell T. Shinohara Penn Statistics in Imaging and Visualization Center, Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, PA, 19104, USA; Center for Biomedical Image Computing and Analytics, University of Pennsylvania, Philadelphia, PA, 19104, USA
Sharon X. Xie Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, PA, 19104, USA
Chloe E. Hill Department of Neurology, University of Michigan, Ann Arbor, MI, 48109, USA
Erin C. Conrad Center for Neuroengineering and Therapeutics, University of Pennsylvania, Philadelphia, PA, 19104, USA; Department of Neurology, University of Pennsylvania, Philadelphia, PA, 19104, USA
Kathryn A. Davis Center for Neuroengineering and Therapeutics, University of Pennsylvania, Philadelphia, PA, 19104, USA; Department of Neurology, University of Pennsylvania, Philadelphia, PA, 19104, USA
Dan Roth Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA, 19104, USA
Brian Litt Department of Bioengineering, University of Pennsylvania, Philadelphia, PA, 19104, USA; Center for Neuroengineering and Therapeutics, University of Pennsylvania, Philadelphia, PA, 19104, USA; Department of Neurology, University of Pennsylvania, Philadelphia, PA, 19104, USA
Colin A. Ellis Center for Neuroengineering and Therapeutics, University of Pennsylvania, Philadelphia, PA, 19104, USA; Department of Neurology, University of Pennsylvania, Philadelphia, PA, 19104, USA

Fernandes M, Cardall A, Jing J, Ge W, Moura LMVR, Jacobs C, McGraw C, Zafar SF, Westover MB. Identification of patients with epilepsy using automated electronic health records phenotyping. Epilepsia 2023;64:1472-1481. [PMID: 36934317 PMCID: PMC10239346 DOI: 10.1111/epi.17589] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 12/19/2022] [Revised: 03/15/2023] [Accepted: 03/16/2023] [Indexed: 03/20/2023]

Abstract

OBJECTIVE

Unstructured data present in electronic health records (EHR) are a rich source of medical information; however, their abstraction is labor intensive. Automated EHR phenotyping (AEP) can reduce the need for manual chart review. We present an AEP model that is designed to automatically identify patients diagnosed with epilepsy.

METHODS

The ground truth for model training and evaluation was captured from a combination of structured questionnaires filled out by physicians for a subset of patients and manual chart review using customized software. Modeling features included indicators of the presence of keywords and phrases in unstructured clinical notes, prescriptions for antiseizure medications (ASMs), International Classification of Diseases (ICD) codes for seizures and epilepsy, number of ASMs and epilepsy-related ICD codes, age, and sex. Data were randomly divided into training (70%) and hold-out testing (30%) sets, with distinct patients in each set. We trained regularized logistic regression and extreme gradient boosting models. Model performance was measured using area under the receiver operating characteristic curve (AUROC) and area under the precision-recall curve (AUPRC), with 95% confidence intervals (CI) estimated via bootstrapping.

RESULTS

Our study cohort included 3903 adults drawn from outpatient departments of nine hospitals between February 2015 and June 2022 (mean age = 47 ± 18 years, 57% women, 82% White, 84% non-Hispanic, 70% with epilepsy). The final models included 285 features, including 246 keywords and phrases captured from 8415 encounters. Both models achieved AUROC and AUPRC of 1 (95% CI = .99-1.00) in the hold-out testing set.

SIGNIFICANCE

A machine learning-based AEP approach accurately identifies patients with epilepsy from notes, ICD codes, and ASMs. This model can enable large-scale epilepsy research using EHR databases.
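
The 70%/30% split "with distinct patients in each set" described in the methods of this abstract can be sketched as a split over patient identifiers rather than over individual encounters. The record layout, the `patient_id` field name, and the helper below are hypothetical and serve only to illustrate the idea; they are not taken from the study.

```python
import random
from typing import Dict, List, Tuple

Record = Dict[str, object]


def split_by_patient(
    records: List[Record], test_fraction: float = 0.3, seed: int = 0
) -> Tuple[List[Record], List[Record]]:
    """Split encounter records so that no patient appears in both sets."""
    # Shuffle unique patient IDs, not records, to avoid leakage.
    patient_ids = sorted({r["patient_id"] for r in records})
    rng = random.Random(seed)
    rng.shuffle(patient_ids)
    n_test = round(len(patient_ids) * test_fraction)
    test_ids = set(patient_ids[:n_test])
    train = [r for r in records if r["patient_id"] not in test_ids]
    test = [r for r in records if r["patient_id"] in test_ids]
    return train, test


# Made-up data: 10 patients with 2 encounters each.
records = [
    {"patient_id": f"p{i}", "encounter": j} for i in range(10) for j in range(2)
]

train_set, test_set = split_by_patient(records)
train_patients = {r["patient_id"] for r in train_set}
test_patients = {r["patient_id"] for r in test_set}
print(len(train_patients), "training patients;", len(test_patients), "test patients")
```

Splitting at the patient level keeps notes from the same person out of both sets, which would otherwise inflate hold-out performance.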

Affiliation(s)

Marta Fernandes Department of Neurology, Massachusetts General Hospital, Boston, Massachusetts, USA; Harvard Medical School, Boston, Massachusetts, USA; Clinical Data Animation Center, Massachusetts General Hospital, Boston, Massachusetts, USA; Henry and Allison McCance Center for Brain Health, Massachusetts General Hospital, Boston, Massachusetts, USA
Aidan Cardall Department of Neurology, Massachusetts General Hospital, Boston, Massachusetts, USA; Harvard Medical School, Boston, Massachusetts, USA; Clinical Data Animation Center, Massachusetts General Hospital, Boston, Massachusetts, USA
Jin Jing Department of Neurology, Massachusetts General Hospital, Boston, Massachusetts, USA; Harvard Medical School, Boston, Massachusetts, USA; Clinical Data Animation Center, Massachusetts General Hospital, Boston, Massachusetts, USA; Henry and Allison McCance Center for Brain Health, Massachusetts General Hospital, Boston, Massachusetts, USA
Wendong Ge Department of Neurology, Massachusetts General Hospital, Boston, Massachusetts, USA; Harvard Medical School, Boston, Massachusetts, USA; Clinical Data Animation Center, Massachusetts General Hospital, Boston, Massachusetts, USA; Henry and Allison McCance Center for Brain Health, Massachusetts General Hospital, Boston, Massachusetts, USA
Lidia M. V. R. Moura Department of Neurology, Massachusetts General Hospital, Boston, Massachusetts, USA; Harvard Medical School, Boston, Massachusetts, USA
Claire Jacobs Department of Neurology, Massachusetts General Hospital, Boston, Massachusetts, USA; Harvard Medical School, Boston, Massachusetts, USA
Christopher McGraw Department of Neurology, Massachusetts General Hospital, Boston, Massachusetts, USA; Harvard Medical School, Boston, Massachusetts, USA
Sahar F. Zafar Department of Neurology, Massachusetts General Hospital, Boston, Massachusetts, USA; Harvard Medical School, Boston, Massachusetts, USA
M. Brandon Westover Department of Neurology, Massachusetts General Hospital, Boston, Massachusetts, USA; Harvard Medical School, Boston, Massachusetts, USA; Clinical Data Animation Center, Massachusetts General Hospital, Boston, Massachusetts, USA; Henry and Allison McCance Center for Brain Health, Massachusetts General Hospital, Boston, Massachusetts, USA

Spalding WM, Bertoia ML, Bulik CM, Seeger JD. Treatment characteristics among patients with binge-eating disorder: an electronic health records analysis. Postgrad Med 2023;135:254-264. [PMID: 35037815 DOI: 10.1080/00325481.2021.2018255] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]

Abstract

OBJECTIVES

Treatment for adults diagnosed with binge-eating disorder (BED) includes psychotherapy and/or pharmacotherapy and aims to reduce the frequency of binge-eating episodes and disordered eating, improve metabolic-related issues and reduce weight, and address mood symptoms. Data describing real-world treatment patterns are lacking; therefore, this study aims to characterize real-world treatment patterns among patients with BED.

METHODS

This retrospective study identified adult patients with BED using natural language processing of clinical notes from the Optum electronic health record database from 2009 to 2015. Treatment patterns were examined during the 12 months preceding the BED recognition date and during a follow-up period after BED recognition (1-3 years for most patients).

RESULTS

Among 1042 patients, 384 were categorized as the BED cohort and 658, who met less stringent criteria, were categorized as probable BED. In the BED cohort, mean ± SD age was 45.2 ± 13.4 years and 81.8% were women (probable BED, 45.9 ± 12.8 years, 80.2%). A greater percentage of patients in the BED cohort were prescribed pharmacotherapy (70.6% [probable BED, 66.9%]) than received/discussed psychotherapy (53.1% [probable BED, 39.2%]) at baseline. In the BED cohort, 54.4% of patients were prescribed antidepressants (probable BED, 52.4%), 25.3% stimulants (probable BED, 20.1%), and 34.4% nonspecific psychotherapy (probable BED, 24.6%) at baseline, with no substantive differences observed during follow-up. Low percentages of patients in the BED cohort received/discussed cognitive behavioral therapy at baseline (12.5% [probable BED, 9.0%]) or during follow-up (13.0% [probable BED, 8.8%]). Among patients with ≥1 psychotherapy visit, the mean ± SD number of visits in the BED cohort was 1.2 ± 5.9 at baseline (probable BED, 1.7 ± 7.3) and 2.2 ± 7.7 during follow-up (probable BED, 2.6 ± 7.7).

CONCLUSION

This cohort of patients with BED was treated more frequently with pharmacotherapy than psychotherapy. These data may help inform strategies for reducing differences between real-world treatment patterns and evidence-based recommendations.

Wong S, Simmons A, Rivera-Villicana J, Barnett S, Sivathamboo S, Perucca P, Ge Z, Kwan P, Kuhlmann L, Vasa R, Mouzakis K, O'Brien TJ. EEG datasets for seizure detection and prediction- A review. Epilepsia Open 2023. [PMID: 36740244 DOI: 10.1002/epi4.12704] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 10/29/2022] [Accepted: 01/28/2023] [Indexed: 02/07/2023] Open

Affiliation(s)

Sheng Wong Applied Artificial Intelligence Institute, Deakin University, Burwood, Victoria, Australia
Anj Simmons Applied Artificial Intelligence Institute, Deakin University, Burwood, Victoria, Australia
Jessica Rivera-Villicana Applied Artificial Intelligence Institute, Deakin University, Burwood, Victoria, Australia
Scott Barnett Applied Artificial Intelligence Institute, Deakin University, Burwood, Victoria, Australia
Shobi Sivathamboo Department of Medicine, The Royal Melbourne Hospital, The University of Melbourne, Parkville, Victoria, Australia; Department of Neurology, The Royal Melbourne Hospital, Parkville, Victoria, Australia; Department of Neuroscience, Central Clinical School, Monash University, Melbourne, Victoria, Australia; Department of Neurology, Alfred Health, Melbourne, Victoria, Australia
Piero Perucca Department of Neurology, The Royal Melbourne Hospital, Parkville, Victoria, Australia; Department of Neuroscience, Central Clinical School, Monash University, Melbourne, Victoria, Australia; Department of Neurology, Alfred Health, Melbourne, Victoria, Australia; Department of Medicine, Austin Health, The University of Melbourne, Heidelberg, Victoria, Australia; Comprehensive Epilepsy Program, Austin Health, Heidelberg, Victoria, Australia
Zongyuan Ge Monash eResearch Centre, Monash University, Clayton, Victoria, Australia
Patrick Kwan Department of Neuroscience, Central Clinical School, Monash University, Melbourne, Victoria, Australia; Department of Neurology, Alfred Health, Melbourne, Victoria, Australia
Levin Kuhlmann Department of Data Science and AI, Faculty of IT, Monash University, Clayton, Victoria, Australia; Department of Medicine, St Vincent's Hospital, The University of Melbourne, Melbourne, Victoria, Australia
Rajesh Vasa Applied Artificial Intelligence Institute, Deakin University, Burwood, Victoria, Australia
Kon Mouzakis Applied Artificial Intelligence Institute, Deakin University, Burwood, Victoria, Australia
Terence J O'Brien Department of Medicine, The Royal Melbourne Hospital, The University of Melbourne, Parkville, Victoria, Australia; Department of Neurology, The Royal Melbourne Hospital, Parkville, Victoria, Australia; Department of Neuroscience, Central Clinical School, Monash University, Melbourne, Victoria, Australia; Department of Neurology, Alfred Health, Melbourne, Victoria, Australia

Yew ANJ, Schraagen M, Otte WM, van Diessen E. Transforming epilepsy research: A systematic review on natural language processing applications. Epilepsia 2023;64:292-305. [PMID: 36462150 PMCID: PMC10108221 DOI: 10.1111/epi.17474] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/11/2022] [Revised: 11/23/2022] [Accepted: 12/01/2022] [Indexed: 12/05/2022]

Juang WC, Hsu MH, Cai ZX, Chen CM. Developing an AI-assisted clinical decision support system to enhance in-patient holistic health care. PLoS One 2022;17:e0276501. [PMID: 36315554 PMCID: PMC9621444 DOI: 10.1371/journal.pone.0276501] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/05/2021] [Accepted: 10/08/2022] [Indexed: 11/06/2022] Open

Decker BM, Turco A, Xu J, Terman SW, Kosaraju N, Jamil A, Davis KA, Litt B, Ellis CA, Khankhanian P, Hill CE. Development of a natural language processing algorithm to extract seizure types and frequencies from the electronic health record. Seizure 2022;101:48-51. [PMID: 35882104 PMCID: PMC9547963 DOI: 10.1016/j.seizure.2022.07.010] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/22/2022] [Revised: 07/16/2022] [Accepted: 07/18/2022] [Indexed: 11/21/2022] Open

Fernandes M, Donahue MA, Hoch D, Cash S, Zafar S, Jacobs C, Hosford M, Voinescu PE, Fureman B, Buchhalter J, McGraw CM, Westover MB, Moura LMVR. A replicable, open-source, data integration method to support national practice-based research & quality improvement systems. Epilepsy Res 2022;186:107013. [PMID: 35994859 PMCID: PMC9810436 DOI: 10.1016/j.eplepsyres.2022.107013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/09/2022] [Revised: 04/28/2022] [Accepted: 08/13/2022] [Indexed: 01/07/2023]

Abstract

OBJECTIVES

The Epilepsy Learning Healthcare System (ELHS) was created in 2018 to address measurable improvements in outcomes for people with epilepsy. However, fragmentation of data systems has been a major barrier for reporting and participation. In this study, we aimed to test the feasibility of an open-source Data Integration (DI) method that connects real-life clinical data to national research and quality improvement (QI) systems.

METHODS

The ELHS case report forms were programmed as EPIC SmartPhrases at Mass General Brigham (MGB) in December 2018 and subsequently as EPIC SmartForms in June 2021 to collect actionable, standardized, structured epilepsy data in the electronic health record (EHR) for subsequent pull into the external national registry of the ELHS. Following the QI methodology in the Chronic Care Model, 39 providers, epileptologists and neurologists, incorporated the ELHS SmartPhrase into their clinical workflow, focusing on collecting diagnosis of epilepsy, seizure type according to the International League Against Epilepsy, seizure frequency, date of last seizure, medication adherence and side effects. The collected data were stored in the Enterprise Data Warehouse (EDW) without integration with external systems. We developed and validated a DI method that extracted the data from the EDW using structured query language and then preprocessed it using text mining. We used the ELHS data dictionary to match fields in the preprocessed notes to obtain the final structured dataset with seizure control information. For illustration, we described the data curated from the care period of 12/2018-12/2021.
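The field-matching step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the dictionary entries, the `extract_fields` helper, and the "Label: value" note layout are all assumptions made for illustration, since the abstract does not give the ELHS data dictionary or the SmartPhrase text format.

```python
import re

# Hypothetical data dictionary: canonical field name -> label variants as they
# might appear in a templated note. These labels are illustrative assumptions,
# not the actual ELHS data dictionary.
DATA_DICTIONARY = {
    "seizure_type": ["seizure type"],
    "seizure_frequency": ["seizure frequency"],
    "last_seizure_date": ["date of last seizure"],
}


def extract_fields(note: str) -> dict:
    """Return canonical field -> value for each 'Label: value' line found."""
    record = {}
    for field, labels in DATA_DICTIONARY.items():
        for label in labels:
            # Match the label at the start of a line, then capture the rest
            # of that line as the value (spaces and tabs trimmed).
            pattern = rf"^[ \t]*{re.escape(label)}[ \t]*:[ \t]*(.+?)[ \t]*$"
            match = re.search(pattern, note, flags=re.IGNORECASE | re.MULTILINE)
            if match:
                record[field] = match.group(1)
                break
    return record


# Made-up example note, assumed to be already preprocessed into plain text.
note = """Seizure Type: focal
Seizure frequency: 2 per month
Date of last seizure: 11/02/2021
"""
print(extract_fields(note))
```

Keeping the label variants in a dictionary, rather than hard-coding them in the parsing logic, means a change to the template wording only requires a change to the dictionary.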

RESULTS

The cohort comprised 1806 patients with a mean age of 43 years (SD: 17.0); 57% were female, 80% were white, and 84% were non-Hispanic/Latino. Using our DI method, we automated the data mining, preprocessing, and exporting of the structured dataset into a local database that clinicians and quality improvers could access weekly. During the period of SmartPhrase implementation, providers logged 5168 clinic visits documenting each patient's seizure type and frequency. During this period, providers documented 59% of patients as having focal seizures, 35% as having generalized seizures, and 6% as having another type. Of the cohort, 45% of patients had private insurance. The resulting structured dataset was bulk uploaded via web interface into the external national registry of the ELHS.

CONCLUSIONS

Structured data can be feasibly extracted from text notes of epilepsy patients for weekly reporting to a national learning healthcare system.

Affiliation(s)

Marta Fernandes Department of Neurology, Massachusetts General Hospital (MGH), Boston, MA, United States; Harvard Medical School, Boston, MA, United States; Clinical Data Animation Center (CDAC), MGH, Boston, MA, United States.
Maria A Donahue Department of Neurology, Massachusetts General Hospital (MGH), Boston, MA, United States; Harvard Medical School, Boston, MA, United States; The NeuroValue Lab, MGH, Boston, MA, United States.
Dan Hoch Department of Neurology, Massachusetts General Hospital (MGH), Boston, MA, United States; Harvard Medical School, Boston, MA, United States.
Sydney Cash Department of Neurology, Massachusetts General Hospital (MGH), Boston, MA, United States; Harvard Medical School, Boston, MA, United States.
Sahar Zafar Department of Neurology, Massachusetts General Hospital (MGH), Boston, MA, United States; Harvard Medical School, Boston, MA, United States.
Claire Jacobs Department of Neurology, Massachusetts General Hospital (MGH), Boston, MA, United States; Harvard Medical School, Boston, MA, United States.
Mackenzie Hosford Department of Neurology, Massachusetts General Hospital (MGH), Boston, MA, United States; Harvard Medical School, Boston, MA, United States.
P Emanuela Voinescu Harvard Medical School, Boston, MA, United States; Department of Neurology, Division of Epilepsy, Division of Women's Health, Brigham and Women's Hospital, Boston, MA, United States.
Brandy Fureman Epilepsy Foundation of America, United States.
Jeffrey Buchhalter Department of Pediatrics, University of Calgary School of Medicine, Calgary, Canada.
Christopher Michael McGraw Department of Neurology, Massachusetts General Hospital (MGH), Boston, MA, United States; Harvard Medical School, Boston, MA, United States.
M Brandon Westover Department of Neurology, Massachusetts General Hospital (MGH), Boston, MA, United States; Harvard Medical School, Boston, MA, United States; Clinical Data Animation Center (CDAC), MGH, Boston, MA, United States; McCance Center for Brain Health, MGH, Boston, MA, United States.
Lidia M V R Moura Department of Neurology, Massachusetts General Hospital (MGH), Boston, MA, United States; Harvard Medical School, Boston, MA, United States; The NeuroValue Lab, MGH, Boston, MA, United States.

Crema C, Attardi G, Sartiano D, Redolfi A. Natural language processing in clinical neuroscience and psychiatry: A review. Front Psychiatry 2022;13:946387. [PMID: 36186874 PMCID: PMC9515453 DOI: 10.3389/fpsyt.2022.946387] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 05/17/2022] [Accepted: 08/22/2022] [Indexed: 11/13/2022] Open

Xie K, Gallagher RS, Conrad EC, Garrick CO, Baldassano SN, Bernabei JM, Galer PD, Ghosn NJ, Greenblatt AS, Jennings T, Kornspun A, Kulick-Soper CV, Panchal JM, Pattnaik AR, Scheid BH, Wei D, Weitzman M, Muthukrishnan R, Kim J, Litt B, Ellis CA, Roth D. OUP accepted manuscript. J Am Med Inform Assoc 2022;29:873-881. [PMID: 35190834 PMCID: PMC9006692 DOI: 10.1093/jamia/ocac018] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/26/2021] [Revised: 01/11/2022] [Accepted: 02/08/2022] [Indexed: 11/14/2022] Open

Affiliation(s)

Kevin Xie Department of Bioengineering, School of Engineering and Applied Sciences, University of Pennsylvania, Philadelphia, Pennsylvania, USA; Center for Neuroengineering and Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
Ryan S Gallagher Center for Neuroengineering and Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, USA; Department of Neurology, Penn Epilepsy Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
Erin C Conrad Department of Neurology, Penn Epilepsy Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
Chadric O Garrick Center for Neuroengineering and Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
Steven N Baldassano Department of Bioengineering, School of Engineering and Applied Sciences, University of Pennsylvania, Philadelphia, Pennsylvania, USA; Center for Neuroengineering and Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
John M Bernabei Department of Bioengineering, School of Engineering and Applied Sciences, University of Pennsylvania, Philadelphia, Pennsylvania, USA; Center for Neuroengineering and Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
Peter D Galer Department of Bioengineering, School of Engineering and Applied Sciences, University of Pennsylvania, Philadelphia, Pennsylvania, USA; Center for Neuroengineering and Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, USA; Department of Biomedical and Health Informatics, Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania, USA
Nina J Ghosn Department of Bioengineering, School of Engineering and Applied Sciences, University of Pennsylvania, Philadelphia, Pennsylvania, USA; Center for Neuroengineering and Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
Adam S Greenblatt Department of Neurology, Penn Epilepsy Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
Tara Jennings Department of Neurology, Penn Epilepsy Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
Alana Kornspun Department of Neurology, Penn Epilepsy Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
Catherine V Kulick-Soper Department of Neurology, Penn Epilepsy Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
Jal M Panchal Department of Bioengineering, School of Engineering and Applied Sciences, University of Pennsylvania, Philadelphia, Pennsylvania, USA; Center for Neuroengineering and Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, USA; The General Robotics, Automation, Sensing and Perception Laboratory, School of Engineering and Applied Sciences, University of Pennsylvania, Philadelphia, Pennsylvania, USA
Akash R Pattnaik Department of Bioengineering, School of Engineering and Applied Sciences, University of Pennsylvania, Philadelphia, Pennsylvania, USA; Center for Neuroengineering and Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
Brittany H Scheid Department of Bioengineering, School of Engineering and Applied Sciences, University of Pennsylvania, Philadelphia, Pennsylvania, USA; Center for Neuroengineering and Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
Danmeng Wei Department of Neurology, Penn Epilepsy Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
Micah Weitzman Department of Electrical and Systems Engineering, School of Engineering and Applied Sciences, University of Pennsylvania, Philadelphia, Pennsylvania, USA
Ramya Muthukrishnan Department of Computer and Information Science, School of Engineering and Applied Sciences, University of Pennsylvania, Philadelphia, Pennsylvania, USA
Joongwon Kim Department of Computer and Information Science, School of Engineering and Applied Sciences, University of Pennsylvania, Philadelphia, Pennsylvania, USA
Brian Litt Department of Bioengineering, School of Engineering and Applied Sciences, University of Pennsylvania, Philadelphia, Pennsylvania, USA; Center for Neuroengineering and Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, USA; Department of Neurology, Penn Epilepsy Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
Colin A Ellis Corresponding Authors: Colin A. Ellis, MD, Department of Neurology, Penn Epilepsy Center, Perelman School of Medicine, University of Pennsylvania, 3400 Spruce Street, Philadelphia, PA 19104, USA;
Dan Roth

Buchlak QD, Esmaili N, Bennett C, Farrokhi F. Natural Language Processing Applications in the Clinical Neurosciences: A Machine Learning Augmented Systematic Review. ACTA NEUROCHIRURGICA. SUPPLEMENT 2022;134:277-289. [PMID: 34862552 DOI: 10.1007/978-3-030-85292-4_32] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/19/2022]

de Oliveira JM, da Costa CA, Antunes RS. Data structuring of electronic health records: a systematic review. HEALTH AND TECHNOLOGY 2021. [DOI: 10.1007/s12553-021-00607-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/10/2023]

Weissler EH, Naumann T, Andersson T, Ranganath R, Elemento O, Luo Y, Freitag DF, Benoit J, Hughes MC, Khan F, Slater P, Shameer K, Roe M, Hutchison E, Kollins SH, Broedl U, Meng Z, Wong JL, Curtis L, Huang E, Ghassemi M. The role of machine learning in clinical research: transforming the future of evidence generation. Trials 2021;22:537. [PMID: 34399832 PMCID: PMC8365941 DOI: 10.1186/s13063-021-05489-x] [Citation(s) in RCA: 53] [Impact Index Per Article: 17.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/30/2021] [Accepted: 07/26/2021] [Indexed: 12/13/2022] Open

Affiliation(s)

E Hope Weissler Duke Clinical Research Institute, Duke University School of Medicine, Box 2834, Durham, NC, 27701, USA.
Tristan Naumann Microsoft Research, Cambridge, MA, USA
Tomas Andersson AstraZeneca, Gothenburg, Sweden
Rajesh Ranganath Courant Institute of Mathematical Science, New York University, New York, NY, USA
Olivier Elemento Englander Institute for Precision Medicine, Weill Cornell Medical College, New York, NY, USA
Yuan Luo Northwestern University Clinical and Translational Sciences Institute, Northwestern University, Chicago, IL, USA
Daniel F Freitag Division Pharmaceuticals, Open Innovation and Digital Technologies, Bayer AG, Wuppertal, Germany
James Benoit University of Alberta, Edmonton, Alberta, Canada
Michael C Hughes Department of Computer Science, Tufts University, Medford, MA, USA
Faisal Khan AstraZeneca, Gothenburg, Sweden
Paul Slater Billion Minds, Inc., Seattle, WA, USA
Khader Shameer AstraZeneca, Gothenburg, Sweden
Matthew Roe Verana Health, San Francisco, CA, USA
Emmette Hutchison AstraZeneca, Gothenburg, Sweden
Scott H Kollins Duke Clinical Research Institute, Duke University School of Medicine, Box 2834, Durham, NC, 27701, USA
Uli Broedl Boehringer-Ingelheim, Burlington, Canada
Zhaoling Meng Sanofi, Cambridge, MA, USA
Jennifer L Wong Sanofi, Washington, DC, USA
Lesley Curtis Duke Clinical Research Institute, Duke University School of Medicine, Box 2834, Durham, NC, 27701, USA
Erich Huang Duke Clinical Research Institute, Duke University School of Medicine, Box 2834, Durham, NC, 27701, USA; Duke Forge, Durham, NC, USA
Marzyeh Ghassemi Vector Institute, University of Toronto, Toronto, Ontario, Canada; Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts, 02139, USA; Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, Massachusetts, 02139, USA; CIFAR AI Chair, Vector Institute, Toronto, Ontario, Canada

Dobbie S, Strafford H, Pickrell WO, Fonferko-Shadrach B, Jones C, Akbari A, Thompson S, Lacey A. Markup: A Web-Based Annotation Tool Powered by Active Learning. Front Digit Health 2021;3:598916. [PMID: 34713086 PMCID: PMC8521860 DOI: 10.3389/fdgth.2021.598916] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/25/2020] [Accepted: 06/16/2021] [Indexed: 11/13/2022] Open

Ford E, Curlewis K, Squires E, Griffiths LJ, Stewart R, Jones KH. The Potential of Research Drawing on Clinical Free Text to Bring Benefits to Patients in the United Kingdom: A Systematic Review of the Literature. Front Digit Health 2021;3:606599. [PMID: 34713089 PMCID: PMC8521813 DOI: 10.3389/fdgth.2021.606599] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2020] [Accepted: 01/15/2021] [Indexed: 11/13/2022] Open

Abstract

Background: The analysis of clinical free text from patient records for research has potential to contribute to the medical evidence base but access to clinical free text is frequently denied by data custodians who perceive that the privacy risks of data-sharing are too high. Engagement activities with patients and regulators, where views on the sharing of clinical free text data for research have been discussed, have identified that stakeholders would like to understand the potential clinical benefits that could be achieved if access to free text for clinical research were improved. We aimed to systematically review all UK research studies which used clinical free text and report direct or potential benefits to patients, synthesizing possible benefits into an easy to communicate taxonomy for public engagement and policy discussions. Methods: We conducted a systematic search for articles which reported primary research using clinical free text, drawn from UK health record databases, which reported a benefit or potential benefit for patients, actionable in a clinical environment or health service, and not solely methods development or data quality improvement. We screened eligible papers and thematically analyzed information about clinical benefits reported in the paper to create a taxonomy of benefits. Results: We identified 43 papers and derived five themes of benefits: health-care quality or services improvement, observational risk factor-outcome research, drug prescribing safety, case-finding for clinical trials, and development of clinical decision support. Five papers compared study quality with and without free text and found an improvement of accuracy when free text was included in analytical models. Conclusions: Findings will help stakeholders weigh the potential benefits of free text research against perceived risks to patient privacy. The taxonomy can be used to aid public and policy discussions, and identified studies could form a public-facing repository which will help the health-care text analysis research community better communicate the impact of their work.

Decker BM, Hill CE, Baldassano SN, Khankhanian P. Can antiepileptic efficacy and epilepsy variables be studied from electronic health records? A review of current approaches. Seizure 2021;85:138-144. [PMID: 33461032 DOI: 10.1016/j.seizure.2020.11.011] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/30/2020] [Revised: 11/16/2020] [Accepted: 11/17/2020] [Indexed: 12/16/2022] Open

Identification of seizure clusters using free text notes in an electronic seizure diary. Epilepsy Behav 2020;113:107498. [PMID: 33096508 DOI: 10.1016/j.yebeh.2020.107498] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 07/31/2020] [Revised: 09/09/2020] [Accepted: 09/12/2020] [Indexed: 11/21/2022]

Abstract

SIGNIFICANCE

Online seizure diaries offer a wealth of information regarding real world experience of patients living with epilepsy. Free text notes (FTN) written by patients reflect concerns and priorities of patients and provide supplemental information to structured diary data.

OBJECTIVE

This project evaluated the feasibility of using an automated lexical analysis to identify FTN relevant to seizure clusters (SCs).

METHODS

Data were extracted from EpiDiary™, a free electronic epilepsy diary with 42,799 unique users, generating 1,096,168 entries and 247,232 FTN. Both structured data as well as FTN were analyzed for presence of SC. A pilot study was conducted to validate an automated lexical analysis algorithm to identify SC in FTN in a sample of 98 diaries. The lexical analysis was then applied to the entire dataset. Outcomes included cluster prevalence and frequency, as well as the types of triggers commonly reported.
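As a rough illustration of what a lexical analysis of free text notes might look like, the sketch below flags notes that match a small pattern list. The lexicon, the `mentions_cluster` helper, and the example notes are hypothetical; the abstract does not describe the study's actual terms or algorithm, which were validated against a pilot sample of diaries.

```python
import re

# Hypothetical lexicon of cluster-related expressions. The terms used in the
# study are not given in the abstract; these are illustrative only.
CLUSTER_PATTERNS = [
    r"\bclusters?\b",
    r"\bback[- ]to[- ]back\b",
    r"\b(?:several|multiple|many)\s+seizures\b",
]
CLUSTER_RE = re.compile("|".join(CLUSTER_PATTERNS), flags=re.IGNORECASE)


def mentions_cluster(note: str) -> bool:
    """Return True if the note contains any cluster-related expression."""
    return CLUSTER_RE.search(note) is not None


# Made-up notes for demonstration.
notes = [
    "Had a cluster this morning, used rescue med.",
    "Slept badly but no events today.",
    "Three back-to-back seizures after missing evening dose.",
]
flagged = [n for n in notes if mentions_cluster(n)]
print(f"{len(flagged)} of {len(notes)} notes flagged")
```

A pattern list like this is easy to audit but misses paraphrases and negations (for example, "no cluster today" would be flagged), which is one reason such an approach would need validation on annotated diaries before being applied to a full dataset.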

RESULTS

At least one FTN was found among 13,987 (32.68%) individual diaries. An automated lexical analysis algorithm identified 5797 FTN as SC. There were 2423 unique patients with SC that were not identified by structured data alone and were identified using lexical analysis of FTN only. Seizure clusters were identified in n = 10,331 (24.1%) of diary users through both structured data and FTN. The median number of SC days per year was 13.7 (interquartile range (IQR): 3.2-54.7). The median number of seizures in a cluster day was 3 (IQR 2-4). The most common missed medication linked to patients with SC was levetiracetam (n = 576, 29%) followed by lamotrigine (n = 495, 24%), topiramate (n = 208, 10.5%), carbamazepine (n = 190, 9.6%), and lacosamide (n = 170, 8.6%). These percentages generally reflected prevalence of medication use in this population. The use of rescue medications was documented in 3306 structured entries and 4305 FTN.

CONCLUSION

This exploratory study demonstrates a novel approach applying lexical analysis to previously untapped FTN in a large electronic seizure diary database. Free text notes captured information about SC not available from the structured diary data. Diary FTN contain information of high importance to people with epilepsy, written in their own words.

Weissler EH, Zhang J, Lippmann S, Rusincovitch S, Henao R, Jones WS. Use of Natural Language Processing to Improve Identification of Patients With Peripheral Artery Disease. Circ Cardiovasc Interv 2020;13:e009447. [PMID: 33040585 DOI: 10.1161/circinterventions.120.009447] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 12/22/2022]

Abstract

BACKGROUND

Peripheral artery disease (PAD) is underrecognized, undertreated, and understudied: each of these endeavors requires efficient and accurate identification of patients with PAD. Currently, PAD patient identification relies on diagnosis/procedure codes or lists of patients diagnosed or treated by specific providers in specific locations and ways. The goal of this research was to leverage natural language processing to more accurately identify patients with PAD in an electronic health record system compared with a structured data-based approach.

METHODS

The clinical notes from a cohort of 6861 patients in our health system whose PAD status had previously been adjudicated were used to train, test, and validate a natural language processing model using 10-fold cross-validation. The performance of this model was described using the area under the receiver operating characteristic and average precision curves; its performance was quantitatively compared with an administrative data-based least absolute shrinkage and selection operator (LASSO) approach using the DeLong test.

RESULTS

The median (SD) of the area under the receiver operating characteristic curve for the natural language processing model was 0.888 (0.009) versus 0.801 (0.017) for the LASSO-based approach alone (DeLong P<0.0001). The median (SD) of the area under the precision curve was 0.909 (0.008) versus 0.816 (0.012) for the structured data-based approach. When sensitivity was set at 90%, the precision for LASSO was 65% and the machine learning approach was 74%, while the specificity for LASSO was 41% and for the machine learning approach was 62%.

CONCLUSIONS

Using a natural language processing approach in addition to partial cohort preprocessing with a LASSO-based model, we were able to meaningfully improve our ability to identify patients with PAD compared with an approach using structured data alone. This model has potential applications to both interventions targeted at improving patient care as well as efficient, large-scale PAD research. Graphic Abstract: A graphic abstract is available for this article.

Jones KH, Ford EM, Lea N, Griffiths LJ, Hassan L, Heys S, Squires E, Nenadic G. Toward the Development of Data Governance Standards for Using Clinical Free-Text Data in Health Research: Position Paper. J Med Internet Res 2020;22:e16760. [PMID: 32597785 PMCID: PMC7367542 DOI: 10.2196/16760] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/22/2019] [Revised: 03/06/2020] [Accepted: 03/23/2020] [Indexed: 01/17/2023] Open

Abstract

BACKGROUND

Clinical free-text data (eg, outpatient letters or nursing notes) represent a vast, untapped source of rich information that, if more accessible for research, would clarify and supplement information coded in structured data fields. Data usually need to be deidentified or anonymized before they can be reused for research, but there is a lack of established guidelines to govern effective deidentification and use of free-text information and avoid damaging data utility as a by-product.

OBJECTIVE

This study aimed to develop recommendations for the creation of data governance standards to integrate with existing frameworks for personal data use, to enable free-text data to be used safely for research for patient and public benefit.

METHODS

We outlined data protection legislation and regulations relating to the United Kingdom for context and conducted a rapid literature review and UK-based case studies to explore data governance models used in working with free-text data. We also engaged with stakeholders, including text-mining researchers and the general public, to explore perceived barriers and solutions in working with clinical free-text.

RESULTS

We proposed a set of recommendations, including the need for authoritative guidance on data governance for the reuse of free-text data, to ensure public transparency in data flows and uses, to treat deidentified free-text data as potentially identifiable with use limited to accredited data safe havens, and to commit to a culture of continuous improvement to understand the relationships between the efficacy of deidentification and reidentification risks, so this can be communicated to all stakeholders.

CONCLUSIONS

By drawing together the findings of a combination of activities, we present a position paper to contribute to the development of data governance standards for the reuse of clinical free-text data for secondary purposes. While working in accordance with existing data governance frameworks, there is a need for further work to take forward the recommendations we have proposed, with commitment and investment, to assure and expand the safe reuse of clinical free-text data for public benefit.

Chiang KL, Huang CY, Hsieh LP, Chang KP. A propositional AI system for supporting epilepsy diagnosis based on the 2017 epilepsy classification: Illustrated by Dravet syndrome. Epilepsy Behav 2020;106:107021. [PMID: 32224446 DOI: 10.1016/j.yebeh.2020.107021] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 12/25/2019] [Revised: 03/02/2020] [Accepted: 03/02/2020] [Indexed: 01/01/2023]

Hwang JE, Seoung BO, Lee SO, Shin SY. Implementing Structured Clinical Templates at a Single Tertiary Hospital: Survey Study. JMIR Med Inform 2020;8:e13836. [PMID: 32352392 PMCID: PMC7226057 DOI: 10.2196/13836] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/28/2019] [Revised: 11/26/2019] [Accepted: 02/26/2020] [Indexed: 02/06/2023] Open

Abstract

Background

Electronic health record (EHR) systems have been widely adopted in hospitals. However, because current EHRs mainly focus on reducing the number of paper documents used, they suffer from poor search and data reuse capabilities. To overcome these drawbacks, structured clinical templates have been proposed; however, they are not widely used owing to the inconvenience of data entry.

Objective

This study aims to verify the usability of structured templates by comparing data entry times.

Methods

A Korean tertiary hospital has implemented structured clinical templates with the modeling of clinical contents for the last 6 years. As a result, 1238 clinical content models (ie, body measurements, vital signs, and allergies) have been developed and 492 models for 13 clinical templates, including pathology reports, were applied to EHRs for clinical practice. Then, to verify the usability of the structured templates, data entry times from free-texts and four structured pathology report templates were compared using 4391 entries from structured data entry (SDE) log data and 4265 entries from free-text log data. In addition, a paper-based survey and a focus group interview were conducted with 23 participants from three different groups, including EHR developers, pathology transcriptionists, and clinical data extraction team members.

Results

Analysis of data entry times showed that, in most cases, beginner users of the structured clinical templates required up to 70.18% more time for data entry. However, as users became accustomed to the templates, they entered data more quickly than via free-text entry, saving between 1 minute and 23 seconds (16.8%) and 5 minutes and 42 seconds (27.6%). Interestingly, well-designed thyroid cancer pathology reports required 14.54% less data entry time from the beginning of the SDE implementation. In the interviews and survey, most of the interviewees agreed on the need for structured templates. However, they were skeptical about structuring all the items included in the templates.

Conclusions

The increase in initial elapsed time led users to hold a negative opinion of SDE, despite its benefits. To overcome these obstacles, it is necessary to structure the clinical templates for optimum use. In addition, user experience in terms of ease of data entry must be considered as an essential aspect in the development of structured clinical templates.

Baldassano SN, Hill CE, Shankar A, Bernabei J, Khankhanian P, Litt B. Big data in status epilepticus. Epilepsy Behav 2019;101:106457. [PMID: 31444029 PMCID: PMC6944751 DOI: 10.1016/j.yebeh.2019.106457] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 06/30/2019] [Accepted: 07/26/2019] [Indexed: 12/23/2022]