1
van Diessen E, van Amerongen RA, Zijlmans M, Otte WM. Potential merits and flaws of large language models in epilepsy care: A critical review. Epilepsia 2024; 65:873-886. [PMID: 38305763 DOI: 10.1111/epi.17907] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2023] [Revised: 12/30/2023] [Accepted: 01/19/2024] [Indexed: 02/03/2024]
Abstract
The current pace of development and applications of large language models (LLMs) is unprecedented and will impact future medical care significantly. In this critical review, we provide the background to better understand these novel artificial intelligence (AI) models and how LLMs can be of future use in the daily care of people with epilepsy. Considering the importance of clinical history taking in diagnosing and monitoring epilepsy, combined with the established use of electronic health records, a great potential exists to integrate LLMs in epilepsy care. We present the currently available LLM studies in epilepsy. Furthermore, we highlight and compare the most commonly used LLMs and elaborate on how these models can be applied in epilepsy. We further discuss important drawbacks and risks of LLMs, and we provide recommendations for overcoming these limitations.
Affiliation(s)
- Eric van Diessen
- Department of Child Neurology, UMC Utrecht Brain Center, University Medical Center Utrecht and Utrecht University, Utrecht, The Netherlands
- Department of Pediatrics, Franciscus Gasthuis & Vlietland, Rotterdam, The Netherlands
- Ramon A van Amerongen
- Faculty of Science, Bioinformatics and Biocomplexity, Utrecht University, Utrecht, The Netherlands
- Maeike Zijlmans
- Department of Neurology and Neurosurgery, UMC Utrecht Brain Center, University Medical Center Utrecht and Utrecht University, Utrecht, The Netherlands
- Stichting Epilepsie Instellingen Nederland, Heemstede, The Netherlands
- Willem M Otte
- Department of Child Neurology, UMC Utrecht Brain Center, University Medical Center Utrecht and Utrecht University, Utrecht, The Netherlands
2
Xu X, Li J, Zhu Z, Zhao L, Wang H, Song C, Chen Y, Zhao Q, Yang J, Pei Y. A Comprehensive Review on Synergy of Multi-Modal Data and AI Technologies in Medical Diagnosis. Bioengineering (Basel) 2024; 11:219. [PMID: 38534493 DOI: 10.3390/bioengineering11030219] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2023] [Revised: 02/15/2024] [Accepted: 02/21/2024] [Indexed: 03/28/2024] Open
Abstract
Disease diagnosis represents a critical and arduous endeavor within the medical field. Artificial intelligence (AI) techniques, spanning from machine learning and deep learning to large model paradigms, stand poised to significantly augment physicians in rendering more evidence-based decisions, thus presenting a pioneering solution for clinical practice. Traditionally, the amalgamation of diverse medical data modalities (e.g., image, text, speech, genetic data, physiological signals) is imperative to facilitate a comprehensive disease analysis, a topic of burgeoning interest among both researchers and clinicians in recent times. Hence, there exists a pressing need to synthesize the latest strides in multi-modal data and AI technologies in the realm of medical diagnosis. In this paper, we narrow our focus to five specific disorders (Alzheimer's disease, breast cancer, depression, heart disease, epilepsy), elucidating advanced endeavors in their diagnosis and treatment through the lens of artificial intelligence. Our survey not only delineates detailed diagnostic methodologies across varying modalities but also underscores commonly utilized public datasets, the intricacies of feature engineering, prevalent classification models, and envisaged challenges for future endeavors. In essence, our research endeavors to contribute to the advancement of diagnostic methodologies, furnishing invaluable insights for clinical decision making.
Affiliation(s)
- Xi Xu
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
- Jianqiang Li
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
- Zhichao Zhu
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
- Linna Zhao
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
- Huina Wang
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
- Changwei Song
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
- Yining Chen
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
- Qing Zhao
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
- Jijiang Yang
- Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, Beijing 100084, China
- Yan Pei
- School of Computer Science and Engineering, The University of Aizu, Aizuwakamatsu 965-8580, Japan
3
Mora S, Turrisi R, Chiarella L, Consales A, Tassi L, Mai R, Nobili L, Barla A, Arnulfo G. NLP-based tools for localization of the epileptogenic zone in patients with drug-resistant focal epilepsy. Sci Rep 2024; 14:2349. [PMID: 38287042 PMCID: PMC10825198 DOI: 10.1038/s41598-024-51846-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2023] [Accepted: 01/10/2024] [Indexed: 01/31/2024] Open
Abstract
Epilepsy surgery is an option for people with focal-onset drug-resistant (DR) seizures, but a delayed or incorrect diagnosis of epileptogenic zone (EZ) location limits its efficacy. Seizure semiological manifestations and their chronological appearance contain valuable information on the putative EZ location, but their interpretation relies on extensive experience. The aim of our work is to support the localization of the EZ in DR patients by automatically analyzing the semiological descriptions of seizures contained in video-EEG reports. Our sample is composed of 536 descriptions of seizures extracted from the Electronic Medical Records of 122 patients. We devised numerical representations of anamnestic records and seizure descriptions, exploiting Natural Language Processing (NLP) techniques, and used them to feed Machine Learning (ML) models. We performed three binary classification tasks: localizing the EZ in the right or left hemisphere, temporal or extra-temporal, and frontal or posterior regions. Our computational pipeline reached performances above 70% in all tasks. These results show that NLP-based numerical representations combined with ML-based classification models may help in localizing the origin of seizures relying on seizure-related semiological text data alone. Accurate early recognition of the EZ could enable more appropriate patient management and faster access to epilepsy surgery for potential candidates.
Affiliation(s)
- Sara Mora
- Department of Informatics, Bioengineering, Robotics and System Engineering (DIBRIS), University of Genoa, 16145, Genoa, Italy.
- Rosanna Turrisi
- Department of Informatics, Bioengineering, Robotics and System Engineering (DIBRIS), University of Genoa, 16145, Genoa, Italy
- MaLGa Machine Learning Genoa Center, University of Genoa, 16146, Genoa, Italy
- Lorenzo Chiarella
- Department of Neuroscience, Rehabilitation, Ophthalmology, Genetics, Child and Maternal Health (DINOGMI), University of Genoa, 16132, Genoa, Italy
- Child Neuropsychiatry Unit, IRCCS Istituto Giannina Gaslini, Member of the European Reference Network EpiCARE, 16147, Genoa, Italy
- Alessandro Consales
- Division of Neurosurgery, IRCCS Istituto Giannina Gaslini, 16147, Genoa, Italy
- Laura Tassi
- "Claudio Munari" Epilepsy Surgery Center, Niguarda Hospital, 20162, Milan, Italy
- Roberto Mai
- "Claudio Munari" Epilepsy Surgery Center, Niguarda Hospital, 20162, Milan, Italy
- Lino Nobili
- Department of Neuroscience, Rehabilitation, Ophthalmology, Genetics, Child and Maternal Health (DINOGMI), University of Genoa, 16132, Genoa, Italy
- Child Neuropsychiatry Unit, IRCCS Istituto Giannina Gaslini, Member of the European Reference Network EpiCARE, 16147, Genoa, Italy
- Annalisa Barla
- Department of Informatics, Bioengineering, Robotics and System Engineering (DIBRIS), University of Genoa, 16145, Genoa, Italy
- MaLGa Machine Learning Genoa Center, University of Genoa, 16146, Genoa, Italy
- Gabriele Arnulfo
- Department of Informatics, Bioengineering, Robotics and System Engineering (DIBRIS), University of Genoa, 16145, Genoa, Italy
- Neuroscience Center, Helsinki Institute of Life Science (HiLife), University of Helsinki, 00014, Helsinki, Finland
4
Vulpius SA, Werge S, Jørgensen IF, Siggaard T, Hernansanz Biel J, Knudsen GM, Brunak S, Pinborg LH. Text mining of electronic health records can validate a register-based diagnosis of epilepsy and subgroup into focal and generalized epilepsy. Epilepsia 2023; 64:2750-2760. [PMID: 37548470 DOI: 10.1111/epi.17734] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/24/2023] [Revised: 08/01/2023] [Accepted: 08/01/2023] [Indexed: 08/08/2023]
Abstract
OBJECTIVE Combining population-based health registries and electronic health records offers the opportunity to create large, phenotypically detailed patient cohorts of high quality. In this study, we used text mining of clinical notes to confirm International Classification of Diseases, 10th Revision (ICD-10)-registered epilepsy diagnoses and classify patients according to focal and generalized epilepsy types. METHODS Using the Danish National Patient Registry, we identified patients who between 2006 and 2016 received an ICD-10 diagnosis of epilepsy. To validate the epilepsy diagnosis and stratify patients into focal and generalized epilepsy types, we constructed dictionaries for text mining-based extraction of clinical notes. Two physicians manually reviewed the clinical notes for a total of 527 patients and assigned epilepsy diagnoses, which were compared with the text-mined diagnoses. RESULTS We identified 23 632 patients with an ICD-10 diagnosis of epilepsy, of whom 50% were registered with an unspecified epilepsy diagnosis. In total, 11 211 patients were considered likely to have epilepsy by text mining, with an F1 measure ranging from 82% to 90%. Manual review of the electronic health records for 310 patients revealed a false discovery rate of 29%. This rate was decreased to 4% by the text mining algorithm. The weighted average F1 measure for text mining-assigned epilepsy types was 79% (82% for focal and 76% for generalized epilepsy). Text mining successfully assigned a focal or generalized epilepsy type to 92% of the text mining-eligible patients registered with unspecified epilepsy. SIGNIFICANCE Text mining of electronic health records can be used to establish a patient cohort with much higher likelihood of having a diagnosis of epilepsy and a focal or generalized epilepsy type compared to the cohort created from ICD-10 epilepsy codes alone. 
We believe the concept will be essential for future genome-wide and phenome-wide association studies and subsequently the development of precision medicine for epilepsy patients.
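The F1 measure and false discovery rate quoted in this abstract are derived from counts of true positives, false positives, and false negatives. A minimal Python sketch of those definitions follows; the function names and all counts are hypothetical and are not taken from the study, and the equal class supports in the last call are an assumption made only to show how a weighted average is formed.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def false_discovery_rate(tp, fp):
    """Share of predicted positives that are wrong (equals 1 - precision)."""
    return fp / (tp + fp)


def weighted_f1(f1_by_class, support_by_class):
    """Average per-class F1 scores, weighting each class by its support."""
    total = sum(support_by_class.values())
    return sum(f1_by_class[c] * support_by_class[c] for c in f1_by_class) / total


# Hypothetical counts, for illustration only.
precision, recall, f1 = precision_recall_f1(tp=90, fp=10, fn=20)
fdr = false_discovery_rate(tp=90, fp=10)

# Per-class F1 values as reported; the supports are assumed, not reported.
avg = weighted_f1({"focal": 0.82, "generalized": 0.76},
                  {"focal": 1, "generalized": 1})
```

With unequal class sizes the weighted average shifts toward the larger class, so the reported 79% cannot be reproduced exactly without the per-class patient counts.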
Affiliation(s)
- Siri A Vulpius
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen, Denmark
- Sebastian Werge
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen, Denmark
- Isabella Friis Jørgensen
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen, Denmark
- Troels Siggaard
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen, Denmark
- Jorge Hernansanz Biel
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen, Denmark
- Gitte M Knudsen
- Epilepsy Clinic and Neurobiology Research Unit, University Hospital Rigshospitalet, Copenhagen, Denmark
- Institute for Clinical Medicine, Faculty of Health and Medicine, University of Copenhagen, Copenhagen, Denmark
- Søren Brunak
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen, Denmark
- Lars H Pinborg
- Epilepsy Clinic and Neurobiology Research Unit, University Hospital Rigshospitalet, Copenhagen, Denmark
- Institute for Clinical Medicine, Faculty of Health and Medicine, University of Copenhagen, Copenhagen, Denmark
5
Wissel BD, Greiner HM, Glauser TA, Mangano FT, Holland-Bouley KD, Zhang N, Szczesniak RD, Santel D, Pestian JP, Dexheimer JW. Automated, machine learning-based alerts increase epilepsy surgery referrals: A randomized controlled trial. Epilepsia 2023; 64:1791-1799. [PMID: 37102995 PMCID: PMC10524622 DOI: 10.1111/epi.17629] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 01/23/2023] [Revised: 04/25/2023] [Accepted: 04/25/2023] [Indexed: 04/28/2023]
Abstract
OBJECTIVE To determine whether automated, electronic alerts increased referrals for epilepsy surgery. METHODS We conducted a prospective, randomized controlled trial of a natural language processing-based clinical decision support system embedded in the electronic health record (EHR) at 14 pediatric neurology outpatient clinic sites. Children with epilepsy and at least two prior neurology visits were screened by the system prior to their scheduled visit. Patients classified as potential surgical candidates were randomized 2:1 for their provider to receive an alert or standard of care (no alert). The primary outcome was referral for a neurosurgical evaluation. The likelihood of referral was estimated using a Cox proportional hazards regression model. RESULTS Between April 2017 and April 2019, a total of 4858 children were screened by the system, and 284 (5.8%) were identified as potential surgical candidates. Two hundred four patients received an alert, and 96 patients received standard care. Median follow-up time was 24 months (range: 12-36 months). Compared to the control group, patients whose provider received an alert were more likely to be referred for a presurgical evaluation (3.1% vs 9.8%; adjusted hazard ratio [HR] = 3.21, 95% confidence interval [CI]: 0.95-10.8; one-sided p = .03). Nine patients (4.4%) in the alert group underwent epilepsy surgery, compared to none (0%) in the control group (one-sided p = .03). SIGNIFICANCE Machine learning-based automated alerts may improve the utilization of referrals for epilepsy surgery evaluations.
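The abstract gives a one-sided p-value for the surgery comparison (9 of 204 patients in the alert group versus 0 of 96 in the control group) but does not name the test behind it. Purely as an illustration, the Python sketch below applies a one-sided Fisher's exact test to those counts. The choice of test and the function name are assumptions, not the authors' stated method.

```python
from math import comb


def fisher_one_sided(events_a, n_a, events_b, n_b):
    """One-sided Fisher's exact p-value for 'group A has more events'.

    Sums hypergeometric probabilities of seeing at least `events_a`
    events in group A, given the fixed group sizes and event total.
    """
    total_events = events_a + events_b
    n_total = n_a + n_b
    denom = comb(n_total, total_events)
    upper = min(n_a, total_events)
    p = 0.0
    for k in range(events_a, upper + 1):
        # comb() returns 0 when the table is impossible, so no guard is needed.
        p += comb(n_a, k) * comb(n_b, total_events - k) / denom
    return p


# Counts as stated in the abstract: alert group 9/204, control group 0/96.
p_value = fisher_one_sided(events_a=9, n_a=204, events_b=0, n_b=96)
```

Under these assumptions the result rounds to the same order of magnitude as the reported value, but agreement does not establish which test the trial used.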
Affiliation(s)
- Benjamin D Wissel
- Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA
- Hansel M Greiner
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, USA
- Division of Neurology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA
- Tracy A Glauser
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, USA
- Division of Neurology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA
- Francesco T Mangano
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, USA
- Division of Neurosurgery, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA
- Katherine D Holland-Bouley
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, USA
- Division of Neurology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA
- Nanhua Zhang
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, USA
- Division of Biostatistics & Epidemiology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA
- Rhonda D Szczesniak
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, USA
- Division of Biostatistics & Epidemiology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA
- Daniel Santel
- Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA
- John P Pestian
- Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, USA
- Judith W Dexheimer
- Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, USA
- Division of Emergency Medicine, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA
6
Fernandes M, Cardall A, Jing J, Ge W, Moura LMVR, Jacobs C, McGraw C, Zafar SF, Westover MB. Identification of patients with epilepsy using automated electronic health records phenotyping. Epilepsia 2023; 64:1472-1481. [PMID: 36934317 PMCID: PMC10239346 DOI: 10.1111/epi.17589] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 12/19/2022] [Revised: 03/15/2023] [Accepted: 03/16/2023] [Indexed: 03/20/2023]
Abstract
OBJECTIVE Unstructured data present in electronic health records (EHR) are a rich source of medical information; however, their abstraction is labor intensive. Automated EHR phenotyping (AEP) can reduce the need for manual chart review. We present an AEP model that is designed to automatically identify patients diagnosed with epilepsy. METHODS The ground truth for model training and evaluation was captured from a combination of structured questionnaires filled out by physicians for a subset of patients and manual chart review using customized software. Modeling features included indicators of the presence of keywords and phrases in unstructured clinical notes, prescriptions for antiseizure medications (ASMs), International Classification of Diseases (ICD) codes for seizures and epilepsy, number of ASMs and epilepsy-related ICD codes, age, and sex. Data were randomly divided into training (70%) and hold-out testing (30%) sets, with distinct patients in each set. We trained two models: regularized logistic regression and extreme gradient boosting. Model performance was measured using area under the receiver operating characteristic curve (AUROC) and area under the precision-recall curve (AUPRC), with 95% confidence intervals (CI) estimated via bootstrapping. RESULTS Our study cohort included 3903 adults drawn from outpatient departments of nine hospitals between February 2015 and June 2022 (mean age = 47 ± 18 years, 57% women, 82% White, 84% non-Hispanic, 70% with epilepsy). The final models included 285 features, including 246 keywords and phrases captured from 8415 encounters. Both models achieved AUROC and AUPRC of 1 (95% CI = .99-1.00) in the hold-out testing set. SIGNIFICANCE A machine learning-based AEP approach accurately identifies patients with epilepsy from notes, ICD codes, and ASMs. This model can enable large-scale epilepsy research using EHR databases.
Affiliation(s)
- Marta Fernandes
- Department of Neurology, Massachusetts General Hospital, Boston, Massachusetts, USA
- Harvard Medical School, Boston, Massachusetts, USA
- Clinical Data Animation Center, Massachusetts General Hospital, Boston, Massachusetts, USA
- Henry and Allison McCance Center for Brain Health, Massachusetts General Hospital, Boston, Massachusetts, USA
- Aidan Cardall
- Department of Neurology, Massachusetts General Hospital, Boston, Massachusetts, USA
- Harvard Medical School, Boston, Massachusetts, USA
- Clinical Data Animation Center, Massachusetts General Hospital, Boston, Massachusetts, USA
- Jin Jing
- Department of Neurology, Massachusetts General Hospital, Boston, Massachusetts, USA
- Harvard Medical School, Boston, Massachusetts, USA
- Clinical Data Animation Center, Massachusetts General Hospital, Boston, Massachusetts, USA
- Henry and Allison McCance Center for Brain Health, Massachusetts General Hospital, Boston, Massachusetts, USA
- Wendong Ge
- Department of Neurology, Massachusetts General Hospital, Boston, Massachusetts, USA
- Harvard Medical School, Boston, Massachusetts, USA
- Clinical Data Animation Center, Massachusetts General Hospital, Boston, Massachusetts, USA
- Henry and Allison McCance Center for Brain Health, Massachusetts General Hospital, Boston, Massachusetts, USA
- Lidia M. V. R. Moura
- Department of Neurology, Massachusetts General Hospital, Boston, Massachusetts, USA
- Harvard Medical School, Boston, Massachusetts, USA
- Claire Jacobs
- Department of Neurology, Massachusetts General Hospital, Boston, Massachusetts, USA
- Harvard Medical School, Boston, Massachusetts, USA
- Christopher McGraw
- Department of Neurology, Massachusetts General Hospital, Boston, Massachusetts, USA
- Harvard Medical School, Boston, Massachusetts, USA
- Sahar F. Zafar
- Department of Neurology, Massachusetts General Hospital, Boston, Massachusetts, USA
- Harvard Medical School, Boston, Massachusetts, USA
- M. Brandon Westover
- Department of Neurology, Massachusetts General Hospital, Boston, Massachusetts, USA
- Harvard Medical School, Boston, Massachusetts, USA
- Clinical Data Animation Center, Massachusetts General Hospital, Boston, Massachusetts, USA
- Henry and Allison McCance Center for Brain Health, Massachusetts General Hospital, Boston, Massachusetts, USA
7
Yew ANJ, Schraagen M, Otte WM, van Diessen E. Transforming epilepsy research: A systematic review on natural language processing applications. Epilepsia 2023; 64:292-305. [PMID: 36462150 PMCID: PMC10108221 DOI: 10.1111/epi.17474] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 07/11/2022] [Revised: 11/23/2022] [Accepted: 12/01/2022] [Indexed: 12/05/2022]
Abstract
Despite improved ancillary investigations in epilepsy care, patients' narratives remain indispensable for diagnosis and treatment monitoring. This wealth of information is typically stored in electronic health records and accumulated in medical journals in an unstructured manner, thereby restricting complete utilization in clinical decision-making. To this end, clinical researchers increasingly apply natural language processing (NLP), a branch of artificial intelligence, as it removes ambiguity, derives context, and imbues standardized meaning from free-narrative clinical texts. This systematic review presents an overview of the current NLP applications in epilepsy and discusses the opportunities and drawbacks of NLP alongside its future implications. We searched the PubMed and Embase databases with a "natural language processing" and "epilepsy" query (March 4, 2022) and included original research articles describing the application of NLP techniques for textual analysis in epilepsy. Twenty-six studies were included. Fifty-eight percent of these studies used NLP to classify clinical records into predefined categories, improving patient identification and treatment decisions. Other applications of NLP included structured clinical information retrieval from electronic health records, scientific papers, and online posts of patients. Challenges and opportunities of NLP applications for enhancing epilepsy care and research are discussed. The field could further benefit from NLP by replicating successes in other health care domains, such as NLP-aided quality evaluation for clinical decision-making, outcome prediction, and clinical record summarization.
Affiliation(s)
- Arister N J Yew
- University College Utrecht, Utrecht University, Utrecht, The Netherlands
- Marijn Schraagen
- Department of Information and Computing Sciences, Faculty of Science, Utrecht University, Utrecht, The Netherlands
- Willem M Otte
- Department of Child Neurology, Brain Center, University Medical Center Utrecht and Utrecht University, Utrecht, The Netherlands
- Eric van Diessen
- Department of Child Neurology, Brain Center, University Medical Center Utrecht and Utrecht University, Utrecht, The Netherlands
8
Crema C, Attardi G, Sartiano D, Redolfi A. Natural language processing in clinical neuroscience and psychiatry: A review. Front Psychiatry 2022; 13:946387. [PMID: 36186874 PMCID: PMC9515453 DOI: 10.3389/fpsyt.2022.946387] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/17/2022] [Accepted: 08/22/2022] [Indexed: 11/13/2022] Open
Abstract
Natural language processing (NLP) is rapidly becoming an important topic in the medical community. The ability to automatically analyze any type of medical document could be the key factor to fully exploit the data it contains. Cutting-edge artificial intelligence (AI) architectures, particularly machine learning and deep learning, have begun to be applied to this topic and have yielded promising results. We conducted a literature search for 1,024 papers that used NLP technology in neuroscience and psychiatry from 2010 to early 2022. After a selection process, 115 papers were evaluated. Each publication was classified into one of three categories: information extraction, classification, and data inference. Automated understanding of clinical reports in electronic health records has the potential to improve healthcare delivery. Overall, the performance of NLP applications is high, with an average F1-score and AUC above 85%. We also derived a composite measure in the form of Z-scores to better compare the performance of NLP models and their different classes as a whole. No statistical differences were found in the unbiased comparison. Strong asymmetry between English and non-English models, difficulty in obtaining high-quality annotated data, and train biases causing low generalizability are the main limitations. This review suggests that NLP could be an effective tool to help clinicians gain insights from medical reports, clinical research forms, and more, making NLP an effective tool to improve the quality of healthcare services.
Affiliation(s)
- Claudio Crema
- Laboratory of Neuroinformatics, IRCCS Istituto Centro San Giovanni di Dio Fatebenefratelli, Brescia, Italy
- Daniele Sartiano
- Istituto di Informatica e Telematica, Consiglio Nazionale delle Ricerche, Pisa, Italy
- Alberto Redolfi
- Laboratory of Neuroinformatics, IRCCS Istituto Centro San Giovanni di Dio Fatebenefratelli, Brescia, Italy
9
Buchlak QD, Esmaili N, Bennett C, Farrokhi F. Natural Language Processing Applications in the Clinical Neurosciences: A Machine Learning Augmented Systematic Review. Acta Neurochir Suppl 2022; 134:277-289. [PMID: 34862552 DOI: 10.1007/978-3-030-85292-4_32] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 10/19/2022]
Abstract
Natural language processing (NLP), a domain of artificial intelligence (AI) that models human language, has been used in medicine to automate diagnostics, detect adverse events, support decision making and predict clinical outcomes. However, applications to the clinical neurosciences appear to be limited. NLP has matured with the implementation of deep transformer models (e.g., XLNet, BERT, T5, and RoBERTa) and transfer learning. The objectives of this study were to (1) systematically review NLP applications in the clinical neurosciences, and (2) explore NLP analysis to facilitate literature synthesis, providing clear examples to demonstrate the potential capabilities of these technologies for a clinical audience. Our NLP analysis consisted of keyword identification, text summarization and document classification. A total of 48 articles met inclusion criteria. NLP has been applied in the clinical neurosciences to facilitate literature synthesis, data extraction, patient identification, automated clinical reporting and outcome prediction. The number of publications applying NLP has increased rapidly over the past five years. Document classifiers trained to differentiate included and excluded articles demonstrated moderate performance (XLNet AUC = 0.66, BERT AUC = 0.59, RoBERTa AUC = 0.62). The T5 transformer model generated acceptable abstract summaries. The application of NLP has the potential to enhance research and practice in the clinical neurosciences.
Affiliation(s)
- Quinlan D Buchlak
- School of Medicine, The University of Notre Dame Australia, Sydney, NSW, Australia.
- Nazanin Esmaili
- School of Medicine, The University of Notre Dame Australia, Sydney, NSW, Australia
- Faculty of Engineering and Information Technology, University of Technology Sydney, Ultimo, NSW, Australia
- Christine Bennett
- School of Medicine, The University of Notre Dame Australia, Sydney, NSW, Australia
- Farrokh Farrokhi
- Neuroscience Institute, Virginia Mason Medical Center, Seattle, WA, USA
10
Decker BM, Hill CE, Baldassano SN, Khankhanian P. Can antiepileptic efficacy and epilepsy variables be studied from electronic health records? A review of current approaches. Seizure 2021; 85:138-144. [PMID: 33461032 DOI: 10.1016/j.seizure.2020.11.011] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/30/2020] [Revised: 11/16/2020] [Accepted: 11/17/2020] [Indexed: 12/16/2022] Open
Abstract
As automated data extraction and natural language processing (NLP) are rapidly evolving, improving healthcare delivery by harnessing large data is garnering great interest. Assessing antiepileptic drug (AED) efficacy and other epilepsy variables pertinent to healthcare delivery remains a critical barrier to improving patient care. In this systematic review, we examined automatic electronic health record (EHR) extraction methodologies pertinent to epilepsy. We also reviewed more generalizable NLP pipelines to extract other critical patient variables. Our review found varying reports of performance measures. Whereas automated data extraction pipelines are a crucial advancement, this review calls attention to standardizing NLP methodology and accuracy reporting for greater generalizability. Moreover, the use of crowdsourcing competitions to spur innovative NLP pipelines would further advance this field.
Affiliation(s)
- Barbara M Decker
- Center for Neuroengineering and Therapeutics, Department of Neurology, University of Pennsylvania, 3400 Spruce Street, Philadelphia, PA, 19104, United States.
- Chloé E Hill
- Department of Neurology, University of Michigan, 1500 East Medical Center Drive, Ann Arbor, MI, 48109, United States
| | - Steven N Baldassano
- Center for Neuroengineering and Therapeutics, Department of Neurology, University of Pennsylvania, 3400 Spruce Street, Philadelphia, PA, 19104, United States
| | - Pouya Khankhanian
- Center for Neuroengineering and Therapeutics, Department of Neurology, University of Pennsylvania, 3400 Spruce Street, Philadelphia, PA, 19104, United States
| |
|
11
|
Wissel BD, Greiner HM, Glauser TA, Holland-Bouley KD, Mangano FT, Santel D, Faist R, Zhang N, Pestian JP, Szczesniak RD, Dexheimer JW. Prospective validation of a machine learning model that uses provider notes to identify candidates for resective epilepsy surgery. Epilepsia 2019; 61:39-48. [PMID: 31784992 DOI: 10.1111/epi.16398] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 07/20/2019] [Revised: 11/05/2019] [Accepted: 11/05/2019] [Indexed: 12/23/2022]
Abstract
OBJECTIVE Delay to resective epilepsy surgery results in avoidable disease burden and increased risk of mortality. The objective was to prospectively validate a natural language processing (NLP) application that uses provider notes to assign epilepsy surgery candidacy scores. METHODS The application was trained on notes from (1) patients with a diagnosis of epilepsy and a history of resective epilepsy surgery and (2) patients who were seizure-free without surgery. The testing set included all patients with unknown surgical candidacy status and an upcoming neurology visit. Training and testing sets were updated weekly for 1 year. One- to three-word phrases contained in patients' notes were used as features. Patients prospectively identified by the application as candidates for surgery were manually reviewed by two epileptologists. Performance metrics were defined by comparing NLP-derived surgical candidacy scores with surgical candidacy status from expert chart review. RESULTS The training set was updated weekly and included notes from a mean of 519 ± 67 patients. The area under the receiver operating characteristic curve (AUC) from 10-fold cross-validation was 0.90 ± 0.04 (range = 0.83-0.96) and improved by 0.002 per week (P < .001) as new patients were added to the training set. Of the 6395 patients who visited the neurology clinic, 4211 (67%) were evaluated by the model. The prospective AUC on this test set was 0.79 (95% confidence interval [CI] = 0.62-0.96). Using the optimal surgical candidacy score threshold, sensitivity was 0.80 (95% CI = 0.29-0.99), specificity was 0.77 (95% CI = 0.64-0.88), positive predictive value was 0.25 (95% CI = 0.07-0.52), and negative predictive value was 0.98 (95% CI = 0.87-1.00). The number needed to screen was 5.6. SIGNIFICANCE An electronic health record-integrated NLP application can accurately assign surgical candidacy scores to patients in a clinical setting.
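The abstract above states that one- to three-word phrases from patients' notes were used as features. A minimal sketch of that kind of phrase extraction follows, assuming simple lowercasing and whitespace tokenization; the function name `ngram_features` and the sample note are hypothetical, and the study's actual preprocessing and classifier are not described here.

```python
from collections import Counter


def ngram_features(note: str, max_n: int = 3) -> Counter:
    """Count every 1- to max_n-word phrase in a note.

    Assumption: tokens are lowercased, whitespace-separated words.
    """
    tokens = note.lower().split()
    counts: Counter = Counter()
    for n in range(1, max_n + 1):
        # Slide a window of n tokens across the note.
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


# Hypothetical note text, for illustration only.
features = ngram_features("seizures persist despite two medications")
```

A five-word note yields five one-word, four two-word, and three three-word phrases; counts like these could then be passed to any classifier that accepts sparse features.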
Affiliation(s)
- Benjamin D Wissel
- Department of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio
| | - Hansel M Greiner
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio; Division of Neurology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio
| | - Tracy A Glauser
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio; Division of Neurology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio
| | - Katherine D Holland-Bouley
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio; Division of Neurology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio
| | - Francesco T Mangano
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio; Division of Neurosurgery, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio
| | - Daniel Santel
- Department of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio
| | - Robert Faist
- Department of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio
| | - Nanhua Zhang
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio; Division of Biostatistics & Epidemiology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio
| | - John P Pestian
- Department of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio; Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio
| | - Rhonda D Szczesniak
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio; Division of Biostatistics & Epidemiology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio
| | - Judith W Dexheimer
- Department of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio; Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio; Division of Emergency Medicine, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio
| |
|
12
|
Abbasi B, Goldenholz DM. Machine learning applications in epilepsy. Epilepsia 2019; 60:2037-2047. [PMID: 31478577 PMCID: PMC9897263 DOI: 10.1111/epi.16333] [Citation(s) in RCA: 164] [Impact Index Per Article: 32.8] [Received: 04/23/2019] [Revised: 07/25/2019] [Accepted: 08/12/2019] [Indexed: 02/05/2023]
Abstract
Machine learning leverages statistical and computer science principles to develop algorithms capable of improving performance through interpretation of data rather than through explicit instructions. Alongside widespread use in image recognition, language processing, and data mining, machine learning techniques have received increasing attention in medical applications, ranging from automated imaging analysis to disease forecasting. This review examines the parallel progress made in epilepsy, highlighting applications in automated seizure detection from electroencephalography (EEG), video, and kinetic data, automated imaging analysis and pre-surgical planning, prediction of medication response, and prediction of medical and surgical outcomes using a wide variety of data sources. A brief overview of commonly used machine learning approaches, as well as challenges in further application of machine learning techniques in epilepsy, is also presented. With increasing computational capabilities, availability of effective machine learning algorithms, and accumulation of larger datasets, clinicians and researchers will increasingly benefit from familiarity with these techniques and the significant progress already made in their application in epilepsy.
Affiliation(s)
- Bardia Abbasi
- Department of Neurology, Beth Israel Deaconess Medical Center, Boston, MA 02215
| | | |
|
13
|
Barbour K, Hesdorffer DC, Tian N, Yozawitz EG, McGoldrick PE, Wolf S, McDonough TL, Nelson A, Loddenkemper T, Basma N, Johnson SB, Grinspan ZM. Automated detection of sudden unexpected death in epilepsy risk factors in electronic medical records using natural language processing. Epilepsia 2019; 60:1209-1220. [PMID: 31111463 DOI: 10.1111/epi.15966] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 01/28/2019] [Revised: 04/25/2019] [Accepted: 04/25/2019] [Indexed: 11/27/2022]
Abstract
OBJECTIVE Sudden unexpected death in epilepsy (SUDEP) is an important cause of mortality in epilepsy. However, there is a gap in how often providers counsel patients about SUDEP. One potential solution is to electronically prompt clinicians to provide counseling via automated detection of risk factors in electronic medical records (EMRs). We evaluated (1) the feasibility and generalizability of using regular expressions to identify risk factors in EMRs and (2) barriers to generalizability. METHODS Data included physician notes for 3000 patients from one medical center (home) and 1000 from five additional centers (away). Through chart review, we identified three SUDEP risk factors: (1) generalized tonic-clonic seizures, (2) refractory epilepsy, and (3) epilepsy surgery candidacy. Regular expressions of risk factors were manually created with home training data, and performance was evaluated with home test and away test data. Performance was evaluated by sensitivity, positive predictive value, and F-measure. Generalizability was defined as an absolute decrease in performance by <0.10 for away versus home test data. To evaluate underlying barriers to generalizability, we identified causes of errors seen more often in away data than home data. To demonstrate how small revisions can improve generalizability, we removed three "boilerplate" standard text phrases from away notes and repeated performance. RESULTS We observed high performance in home test data (F-measure range = 0.86-0.90), and low to high performance in away test data (F-measure range = 0.53-0.81). After removing three boilerplate phrases, away performance improved (F-measure range = 0.79-0.89) and generalizability was achieved for nearly all measures. The only significant barrier to generalizability was use of boilerplate phrases, causing 104 of 171 errors (61%) in away data. SIGNIFICANCE Regular expressions are a feasible and probably a generalizable method to identify variables related to SUDEP risk. Our methods may be implemented to create large patient cohorts for research and to generate electronic prompts for SUDEP counseling.
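To illustrate why boilerplate text can cause false positives for a regular-expression detector, as the abstract above reports, here is a minimal sketch. The pattern, the boilerplate phrase, and the sample note are all hypothetical; the abstract does not give the study's actual expressions or the three phrases that were removed, and this sketch does not handle negation.

```python
import re

# Hypothetical boilerplate phrase (an assumption, not taken from the study).
BOILERPLATE = [
    "call 911 if a generalized tonic-clonic seizure lasts more than 5 minutes",
]

# Hypothetical pattern for one risk factor: generalized tonic-clonic seizures.
GTC_PATTERN = re.compile(
    r"\b(generalized\s+tonic[-\s]clonic|GTCS?)\b", re.IGNORECASE
)


def strip_boilerplate(note: str) -> str:
    """Remove known standard-text phrases before pattern matching."""
    for phrase in BOILERPLATE:
        note = re.sub(re.escape(phrase), " ", note, flags=re.IGNORECASE)
    return note


def mentions_gtc(note: str, remove_boilerplate: bool = True) -> bool:
    """Return True if the note text matches the risk-factor pattern."""
    text = strip_boilerplate(note) if remove_boilerplate else note
    return GTC_PATTERN.search(text) is not None


# Hypothetical note: the only mention of the risk factor is in standard text.
note = (
    "Focal seizures only. "
    "Call 911 if a generalized tonic-clonic seizure lasts more than 5 minutes."
)
raw_hit = mentions_gtc(note, remove_boilerplate=False)
clean_hit = mentions_gtc(note)
```

In this sketch the unfiltered note matches only because of the standard instruction text; once that phrase is stripped, the note no longer matches.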
Affiliation(s)
- Kristen Barbour
- Division of Child Neurology, Weill Cornell Medicine, New York, New York
| | - Dale C Hesdorffer
- Department of Epidemiology, Columbia University Medical Center, New York, New York
| | - Niu Tian
- Centers for Disease Control and Prevention, Atlanta, Georgia
| | - Elissa G Yozawitz
- Saul R. Korey Department of Neurology, Albert Einstein College of Medicine, Bronx, New York
| | | | - Steven Wolf
- Department of Neurology, Mount Sinai Health System, New York, New York
| | - Tiffani L McDonough
- Department of Epidemiology, Columbia University Medical Center, New York, New York
| | - Aaron Nelson
- Department of Neurology, New York University Langone Medical Center, New York, New York
| | | | - Natasha Basma
- Division of Child Neurology, Weill Cornell Medicine, New York, New York
| | - Stephen B Johnson
- Division of Child Neurology, Weill Cornell Medicine, New York, New York
| | | |
|
14
|
Mediouni M, Schlatterer DR. Frailty as an Outcome Predictor After Ankle Fractures: Where Are We Now? Geriatr Orthop Surg Rehabil 2018; 9:2151459318801756. [PMID: 30479848 PMCID: PMC6240965 DOI: 10.1177/2151459318801756] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 08/20/2018] [Accepted: 08/23/2018] [Indexed: 01/15/2023] Open
|
15
|
Lanera C, Minto C, Sharma A, Gregori D, Berchialla P, Baldi I. Extending PubMed searches to ClinicalTrials.gov through a machine learning approach for systematic reviews. J Clin Epidemiol 2018; 103:22-30. [PMID: 29981872 DOI: 10.1016/j.jclinepi.2018.06.015] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 01/26/2018] [Revised: 06/19/2018] [Accepted: 06/29/2018] [Indexed: 12/23/2022]
Abstract
OBJECTIVES Despite their essential role in collecting and organizing published medical literature, indexed search engines are unable to cover all relevant knowledge. Hence, current literature recommends the inclusion of clinical trial registries in systematic reviews (SRs). This study aims to provide an automated approach to extend a search on PubMed to the ClinicalTrials.gov database, relying on text mining and machine learning techniques. STUDY DESIGN AND SETTING The procedure starts from a literature search on PubMed. Next, it considers the training of a classifier that can identify documents with a comparable word characterization in the ClinicalTrials.gov clinical trial repository. Fourteen SRs, covering a broad range of health conditions, are used as case studies for external validation. A cross-validated support-vector machine (SVM) model was used as the classifier. RESULTS The sensitivity was 100% in all SRs except one (87.5%), and the specificity ranged from 97.2% to 99.9%. The ability of the instrument to distinguish on-topic from off-topic articles ranged from an area under the receiver operator characteristic curve of 93.4% to 99.9%. CONCLUSION The proposed machine learning instrument has the potential to help researchers identify relevant studies in the SR process by reducing workload, without losing sensitivity and at a small price in terms of specificity.
Affiliation(s)
- Corrado Lanera
- Unit of Biostatistics, Epidemiology and Public Health, Department of Cardiac, Thoracic and Vascular Sciences, University of Padova, Via Loredan 18, Padova 35131, Italy
| | - Clara Minto
- Unit of Biostatistics, Epidemiology and Public Health, Department of Cardiac, Thoracic and Vascular Sciences, University of Padova, Via Loredan 18, Padova 35131, Italy
| | - Abhinav Sharma
- Department of Biological Sciences and Bioengineering (BSBE), IIT, Kanpur, India
| | - Dario Gregori
- Unit of Biostatistics, Epidemiology and Public Health, Department of Cardiac, Thoracic and Vascular Sciences, University of Padova, Via Loredan 18, Padova 35131, Italy
| | - Paola Berchialla
- Department of Clinical and Biological Sciences, University of Torino, Via Santena 5bis, Torino 10126, Italy
| | - Ileana Baldi
- Unit of Biostatistics, Epidemiology and Public Health, Department of Cardiac, Thoracic and Vascular Sciences, University of Padova, Via Loredan 18, Padova 35131, Italy.
| |
|
16
|
Can a collaborative healthcare network improve the care of people with epilepsy? Epilepsy Behav 2018; 82:189-193. [PMID: 29573986 DOI: 10.1016/j.yebeh.2018.02.018] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 02/15/2018] [Accepted: 02/16/2018] [Indexed: 01/31/2023]
Abstract
New opportunities are now available to improve care in ways not possible previously. Information contained in electronic medical records can now be shared without identifying patients. With network collaboration, large numbers of medical records can be searched to identify patients most like the one whose complex medical situation challenges the physician. The clinical effectiveness of different treatment strategies can be assessed rapidly to help the clinician decide on the best treatment for this patient. Other capabilities from different components of the network can prompt the recognition of what is the best available option and encourage the sharing of information about programs and electronic tools. Difficulties related to privacy, harmonization, integration, and costs are expected, but these are currently being addressed successfully by groups of organizations led by those who recognize the benefits.
|
17
|
Doan S, Maehara CK, Chaparro JD, Lu S, Liu R, Graham A, Berry E, Hsu CN, Kanegaye JT, Lloyd DD, Ohno-Machado L, Burns JC, Tremoulet AH. Building a Natural Language Processing Tool to Identify Patients With High Clinical Suspicion for Kawasaki Disease from Emergency Department Notes. Acad Emerg Med 2016; 23:628-36. [PMID: 26826020 DOI: 10.1111/acem.12925] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Received: 08/04/2015] [Revised: 11/29/2015] [Accepted: 12/30/2015] [Indexed: 11/26/2022]
Abstract
OBJECTIVE Delayed diagnosis of Kawasaki disease (KD) may lead to serious cardiac complications. We sought to create and test the performance of a natural language processing (NLP) tool, the KD-NLP, in the identification of emergency department (ED) patients for whom the diagnosis of KD should be considered. METHODS We developed an NLP tool that recognizes the KD diagnostic criteria based on standard clinical terms and medical word usage using 22 pediatric ED notes augmented by Unified Medical Language System vocabulary. With high suspicion for KD defined as fever and three or more KD clinical signs, KD-NLP was applied to 253 ED notes from children ultimately diagnosed with either KD or another febrile illness. We evaluated KD-NLP performance against ED notes manually reviewed by clinicians and compared the results to a simple keyword search. RESULTS KD-NLP identified high-suspicion patients with a sensitivity of 93.6% and specificity of 77.5% compared to notes manually reviewed by clinicians. The tool outperformed a simple keyword search (sensitivity = 41.0%; specificity = 76.3%). CONCLUSIONS KD-NLP showed comparable performance to clinician manual chart review for identification of pediatric ED patients with a high suspicion for KD. This tool could be incorporated into the ED electronic health record system to alert providers to consider the diagnosis of KD. KD-NLP could serve as a model for decision support for other conditions in the ED.
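The decision rule stated in the abstract above (high suspicion defined as fever plus three or more KD clinical signs) can be sketched as a simple threshold check. This is an illustration only: the function name and the placeholder sign labels are hypothetical, and the step that KD-NLP actually performs, recognizing the signs in free-text notes, is not reproduced here.

```python
def high_suspicion(fever: bool, signs: dict[str, bool], min_signs: int = 3) -> bool:
    """Apply the rule: fever AND at least `min_signs` clinical signs present.

    `signs` maps a sign label to whether it was found in the note.
    The labels are placeholders; the abstract does not enumerate them.
    """
    present = sum(1 for found in signs.values() if found)
    return fever and present >= min_signs


# Hypothetical extraction result for one emergency department note.
example_signs = {"sign_a": True, "sign_b": True, "sign_c": True, "sign_d": False}
flagged = high_suspicion(fever=True, signs=example_signs)
not_flagged = high_suspicion(fever=False, signs=example_signs)
```

Separating the rule from the text-recognition step keeps the threshold explicit and easy to audit.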
Affiliation(s)
- Son Doan
- Department of Biomedical Informatics; University of California; San Diego CA
| | - Cleo K. Maehara
- Department of Biomedical Informatics; University of California; San Diego CA
| | - Juan D. Chaparro
- Department of Biomedical Informatics; University of California; San Diego CA
| | - Sisi Lu
- Department of Computer Science; University of Pittsburgh; Pittsburgh PA
| | - Ruiling Liu
- The University of Texas Health Science Center at Houston; Houston TX
| | | | - Erika Berry
- Department of Pediatrics; University of California at San Diego; La Jolla CA
| | - Chun-Nan Hsu
- Department of Biomedical Informatics; University of California; San Diego CA
| | - John T. Kanegaye
- Department of Pediatrics; University of California at San Diego; La Jolla CA
- Rady Children's Hospital San Diego; San Diego CA
| | - David D. Lloyd
- Children's Healthcare of Atlanta; Atlanta GA
- Emory University School of Medicine; Atlanta GA
| | - Lucila Ohno-Machado
- Department of Biomedical Informatics; University of California; San Diego CA
| | - Jane C. Burns
- Department of Pediatrics; University of California at San Diego; La Jolla CA
- Rady Children's Hospital San Diego; San Diego CA
| | - Adriana H. Tremoulet
- Department of Pediatrics; University of California at San Diego; La Jolla CA
- Rady Children's Hospital San Diego; San Diego CA
| | | |
|
18
|
Ni Y, Beck AF, Taylor R, Dyas J, Solti I, Grupp-Phelan J, Dexheimer JW. Will they participate? Predicting patients' response to clinical trial invitations in a pediatric emergency department. J Am Med Inform Assoc 2016; 23:671-80. [PMID: 27121609 PMCID: PMC4926740 DOI: 10.1093/jamia/ocv216] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 07/27/2015] [Accepted: 12/30/2015] [Indexed: 12/27/2022] Open
Abstract
Objective (1) To develop an automated algorithm to predict a patient’s response (ie, if the patient agrees or declines) before he/she is approached for a clinical trial invitation; (2) to assess the algorithm performance and the predictors on real-world patient recruitment data for a diverse set of clinical trials in a pediatric emergency department; and (3) to identify directions for future studies in predicting patients’ participation response. Materials and Methods We collected 3345 patients’ response to trial invitations on 18 clinical trials at one center that were actively enrolling patients between January 1, 2010 and December 31, 2012. In parallel, we retrospectively extracted demographic, socioeconomic, and clinical predictors from multiple sources to represent the patients’ profiles. Leveraging machine learning methodology, the automated algorithms predicted participation response for individual patients and identified influential features associated with their decision-making. The performance was validated on the collection of actual patient response, where precision, recall, F-measure, and area under the ROC curve were assessed. Results Compared to the random response predictor that simulated the current practice, the machine learning algorithms achieved significantly better performance (Precision/Recall/F-measure/area under the ROC curve: 70.82%/92.02%/80.04%/72.78% on 10-fold cross validation and 71.52%/92.68%/80.74%/75.74% on the test set). By analyzing the significant features output by the algorithms, the study confirmed several literature findings and identified challenges that could be mitigated to optimize recruitment. Conclusion By exploiting predictive variables from multiple sources, we demonstrated that machine learning algorithms have great potential in improving the effectiveness of the recruitment process by automatically predicting patients’ participation response to trial invitations.
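The F-measure figures in the abstract above are consistent with the standard definition of the F-measure as the harmonic mean of precision and recall. A short check, using only the precision and recall values reported in the abstract (the function name `f_measure` is ours):

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the balanced F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract, expressed as fractions.
cv_f = f_measure(0.7082, 0.9202)    # 10-fold cross-validation
test_f = f_measure(0.7152, 0.9268)  # held-out test set

print(f"cross-validation F-measure: {cv_f:.2%}")
print(f"test set F-measure: {test_f:.2%}")
```

Both computed values agree with the reported 80.04% and 80.74% to two decimal places.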
Affiliation(s)
- Yizhao Ni
- Department of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229-3039, USA
| | - Andrew F Beck
- Division of General and Community Pediatrics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229-3039, USA
| | - Regina Taylor
- Division of Emergency Medicine, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229-3039, USA
| | - Jenna Dyas
- Division of Emergency Medicine, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229-3039, USA
| | - Imre Solti
- Department of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229-3039, USA
| | - Jacqueline Grupp-Phelan
- Division of Emergency Medicine, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229-3039, USA
| | - Judith W Dexheimer
- Department of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229-3039, USA; Division of Emergency Medicine, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229-3039, USA
| |
|
19
|
Temple MW, Lehmann CU, Fabbri D. Natural Language Processing for Cohort Discovery in a Discharge Prediction Model for the Neonatal ICU. Appl Clin Inform 2016; 7:101-15. [PMID: 27081410 DOI: 10.4338/aci-2015-09-ra-0114] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 09/12/2015] [Accepted: 01/02/2016] [Indexed: 01/15/2023] Open
Abstract
OBJECTIVES Discharging patients from the Neonatal Intensive Care Unit (NICU) can be delayed for non-medical reasons including the procurement of home medical equipment, parental education, and the need for children's services. We previously created a model to identify patients that will be medically ready for discharge in the subsequent 2-10 days. In this study we use Natural Language Processing to improve upon that model and discern why the model performed poorly on certain patients. METHODS We retrospectively examined the text of the Assessment and Plan section from daily progress notes of 4,693 patients (103,206 patient-days) from the NICU of a large, academic children's hospital. A matrix was constructed using words from NICU notes (single words and bigrams) to train a supervised machine learning algorithm to determine the most important words differentiating poorly performing patients compared to well performing patients in our original discharge prediction model. RESULTS NLP using a bag of words (BOW) analysis revealed several cohorts that performed poorly in our original model. These included patients with surgical diagnoses, pulmonary hypertension, retinopathy of prematurity, and psychosocial issues. DISCUSSION The BOW approach aided in cohort discovery and will allow further refinement of our original discharge model prediction. Adequately identifying patients discharged home on g-tube feeds alone could improve the AUC of our original model by 0.02. Additionally, this approach identified social issues as a major cause for delayed discharge. CONCLUSION A BOW analysis provides a method to improve and refine our NICU discharge prediction model and could potentially avoid over 900 (0.9%) hospital days.
Affiliation(s)
- Michael W Temple
- Department of Biomedical Informatics, Vanderbilt University, Nashville, TN
| | - Christoph U Lehmann
- Department of Biomedical Informatics, Vanderbilt University, Nashville, TN; Department of Pediatrics, Vanderbilt University, Nashville, TN
| | - Daniel Fabbri
- Department of Biomedical Informatics, Vanderbilt University, Nashville, TN
| |
|
20
|
Han D, Wang S, Jiang C, Jiang X, Kim HE, Sun J, Ohno-Machado L. Trends in biomedical informatics: automated topic analysis of JAMIA articles. J Am Med Inform Assoc 2015; 22:1153-63. [PMID: 26555018 PMCID: PMC5009912 DOI: 10.1093/jamia/ocv157] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 08/31/2015] [Revised: 09/08/2015] [Accepted: 09/14/2015] [Indexed: 01/26/2023] Open
Abstract
Biomedical Informatics is a growing interdisciplinary field in which research topics and citation trends have been evolving rapidly in recent years. To analyze these data in a fast, reproducible manner, automation of certain processes is needed. JAMIA is a "generalist" journal for biomedical informatics. Its articles reflect the wide range of topics in informatics. In this study, we retrieved Medical Subject Headings (MeSH) terms and citations of JAMIA articles published between 2009 and 2014. We use tensors (i.e., multidimensional arrays) to represent the interaction among topics, time and citations, and applied tensor decomposition to automate the analysis. The trends represented by tensors were then carefully interpreted and the results were compared with previous findings based on manual topic analysis. A list of most cited JAMIA articles, their topics, and publication trends over recent years is presented. The analyses confirmed previous studies and showed that, from 2012 to 2014, the number of articles related to MeSH terms Methods, Organization & Administration, and Algorithms increased significantly both in number of publications and citations. Citation trends varied widely by topic, with Natural Language Processing having a large number of citations in particular years, and Medical Record Systems, Computerized remaining a very popular topic in all years.
Affiliation(s)
- Dong Han
- Health System Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, 92093, USA; School of Electrical and Computer Engineering, University of Oklahoma, Tulsa, OK, 74135, USA
| | - Shuang Wang
- Health System Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, 92093, USA
| | - Chao Jiang
- Health System Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, 92093, USA; School of Electrical and Computer Engineering, University of Oklahoma, Tulsa, OK, 74135, USA
| | - Xiaoqian Jiang
- Health System Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, 92093, USA
| | - Hyeon-Eui Kim
- Health System Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, 92093, USA
| | - Jimeng Sun
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, S30313, USA
| | - Lucila Ohno-Machado
- Health System Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, 92093, USA
| |
|