1
Sciannameo V, Pagliari DJ, Urru S, Grimaldi P, Ocagli H, Ahsani-Nasab S, Comoretto RI, Gregori D, Berchialla P. Information extraction from medical case reports using OpenAI InstructGPT. Computer Methods and Programs in Biomedicine 2024; 255:108326. [PMID: 39029416 DOI: 10.1016/j.cmpb.2024.108326] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2023] [Revised: 06/15/2023] [Accepted: 07/11/2024] [Indexed: 07/21/2024]
Abstract
BACKGROUND AND OBJECTIVE Researchers commonly use automated solutions such as Natural Language Processing (NLP) systems to extract clinical information from large volumes of unstructured data. However, clinical text's poor semantic structure and domain-specific vocabulary can make it challenging to develop a one-size-fits-all solution. Large Language Models (LLMs), such as OpenAI's Generative Pre-Trained Transformer 3 (GPT-3), offer a promising solution for capturing and standardizing unstructured clinical information. This study evaluated the performance of InstructGPT, a family of models derived from LLM GPT-3, to extract relevant patient information from medical case reports and discussed the advantages and disadvantages of LLMs versus dedicated NLP methods. METHODS In this paper, 208 articles related to case reports of foreign body injuries in children were identified by searching PubMed, Scopus, and Web of Science. A reviewer manually extracted information on sex, age, the object that caused the injury, and the injured body part for each patient to build a gold standard to compare the performance of InstructGPT. RESULTS InstructGPT achieved high accuracy in classifying the sex, age, object and body part involved in the injury, with 94%, 82%, 94% and 89%, respectively. When excluding articles for which InstructGPT could not retrieve any information, the accuracy for determining the child's sex and age improved to 97%, and the accuracy for identifying the injured body part improved to 93%. InstructGPT was also able to extract information from non-English language articles. 
CONCLUSIONS The study highlights that LLMs have the potential to eliminate the necessity for task-specific training (zero-shot extraction), allowing the retrieval of clinical information from unstructured natural language text, particularly from published scientific literature like case reports, by directly utilizing the PDF file of the article without any pre-processing and without requiring any technical expertise in NLP or Machine Learning. The diverse nature of the corpus, which includes articles written in languages other than English, some of which contain a wide range of clinical details while others lack information, adds to the strength of the study.
Affiliation(s)
- Veronica Sciannameo
  - Centre for Biostatistics, Epidemiology and Public Health, Department of Clinical and Biological Sciences, University of Turin, Regione Gonzole 10, Orbassano 10043, Italy
- Sara Urru
  - Unit of Biostatistics, Epidemiology and Public Health, Department of Cardiac, Thoracic, Vascular Sciences and Public Health, University of Padova, Padua, Italy
- Piercesare Grimaldi
  - Department of Public Health and Pediatrics, University of Torino, Via Santena 5 bis, Torino 10126, Italy
- Honoria Ocagli
  - Unit of Biostatistics, Epidemiology and Public Health, Department of Cardiac, Thoracic, Vascular Sciences and Public Health, University of Padova, Padua, Italy
- Sara Ahsani-Nasab
  - Unit of Biostatistics, Epidemiology and Public Health, Department of Cardiac, Thoracic, Vascular Sciences and Public Health, University of Padova, Padua, Italy
- Rosanna Irene Comoretto
  - Department of Public Health and Pediatrics, University of Torino, Via Santena 5 bis, Torino 10126, Italy
- Dario Gregori
  - Unit of Biostatistics, Epidemiology and Public Health, Department of Cardiac, Thoracic, Vascular Sciences and Public Health, University of Padova, Padua, Italy
- Paola Berchialla
  - Centre for Biostatistics, Epidemiology and Public Health, Department of Clinical and Biological Sciences, University of Turin, Regione Gonzole 10, Orbassano 10043, Italy
2
Jafari E, Blackman MH, Karnes JH, Van Driest SL, Crawford DC, Choi L, McDonough CW. Using electronic health records for clinical pharmacology research: Challenges and considerations. Clin Transl Sci 2024; 17:e13871. [PMID: 38943244 PMCID: PMC11213823 DOI: 10.1111/cts.13871] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2024] [Revised: 05/21/2024] [Accepted: 05/24/2024] [Indexed: 07/01/2024] Open
Abstract
Electronic health records (EHRs) contain a vast array of phenotypic data on large numbers of individuals, often collected over decades. Due to the wealth of information, EHR data have emerged as a powerful resource to make first discoveries and identify disparities in our healthcare system. While the number of EHR-based studies has exploded in recent years, most of these studies are directed at associations with disease rather than pharmacotherapeutic outcomes, such as drug response or adverse drug reactions. This is largely due to challenges specific to deriving drug-related phenotypes from the EHR. There is great potential for EHR-based discovery in clinical pharmacology research, and there is a critical need to address specific challenges related to accurate and reproducible derivation of drug-related phenotypes from the EHR. This review provides a detailed evaluation of challenges and considerations for deriving drug-related data from EHRs. We provide an examination of EHR-based computable phenotypes and discuss cutting-edge approaches to map medication information for clinical pharmacology research, including medication-based computable phenotypes and natural language processing. We also discuss additional considerations such as data structure, heterogeneity and missing data, rare phenotypes, and diversity within the EHR. By further understanding the complexities associated with conducting clinical pharmacology research using EHR-based data, investigators will be better equipped to design thoughtful studies with more reproducible results. Progress in utilizing EHRs for clinical pharmacology research should lead to significant advances in our ability to understand differential drug response and predict adverse drug reactions.
Affiliation(s)
- Eissa Jafari
  - Department of Pharmacotherapy and Translational Research, Center for Pharmacogenomics and Precision Medicine, College of Pharmacy, University of Florida, Gainesville, Florida, USA
  - Department of Pharmacy Practice, College of Pharmacy, Jazan University, Jazan, Saudi Arabia
- Marisa H. Blackman
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Jason H. Karnes
  - Department of Pharmacy Practice and Science, University of Arizona R. Ken Coit College of Pharmacy, Tucson, Arizona, USA
- Sara L. Van Driest
  - Department of Pediatrics, Vanderbilt University Medical Center (VUMC), Nashville, Tennessee, USA
  - Present address: All of Us Research Program, National Institutes of Health, Bethesda, Maryland, USA
- Dana C. Crawford
  - Department of Population and Quantitative Health Sciences, Cleveland Institute for Computational Biology, Case Western Reserve University, Cleveland, Ohio, USA
  - Department of Genetics and Genome Sciences, Cleveland Institute for Computational Biology, Case Western Reserve University, Cleveland, Ohio, USA
- Leena Choi
  - Department of Biostatistics and Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Caitrin W. McDonough
  - Department of Pharmacotherapy and Translational Research, Center for Pharmacogenomics and Precision Medicine, College of Pharmacy, University of Florida, Gainesville, Florida, USA
3
Bazoge A, Morin E, Daille B, Gourraud PA. Applying Natural Language Processing to Textual Data From Clinical Data Warehouses: Systematic Review. JMIR Med Inform 2023; 11:e42477. [PMID: 38100200 PMCID: PMC10757232 DOI: 10.2196/42477] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2022] [Revised: 01/16/2023] [Accepted: 09/07/2023] [Indexed: 12/17/2023] Open
Abstract
BACKGROUND In recent years, health data collected during the clinical care process have been often repurposed for secondary use through clinical data warehouses (CDWs), which interconnect disparate data from different sources. A large amount of information of high clinical value is stored in unstructured text format. Natural language processing (NLP), which implements algorithms that can operate on massive unstructured textual data, has the potential to structure the data and make clinical information more accessible. OBJECTIVE The aim of this review was to provide an overview of studies applying NLP to textual data from CDWs. It focuses on identifying the (1) NLP tasks applied to data from CDWs and (2) NLP methods used to tackle these tasks. METHODS This review was performed according to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. We searched for relevant articles in 3 bibliographic databases: PubMed, Google Scholar, and ACL Anthology. We reviewed the titles and abstracts and included articles according to the following inclusion criteria: (1) focus on NLP applied to textual data from CDWs, (2) articles published between 1995 and 2021, and (3) written in English. RESULTS We identified 1353 articles, of which 194 (14.34%) met the inclusion criteria. Among all identified NLP tasks in the included papers, information extraction from clinical text (112/194, 57.7%) and the identification of patients (51/194, 26.3%) were the most frequent tasks. To address the various tasks, symbolic methods were the most common NLP methods (124/232, 53.4%), showing that some tasks can be partially achieved with classical NLP techniques, such as regular expressions or pattern matching that exploit specialized lexica, such as drug lists and terminologies. Machine learning (70/232, 30.2%) and deep learning (38/232, 16.4%) have been increasingly used in recent years, including the most recent approaches based on transformers. 
NLP methods were mostly applied to English language data (153/194, 78.9%). CONCLUSIONS CDWs are central to the secondary use of clinical texts for research purposes. Although the use of NLP on data from CDWs is growing, there remain challenges in this field, especially with regard to languages other than English. Clinical NLP is an effective strategy for accessing, extracting, and transforming data from CDWs. Information retrieved with NLP can assist in clinical research and have an impact on clinical practice.
Affiliation(s)
- Adrien Bazoge
  - Nantes Université, École Centrale Nantes, CNRS, LS2N, UMR 6004, F-44000 Nantes, France
  - Nantes Université, CHU de Nantes, Pôle Hospitalo-Universitaire 11: Santé Publique, Clinique des données, INSERM, CIC 1413, F-44000 Nantes, France
- Emmanuel Morin
  - Nantes Université, École Centrale Nantes, CNRS, LS2N, UMR 6004, F-44000 Nantes, France
- Béatrice Daille
  - Nantes Université, École Centrale Nantes, CNRS, LS2N, UMR 6004, F-44000 Nantes, France
- Pierre-Antoine Gourraud
  - Nantes Université, CHU de Nantes, Pôle Hospitalo-Universitaire 11: Santé Publique, Clinique des données, INSERM, CIC 1413, F-44000 Nantes, France
  - Nantes Université, INSERM, CHU de Nantes, École Centrale Nantes, Centre de Recherche Translationnelle en Transplantation et Immunologie, CR2TI, F-44000 Nantes, France
4
Elgarten CW, Thompson JC, Angiolillo A, Chen Z, Conway S, Devidas M, Gupta S, Kairalla JA, McNeer JL, O’Brien MM, Rabin KR, Rau RE, Rheingold SR, Wang C, Wood C, Raetz EA, Loh ML, Alexander S, Miller TP. Improving infectious adverse event reporting for children and adolescents enrolled in clinical trials for acute lymphoblastic leukemia: A report from the Children's Oncology Group. Pediatr Blood Cancer 2022; 69:e29937. [PMID: 36083863 PMCID: PMC9529813 DOI: 10.1002/pbc.29937] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2021] [Revised: 07/28/2022] [Accepted: 07/30/2022] [Indexed: 11/08/2022]
Abstract
Infections cause substantial morbidity for children with acute lymphoblastic leukemia (ALL). Therefore, accurate characterization of infectious adverse events (AEs) reported on clinical trials is imperative to defining, comparing, and managing safety and toxicity. Here, we describe key processes implemented to improve reporting of infectious AEs on two active phase III Children's Oncology Group (COG) ALL trials. Processes include: (a) identifying infections as a targeted toxicity, (b) incorporation of infection-specific case report form questions, and (c) physician review of AEs with real-time data cleaning. Preliminary assessment of these processes suggests improved reporting, as well as opportunities for further improvement.
Affiliation(s)
- Caitlin W. Elgarten
  - Children’s Hospital of Philadelphia, Department of Pediatrics, Division of Oncology, Philadelphia, PA
- Joel C. Thompson
  - Children’s Mercy Hospital, Department of Pediatrics, Division of Hematology/Oncology/Bone Marrow Transplant, University of Missouri-Kansas City, Kansas City, MO
- Anne Angiolillo
  - Children’s National Medical Center, Center for Cancer and Blood Disorders, Washington DC
- Zhiguo Chen
  - University of Florida, Department of Biostatistics, Gainesville, FL
- Susan Conway
  - University of Florida, Department of Biostatistics, Gainesville, FL
- Sumit Gupta
  - Department of Hematology/Oncology, Hospital for Sick Children, Toronto, ON
- John A. Kairalla
  - University of Florida, Department of Biostatistics, Gainesville, FL
- Maureen M. O’Brien
  - University of Cincinnati College of Medicine, Cincinnati Children’s Hospital Medical Center, Pediatric Hematology/Oncology, Cincinnati, OH
- Karen R. Rabin
  - Baylor College of Medicine, Pediatric Hematology/Oncology, Houston, TX
- Rachel E. Rau
  - Baylor College of Medicine, Pediatric Hematology/Oncology, Houston, TX
- Susan R. Rheingold
  - Children’s Hospital of Philadelphia, Department of Pediatrics, Division of Oncology, Philadelphia, PA
- Cindy Wang
  - University of Florida, Department of Biostatistics, Gainesville, FL
- Charlotte Wood
  - University of Florida, Department of Biostatistics, Gainesville, FL
- Mignon L. Loh
  - Division of Hematology, Oncology, Bone Marrow Transplant, and Cellular Therapies, Seattle Children’s Hospital and the Ben Towne Center for Childhood Cancer Research, University of Washington, Seattle, WA
- Sarah Alexander
  - Department of Hematology/Oncology, Hospital for Sick Children, Toronto, ON
- Tamara P. Miller
  - Children’s Healthcare of Atlanta – Egleston, Pediatric Hematology/Oncology, Atlanta, GA
5
Dong D, Wang Y, Wang C, Zong Y. Effects of transthoracic echocardiography on the prognosis of patients with acute respiratory distress syndrome: a propensity score matched analysis of the MIMIC-III database. BMC Pulm Med 2022; 22:247. [PMID: 35752780 PMCID: PMC9233371 DOI: 10.1186/s12890-022-02028-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2021] [Accepted: 06/07/2022] [Indexed: 11/10/2022] Open
Abstract
Background Acute respiratory distress syndrome (ARDS) has high mortality and is mainly related to the circulatory failure. Therefore, real-time monitoring of cardiac function and structural changes has important clinical significance. Transthoracic echocardiography (TTE) is a simple and noninvasive real-time cardiac examination which is widely used in intensive care unit (ICU) patients. The purpose of this study was to analyze the effect of TTE on the prognosis of ICU patients with ARDS.
Methods The data of ARDS patients were retrieved from the MIMIC-III v1.4 database and patients were divided into the TTE group and non-TTE group. The baseline data were compared between the two groups. The effect of TTE on the prognosis of ARDS patients was analyzed through multivariate logistic analysis and the propensity score (PS). The primary outcome was the 28-d mortality rate. The secondary outcomes included pulmonary artery catheter (PAC) and Pulse index continuous cardiac output (PiCCO) administration, the ventilator-free and vasopressor-free days and total intravenous infusion volume on days 1, 2 and 3 of the mechanical ventilation. To illuminate the effect of echocardiography on the outcomes of ARDS patients, a sensitivity analysis was conducted by excluding those patients receiving either PiCCO or PAC. We also performed a subgroup analysis to assess the impact of TTE timing on the prognosis of patients with ARDS.
Results A total of 1,346 ARDS patients were enrolled, including 519 (38.6%) cases in the TTE group and 827 (61.4%) cases in the non-TTE group. In the multivariate logistic regression, the 28-day mortality of patients in the TTE group was greatly improved (OR 0.71, 95% CI 0.55–0.92, P = 0.008). More patients in the TTE group received PAC (2% vs. 10%, P < 0.001) and the length of ICU stay in the TTE group was significantly shorter than that in the non-TTE group (17 d vs. 14 d, P = 0.0001). The infusion volume in the TTE group was significantly less than that of the non-TTE group (6.2 L vs. 5.5 L on day 1, P = 0.0012). Importantly, the patients in the TTE group were weaned ventilators earlier than those in the non-TTE group (ventilator-free days within 28 d: 21 d vs. 19.8 d, respectively, P = 0.071). The Kaplan–Meier survival curves showed that TTE patients had significant lower 28-day mortality than non-TTE patients (log-rank = 0.004). Subgroup analysis showed that TTE after hemodynamic disorders can not improve prognosis (OR 1.02, 95% CI 0.79–1.34, P = 0.844).
Conclusion TTE was associated with improved 28-day outcomes in patients with ARDS.
Supplementary Information The online version contains supplementary material available at 10.1186/s12890-022-02028-5.
Affiliation(s)
- Daoran Dong
  - Department of ICU, Shaanxi Provincial People's Hospital, No. 256, Youyi West Road, Beilin District, Xi'an, Shaanxi, China
- Yan Wang
  - Department of ICU, Shaanxi Provincial People's Hospital, No. 256, Youyi West Road, Beilin District, Xi'an, Shaanxi, China
- Chan Wang
  - Department of ICU, Shaanxi Provincial People's Hospital, No. 256, Youyi West Road, Beilin District, Xi'an, Shaanxi, China
- Yuan Zong
  - Department of ICU, Shaanxi Provincial People's Hospital, No. 256, Youyi West Road, Beilin District, Xi'an, Shaanxi, China
6
Williams ML, Weeks HL, Beck C, Birdwell KA, Van Driest SL, Choi L. Sensitivity of estimated tacrolimus population pharmacokinetic profile to assumed dose timing and absorption in real-world data and simulated data. Br J Clin Pharmacol 2022; 88:2863-2874. [PMID: 34997625 PMCID: PMC9106813 DOI: 10.1111/bcp.15218] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/30/2021] [Revised: 12/21/2021] [Accepted: 12/22/2021] [Indexed: 11/28/2022] Open
Abstract
AIMS Use of electronic health record (EHR) data to estimate population pharmacokinetic (PK) profiles necessitates several assumptions. We sought to investigate sensitivity to some of these assumptions about dose timing and absorption rates. METHODS A population PK study with 363 subjects was performed using real-world data extracted from EHRs to estimate the tacrolimus population PK profile. Data were extracted and built using our automated system, EHR2PKPD, suitable for quickly constructing large PK datasets from the EHR. Population PK studies for oral medications performed using EHR data often assume a regular dosing schedule as prescribed without incorporating exact dosing time. We assessed the sensitivity of the PK parameter estimates to assumptions about dose timing using last-dose times extracted by our own natural language processing system, medExtractR. We also investigated the sensitivity of estimates to absorption rate constants that are often fixed at a published value in tacrolimus population PK analyses. We conducted simulation studies to investigate how drug PK profiles and experimental designs such as concentration measurements design affect sensitivity to incorrect assumptions about dose timing and absorption rates. RESULTS There was no appreciable difference in parameter estimates with assumed versus extracted last-dose time, and our sensitivity analysis revealed little difference between parameters estimated across a range of assumed absorption rate constants. CONCLUSION Our findings suggest that drugs with a slower elimination rate (or a longer half-life) are less sensitive to dose timing errors and that experimental designs which only allow for trough blood concentrations are usually insensitive to deviation in absorption rate.
Affiliation(s)
- Michael L. Williams
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN
- Hannah L. Weeks
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN
- Cole Beck
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN
- Kelly A. Birdwell
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, TN
- Sara L. Van Driest
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, TN
  - Department of Pediatrics, Vanderbilt University Medical Center, Nashville, TN
- Leena Choi
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN
7
Bejan CA, Cahill KN, Staso PJ, Choi L, Peterson JF, Phillips EJ. DrugWAS: Drug-wide Association Studies for COVID-19 Drug Repurposing. Clin Pharmacol Ther 2021; 110:1537-1546. [PMID: 34314511 PMCID: PMC8426999 DOI: 10.1002/cpt.2376] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 04/22/2021] [Accepted: 07/21/2021] [Indexed: 11/16/2022]
Abstract
This study aimed to systematically investigate if any of the available drugs in the electronic health record (EHR) can be repurposed as potential treatment for coronavirus disease 2019 (COVID-19). Based on a retrospective cohort analysis of EHR data, drug-wide association studies (DrugWAS) were performed on 9,748 patients with COVID-19 at Vanderbilt University Medical Center (VUMC). For each drug study, multivariable logistic regression with overlap weighting using propensity score was applied to estimate the effect of drug exposure on COVID-19 disease outcomes. Patient exposure to a drug between 3-months prior to the pandemic and the COVID-19 diagnosis was chosen as the exposure of interest. All-cause of death was selected as the primary outcome. Hospitalization, admission to the intensive care unit, and need for mechanical ventilation were identified as secondary outcomes. Overall, 17 drugs were significantly associated with decreased COVID-19 severity. Previous exposure to two types of 13-valent pneumococcal conjugate vaccines, PCV13 (odds ratio (OR), 0.31, 95% confidence interval (CI), 0.12-0.81 and OR, 0.33, 95% CI, 0.15-0.73), diphtheria toxoid and tetanus toxoid vaccine (OR, 0.38, 95% CI, 0.15-0.93) were significantly associated with a decreased risk of death (primary outcome). Secondary analyses identified several other significant associations showing lower risk for COVID-19 outcomes: acellular pertussis vaccine, 23-valent pneumococcal polysaccharide vaccine (PPSV23), flaxseed extract, ethinyl estradiol, estradiol, turmeric extract, ubidecarenone, azelastine, pseudoephedrine, dextromethorphan, omega-3 fatty acids, fluticasone, and ibuprofen. In conclusion, this cohort study leveraged EHR data to identify a list of drugs that could be repurposed to improve COVID-19 outcomes. Further randomized clinical trials are needed to investigate the efficacy of the proposed drugs.
Affiliation(s)
- Cosmin A. Bejan
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Katherine N. Cahill
  - Department of Medicine, Division of Allergy, Pulmonary and Critical Care Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Patrick J. Staso
  - Department of Medicine, Division of Allergy, Pulmonary and Critical Care Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Leena Choi
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Josh F. Peterson
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Elizabeth J. Phillips
  - Department of Pathology, Microbiology and Immunology, Vanderbilt University Medical Center, Nashville, Tennessee, USA
  - Department of Medicine, Division of Infectious Diseases, Vanderbilt University Medical Center, Nashville, Tennessee, USA
  - Department of Pharmacology, Vanderbilt University Medical Center, Nashville, Tennessee, USA
8
de Oliveira JM, da Costa CA, Antunes RS. Data structuring of electronic health records: a systematic review. Health and Technology 2021. [DOI: 10.1007/s12553-021-00607-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 01/10/2023]
9
Almeida JR, Silva JF, Matos S, Oliveira JL. A two-stage workflow to extract and harmonize drug mentions from clinical notes into observational databases. J Biomed Inform 2021; 120:103849. [PMID: 34214696 DOI: 10.1016/j.jbi.2021.103849] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 12/02/2020] [Revised: 06/04/2021] [Accepted: 06/19/2021] [Indexed: 01/02/2023]
Abstract
BACKGROUND The content of the clinical notes that have been continuously collected along patients' health history has the potential to provide relevant information about treatments and diseases, and to increase the value of structured data available in Electronic Health Records (EHR) databases. EHR databases are currently being used in observational studies which lead to important findings in medical and biomedical sciences. However, the information present in clinical notes is not being used in those studies, since the computational analysis of this unstructured data is much complex in comparison to structured data. METHODS We propose a two-stage workflow for solving an existing gap in Extraction, Transformation and Loading (ETL) procedures regarding observational databases. The first stage of the workflow extracts prescriptions present in patient's clinical notes, while the second stage harmonises the extracted information into their standard definition and stores the resulting information in a common database schema used in observational studies. RESULTS We validated this methodology using two distinct data sets, in which the goal was to extract and store drug related information in a new Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) database. We analysed the performance of the used annotator as well as its limitations. Finally, we described some practical examples of how users can explore these datasets once migrated to OMOP CDM databases. CONCLUSION With this methodology, we were able to show a strategy for using the information extracted from the clinical notes in business intelligence tools, or for other applications such as data exploration through the use of SQL queries. Besides, the extracted information complements the data present in OMOP CDM databases which was not directly available in the EHR database.
Affiliation(s)
- João Rafael Almeida
  - DETI/IEETA, University of Aveiro, Aveiro, Portugal; Department of Computation, University of A Coruña, A Coruña, Spain
- Sérgio Matos
  - DETI/IEETA, University of Aveiro, Aveiro, Portugal
10
Evolution of Hematology Clinical Trial Adverse Event Reporting to Improve Care Delivery. Curr Hematol Malig Rep 2021; 16:126-131. [PMID: 33786724 DOI: 10.1007/s11899-021-00627-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Accepted: 03/21/2021] [Indexed: 10/21/2022]
Abstract
PURPOSE OF REVIEW Reporting of adverse events on hematology clinical trials is crucial to understanding the safety of standard treatments and novel agents. However, despite the importance of understanding toxicities, challenges in capturing and reporting accurate adverse event data exist. RECENT FINDINGS Currently, adverse events are reported manually on most hematology clinical trials. Especially on phase III trials, the highest grade of each adverse event during a reporting period is typically reported. Despite the effort committed to AE reporting, studies have identified underreporting of adverse events on hematologic malignancy clinical trials, which raises concern about the true understanding of safety of treatment that clinicians have in order to guide patients about what to expect during therapy. In order to address these concerns, recent studies have piloted alternative methods for identification of adverse events. These methods include automated extraction of adverse event data from the electronic health record, implementation of trigger or alert tools into the medical record, and analytic tools to evaluate duration of adverse events rather than only the highest adverse event grade. Adverse event reporting is a crucial component of clinical trials. Novel tools for identifying and reporting adverse events provide opportunities for honing and refining methods of toxicity capture and improving understanding of toxicities patients experience while enrolled on clinical trials.
11
McNeer E, Beck C, Weeks HL, Williams ML, James NT, Bejan CA, Choi L. Building longitudinal medication dose data using medication information extracted from clinical notes in electronic health records. J Am Med Inform Assoc 2021; 28:782-790. [PMID: 33338223 DOI: 10.1093/jamia/ocaa291] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 09/03/2020] [Accepted: 12/08/2020] [Indexed: 11/14/2022] Open
Abstract
OBJECTIVE To develop an algorithm for building longitudinal medication dose datasets using information extracted from clinical notes in electronic health records (EHRs). MATERIALS AND METHODS We developed an algorithm that converts medication information extracted using natural language processing (NLP) into a usable format and builds longitudinal medication dose datasets. We evaluated the algorithm on 2 medications extracted from clinical notes of Vanderbilt's EHR and externally validated the algorithm using clinical notes from the MIMIC-III clinical care database. RESULTS For the evaluation using Vanderbilt's EHR data, the performance of our algorithm was excellent; F1-measures were ≥0.98 for both dose intake and daily dose. For the external validation using MIMIC-III, the algorithm achieved F1-measures ≥0.85 for dose intake and ≥0.82 for daily dose. DISCUSSION Our algorithm addresses the challenge of building longitudinal medication dose data using information extracted from clinical notes. Overall performance was excellent, but the algorithm can perform poorly when incorrect information is extracted by NLP systems. Although it performed reasonably well when applied to the external data source, its performance was worse due to differences in the way the drug information was written. The algorithm is implemented in the R package, "EHR," and the extracted data from Vanderbilt's EHRs along with the gold standards are provided so that users can reproduce the results and help improve the algorithm. CONCLUSION Our algorithm for building longitudinal dose data provides a straightforward way to use EHR data for medication-based studies. The external validation results suggest its potential for applicability to other systems.
Affiliation(s)
- Elizabeth McNeer
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA
- Cole Beck
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA
- Hannah L Weeks
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA
- Michael L Williams
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA
- Nathan T James
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA
- Cosmin A Bejan
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Leena Choi
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA
|
12
|
Bejan CA, Cahill KN, Staso PJ, Choi L, Peterson JF, Phillips EJ. DrugWAS: Leveraging drug-wide association studies to facilitate drug repurposing for COVID-19. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2021:2021.02.04.21251169. [PMID: 33564788 PMCID: PMC7872383 DOI: 10.1101/2021.02.04.21251169] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/15/2022]
Abstract
Importance: There is an unprecedented need to rapidly identify safe and effective treatments for the novel coronavirus disease 2019 (COVID-19). Objective: To systematically investigate whether any of the drugs available in electronic health records (EHRs), including prescription drugs and dietary supplements, can be repurposed as potential treatments for COVID-19. Design, Setting, and Participants: Based on a retrospective cohort analysis of EHR data, drug-wide association studies (DrugWAS) were performed on COVID-19 patients at Vanderbilt University Medical Center (VUMC). For each drug study, multivariable logistic regression with overlap weighting using propensity score was applied to estimate the effect of drug exposure on COVID-19 disease outcomes. Exposures: Patient exposure to a drug during the 1 year prior to the pandemic and COVID-19 diagnosis was chosen as the exposure of interest. Natural language processing was employed to extract drug information from clinical notes, in addition to the prescription drug data available in structured format. Main Outcomes and Measures: All-cause death was selected as the primary outcome. Hospitalization, admission to the intensive care unit (ICU), and need for mechanical ventilation were identified as secondary outcomes. Results: The study included 7,768 COVID-19 patients, of whom 509 (6.55%) were hospitalized, 82 (1.06%) were admitted to ICU, 64 (0.82%) received mechanical ventilation, and 90 (1.16%) died. Overall, 15 drugs were significantly associated with decreased COVID-19 severity. Previous exposure to Streptococcus pneumoniae vaccines (adjusted odds ratio [OR], 0.38; 95% CI, 0.14-0.98), diphtheria toxoid vaccine (OR, 0.39; 95% CI, 0.15-0.98), or tetanus toxoid vaccine (OR, 0.39; 95% CI, 0.15-0.98) was significantly associated with a decreased risk of death (primary outcome).
Secondary analyses identified several other significant associations showing lower risk for COVID-19 outcomes: 2 vaccines (acellular pertussis, Streptococcus pneumoniae), 3 dietary supplements (turmeric extract, flaxseed extract, omega-3 fatty acids), methylprednisolone acetate, pseudoephedrine, ethinyl estradiol, estradiol, ibuprofen, and fluticasone. Conclusions and Relevance: This cohort study leveraged EHR data to identify a list of drugs that could be repurposed to improve COVID-19 outcomes. Further randomized clinical trials are needed to investigate the efficacy of the proposed drugs.
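The overlap weighting mentioned in the design can be illustrated with a short sketch. It assumes propensity scores have already been estimated (the study's model and covariates are not described here), and it substitutes a crude weighted odds ratio from a 2x2 table for the multivariable logistic regression the authors fitted. All names and data below are hypothetical.

```python
def overlap_weights(exposed, propensity):
    """Overlap weights: 1 - e(x) for exposed patients, e(x) for unexposed patients."""
    return [1.0 - e if z == 1 else e for z, e in zip(exposed, propensity)]


def weighted_odds_ratio(exposed, outcome, weights):
    """Crude odds ratio from a weighted 2x2 table (a simplification, not a regression)."""
    cells = {(1, 1): 0.0, (1, 0): 0.0, (0, 1): 0.0, (0, 0): 0.0}
    for z, y, w in zip(exposed, outcome, weights):
        cells[(z, y)] += w
    # (exposed with outcome * unexposed without) / (exposed without * unexposed with)
    return (cells[(1, 1)] * cells[(0, 0)]) / (cells[(1, 0)] * cells[(0, 1)])


# Hypothetical cohort: exposure flag, outcome flag, assumed propensity score.
exposed = [1, 1, 1, 1, 0, 0, 0, 0]
outcome = [1, 0, 0, 0, 1, 1, 0, 0]
propensity = [0.8, 0.6, 0.5, 0.3, 0.7, 0.4, 0.2, 0.5]

weights = overlap_weights(exposed, propensity)
odds_ratio = weighted_odds_ratio(exposed, outcome, weights)
print(f"weighted OR = {odds_ratio:.3f}")
```

Overlap weights give the most influence to patients whose propensity score is near 0.5, that is, those who could plausibly have been either exposed or unexposed, and they shrink the influence of patients at the extremes.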
Affiliation(s)
- Cosmin A. Bejan
  - Department of Biomedical Informatics; Vanderbilt University Medical Center; Nashville, USA
- Katherine N. Cahill
  - Department of Medicine; Division of Allergy, Pulmonary and Critical Care Medicine; Vanderbilt University Medical Center; Nashville, USA
- Patrick J. Staso
  - Department of Medicine; Division of Allergy, Pulmonary and Critical Care Medicine; Vanderbilt University Medical Center; Nashville, USA
- Leena Choi
  - Department of Biostatistics; Vanderbilt University Medical Center; Nashville, USA
- Josh F. Peterson
  - Department of Biomedical Informatics; Vanderbilt University Medical Center; Nashville, USA
  - Department of Medicine; Vanderbilt University Medical Center; Nashville, USA
- Elizabeth J. Phillips
  - Department of Pathology, Microbiology and Immunology; Vanderbilt University Medical Center; Nashville, USA
  - Department of Medicine; Division of Infectious Diseases; Vanderbilt University Medical Center; Nashville, USA
  - Department of Pharmacology; Vanderbilt University Medical Center; Nashville, USA
|
13
|
Baxter SL, Klie AR, Radha Saseendrakumar B, Ye GY, Hogarth M. Text Processing for Detection of Fungal Ocular Involvement in Critical Care Patients: Cross-Sectional Study. J Med Internet Res 2020; 22:e18855. [PMID: 32795984 PMCID: PMC7455861 DOI: 10.2196/18855] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 03/23/2020] [Revised: 04/21/2020] [Accepted: 06/13/2020] [Indexed: 11/13/2022]
Abstract
Background Fungal ocular involvement can develop in patients with fungal bloodstream infections and can be vision-threatening. Ocular involvement has become less common in the current era of improved antifungal therapies. Retrospectively determining the prevalence of fungal ocular involvement is important for informing clinical guidelines, such as the need for routine ophthalmologic consultations. However, manual retrospective record review to detect cases is time-consuming. Objective This study aimed to determine the prevalence of fungal ocular involvement in a critical care database using both structured and unstructured electronic health record (EHR) data. Methods We queried microbiology data from 46,467 critical care patients over 12 years (2000-2012) from the Medical Information Mart for Intensive Care III (MIMIC-III) to identify 265 patients with culture-proven fungemia. For each fungemic patient, demographic data, fungal species present in blood culture, and risk factors for fungemia (eg, presence of indwelling catheters, recent major surgery, diabetes, immunosuppressed status) were ascertained. All structured diagnosis codes and free-text narrative notes associated with each patient’s hospitalization were also extracted. Screening for fungal endophthalmitis was performed using two approaches: (1) by querying a wide array of eye- and vision-related diagnosis codes, and (2) by utilizing a custom regular expression pipeline to identify and collate relevant text matches pertaining to fungal ocular involvement. Both approaches were validated using manual record review. The main outcome measure was the documentation of any fungal ocular involvement. Results In total, 265 patients had culture-proven fungemia, with Candida albicans (n=114, 43%) and Candida glabrata (n=74, 28%) being the most common fungal species in blood culture. In-hospital mortality was 46% (121/265).
In total, 7 patients were identified as having eye- or vision-related diagnosis codes, none of whom had fungal endophthalmitis based on record review. There were 26,830 free-text narrative notes associated with these 265 patients. A regular expression pipeline based on relevant terms yielded possible matches in 683 notes from 108 patients. Subsequent manual record review again demonstrated that no patients had fungal ocular involvement. Therefore, the prevalence of fungal ocular involvement in this cohort was 0%. Conclusions MIMIC-III contained no cases of ocular involvement among fungemic patients, consistent with prior studies reporting low rates of ocular involvement in fungemia. This study demonstrates an application of natural language processing to expedite the review of narrative notes. This approach is highly relevant for ophthalmology, where diagnoses are often based on physical examination findings that are documented within clinical notes.
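The regular expression screening described in the Methods can be sketched as follows. The search terms, record layout, and sample notes are hypothetical, since the abstract does not give the study's actual term list; the sketch only shows the general pattern of collecting candidate matches per patient for later manual review.

```python
import re
from collections import defaultdict

# Hypothetical search terms; the study's real term list is not given in the abstract.
PATTERN = re.compile(
    r"\b(?:endophthalmitis|chorioretinitis|vitritis|fungal\s+keratitis)\b",
    re.IGNORECASE,
)


def screen_notes(notes, context=40):
    """Collect candidate matches per patient.

    notes: iterable of (patient_id, note_id, text) tuples.
    Returns {patient_id: [(note_id, snippet), ...]} for notes with at least one match.
    """
    hits = defaultdict(list)
    for patient_id, note_id, text in notes:
        for match in PATTERN.finditer(text):
            start = max(match.start() - context, 0)
            end = min(match.end() + context, len(text))
            hits[patient_id].append((note_id, text[start:end]))
    return dict(hits)


# Hypothetical notes, for illustration only.
notes = [
    ("p1", "n1", "Ophthalmology consulted; dilated exam shows no evidence of endophthalmitis."),
    ("p1", "n2", "Blood cultures positive for Candida albicans."),
    ("p2", "n3", "Concern for chorioretinitis, ophtho to follow."),
]

hits = screen_notes(notes)
for patient_id, matches in hits.items():
    print(patient_id, matches)
```

A keyword match is only a candidate: the first sample note is flagged even though it negates the finding, which is why matches of this kind still need manual record review.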
Affiliation(s)
- Sally L Baxter
  - Viterbi Family Department of Ophthalmology and Shiley Eye Institute, University of California San Diego, La Jolla, CA, United States
  - Division of Biomedical Informatics, Department of Medicine, University of California San Diego, La Jolla, CA, United States
- Adam R Klie
  - Bioinformatics and Systems Biology, University of California San Diego, La Jolla, CA, United States
- Gordon Y Ye
  - Division of Biomedical Informatics, Department of Medicine, University of California San Diego, La Jolla, CA, United States
- Michael Hogarth
  - Division of Biomedical Informatics, Department of Medicine, University of California San Diego, La Jolla, CA, United States
|