1
Iscoe M, Socrates V, Gilson A, Chi L, Li H, Huang T, Kearns T, Perkins R, Khandjian L, Taylor RA. Identifying signs and symptoms of urinary tract infection from emergency department clinical notes using large language models. Acad Emerg Med 2024. PMID: 38567658; DOI: 10.1111/acem.14883.
Abstract
BACKGROUND: Natural language processing (NLP) tools, including recently developed large language models (LLMs), have myriad potential applications in medical care and research, including the efficient labeling and classification of unstructured text such as electronic health record (EHR) notes. This opens the door to large-scale projects that rely on variables not typically recorded in structured form, such as patient signs and symptoms.
OBJECTIVES: This study is designed to acquaint the emergency medicine research community with the foundational elements of NLP, highlighting essential terminology, annotation methodologies, and the intricacies involved in training and evaluating NLP models. Symptom characterization is critical to urinary tract infection (UTI) diagnosis, but identification of symptoms from the EHR has historically been challenging, limiting large-scale research, public health surveillance, and EHR-based clinical decision support. We therefore developed and compared two NLP models to identify UTI symptoms from unstructured emergency department (ED) notes.
METHODS: The study population consisted of patients aged ≥ 18 years who presented to an ED in a northeastern U.S. health system between June 2013 and August 2021 and had a urinalysis performed. We annotated a random subset of 1250 ED clinician notes from these visits for a list of 17 UTI symptoms. We then developed two task-specific language models to perform named entity recognition: a convolutional neural network-based model (spaCy) and a transformer-based model designed to process longer documents (Clinical Longformer). Models were trained on 1000 notes and tested on a holdout set of 250 notes. We compared model performance (precision, recall, F1 measure) at identifying the presence or absence of UTI symptoms at the note level.
RESULTS: A total of 8135 entities were identified in 1250 notes; 83.6% of notes included at least one entity. The overall F1 measure for note-level symptom identification, weighted by entity frequency, was 0.84 for the spaCy model and 0.88 for the Longformer model. The F1 measure for identifying the presence or absence of any UTI symptom in a clinical note was 0.96 (232/250 correctly classified) for the spaCy model and 0.98 (240/250 correctly classified) for the Longformer model.
CONCLUSIONS: The study demonstrated the utility of language models, and transformer-based models in particular, for extracting UTI symptoms from unstructured ED clinical notes; models were highly accurate for detecting the presence or absence of any UTI symptom at the note level, with variable performance for individual symptoms.
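The note-level evaluation described above can be illustrated with a short sketch. This is not the authors' code; the symptom labels and counts below are invented for illustration, but the precision/recall/F1 arithmetic matches the standard definitions the abstract refers to.

```python
# Hypothetical sketch of note-level symptom evaluation: each note is reduced
# to the set of symptom labels found in it, and precision/recall/F1 are
# computed over those sets across all notes.

def note_level_prf1(gold, pred):
    """gold, pred: lists of sets of symptom labels, one set per note."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # symptoms found in both annotation and prediction
        fp += len(p - g)   # predicted symptoms not annotated
        fn += len(g - p)   # annotated symptoms the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy example: three notes, invented symptom labels.
gold = [{"dysuria", "frequency"}, {"flank pain"}, set()]
pred = [{"dysuria"}, {"flank pain"}, {"urgency"}]
p, r, f1 = note_level_prf1(gold, pred)
```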
Affiliation(s)
- Mark Iscoe
- Department of Emergency Medicine, Yale School of Medicine, New Haven, Connecticut, USA
- Section for Biomedical Informatics and Data Science, Yale University School of Medicine, New Haven, Connecticut, USA
- Vimig Socrates
- Section for Biomedical Informatics and Data Science, Yale University School of Medicine, New Haven, Connecticut, USA
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, USA
- Aidan Gilson
- Yale School of Medicine, New Haven, Connecticut, USA
- Ling Chi
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, USA
- Huan Li
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, USA
- Thomas Huang
- Yale School of Medicine, New Haven, Connecticut, USA
- Thomas Kearns
- Department of Emergency Medicine, Yale School of Medicine, New Haven, Connecticut, USA
- Rachelle Perkins
- Department of Emergency Medicine, Yale School of Medicine, New Haven, Connecticut, USA
- Laura Khandjian
- Department of Emergency Medicine, Yale School of Medicine, New Haven, Connecticut, USA
- R Andrew Taylor
- Department of Emergency Medicine, Yale School of Medicine, New Haven, Connecticut, USA
- Section for Biomedical Informatics and Data Science, Yale University School of Medicine, New Haven, Connecticut, USA
2
Safranek CW, Huang T, Wright DS, Wright CX, Socrates V, Sangal RB, Iscoe M, Chartash D, Taylor RA. Automated HEART score determination via ChatGPT: Honing a framework for iterative prompt development. J Am Coll Emerg Physicians Open 2024; 5:e13133. PMID: 38481520; PMCID: PMC10936537; DOI: 10.1002/emp2.13133.
Abstract
Objectives: This study presents a design framework to enhance the accuracy with which large language models (LLMs), like ChatGPT, can extract insights from clinical notes. We highlight this framework via prompt refinement for the automated determination of HEART (History, ECG, Age, Risk factors, Troponin) risk scores in chest pain evaluation.
Methods: We developed a pipeline for LLM prompt testing, employing stochastic repeat testing and quantifying response errors relative to physician assessment. We evaluated the pipeline for automated HEART score determination across a limited set of 24 synthetic clinical notes representing four simulated patients. To assess whether iterative prompt design could improve the LLMs' ability to extract complex clinical concepts and apply rule-based logic to translate them into HEART subscores, we monitored diagnostic performance during prompt iteration.
Results: Validation included three iterative rounds of prompt improvement for three HEART subscores, with 25 repeat trials totaling 1200 queries each for GPT-3.5 and GPT-4. For both LLMs, the rate of responses with erroneous, non-numerical subscore answers decreased from the initial to the final prompt design. Accuracy of numerical responses for HEART subscores (discrete 0-2 point scale) improved for GPT-4 from the initial to the final prompt iteration, with mean error decreasing from 0.16 to 0.10 (95% confidence interval: 0.07-0.14) points.
Conclusion: We established a framework for iterative prompt design in the clinical space. Although the results indicate potential for integrating LLMs into structured clinical note analysis, translation to real, large-scale clinical data with appropriate data privacy safeguards is needed.
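The stochastic repeat-testing idea described above can be sketched as follows. This is an illustration, not the paper's pipeline: the response-parsing rule, the stubbed query function, and the toy responses are all assumptions; only the overall shape (repeat queries, count non-numerical answers, measure error against a reference score) follows the abstract.

```python
import re
import statistics

def parse_subscore(text):
    """Extract a discrete 0-2 subscore from a model response; None if absent.
    (Parsing rule is an assumption for illustration.)"""
    m = re.search(r"\b([0-2])\b", text)
    return int(m.group(1)) if m else None

def repeat_trial(query_fn, prompt, truth, n_repeats=25):
    """Run n_repeats stochastic queries; report the non-numerical answer rate
    and the mean absolute error of numerical answers vs. the reference score."""
    abs_errors, non_numeric = [], 0
    for _ in range(n_repeats):
        score = parse_subscore(query_fn(prompt))
        if score is None:
            non_numeric += 1
        else:
            abs_errors.append(abs(score - truth))
    rate = non_numeric / n_repeats
    mae = statistics.mean(abs_errors) if abs_errors else float("nan")
    return rate, mae

# Stub standing in for an LLM call (the study used GPT-3.5 and GPT-4):
responses = iter(["Score: 2"] * 20 + ["unable to determine"] * 5)
rate, mae = repeat_trial(lambda p: next(responses), "HEART history subscore?", truth=1)
```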
Affiliation(s)
- Conrad W. Safranek
- Section for Biomedical Informatics and Data Science, Yale University School of Medicine, New Haven, Connecticut, USA
- Thomas Huang
- Section for Biomedical Informatics and Data Science, Yale University School of Medicine, New Haven, Connecticut, USA
- Donald S. Wright
- Department of Emergency Medicine, Yale University School of Medicine, New Haven, Connecticut, USA
- Catherine X. Wright
- Department of Cardiovascular Medicine, Yale University School of Medicine, New Haven, Connecticut, USA
- Vimig Socrates
- Section for Biomedical Informatics and Data Science, Yale University School of Medicine, New Haven, Connecticut, USA
- Rohit B. Sangal
- Department of Emergency Medicine, Yale University School of Medicine, New Haven, Connecticut, USA
- Mark Iscoe
- Section for Biomedical Informatics and Data Science, Yale University School of Medicine, New Haven, Connecticut, USA
- Department of Emergency Medicine, Yale University School of Medicine, New Haven, Connecticut, USA
- David Chartash
- Section for Biomedical Informatics and Data Science, Yale University School of Medicine, New Haven, Connecticut, USA
- School of Medicine, University College Dublin, National University of Ireland, Dublin, Republic of Ireland
- R. Andrew Taylor
- Section for Biomedical Informatics and Data Science, Yale University School of Medicine, New Haven, Connecticut, USA
- Department of Emergency Medicine, Yale University School of Medicine, New Haven, Connecticut, USA
3
Gilson A, Safranek CW, Huang T, Socrates V, Chi L, Taylor RA, Chartash D. Correction: How Does ChatGPT Perform on the United States Medical Licensing Examination (USMLE)? The Implications of Large Language Models for Medical Education and Knowledge Assessment. JMIR Med Educ 2024; 10:e57594. PMID: 38412478; PMCID: PMC10933712; DOI: 10.2196/57594.
Abstract
[This corrects the article DOI: 10.2196/45312.]
Affiliation(s)
- Aidan Gilson
- Section for Biomedical Informatics and Data Science, Yale University School of Medicine, New Haven, CT, United States
- Department of Emergency Medicine, Yale University School of Medicine, New Haven, CT, United States
- Conrad W Safranek
- Section for Biomedical Informatics and Data Science, Yale University School of Medicine, New Haven, CT, United States
- Thomas Huang
- Department of Emergency Medicine, Yale University School of Medicine, New Haven, CT, United States
- Vimig Socrates
- Section for Biomedical Informatics and Data Science, Yale University School of Medicine, New Haven, CT, United States
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, United States
- Ling Chi
- Section for Biomedical Informatics and Data Science, Yale University School of Medicine, New Haven, CT, United States
- Richard Andrew Taylor
- Section for Biomedical Informatics and Data Science, Yale University School of Medicine, New Haven, CT, United States
- Department of Emergency Medicine, Yale University School of Medicine, New Haven, CT, United States
- David Chartash
- Section for Biomedical Informatics and Data Science, Yale University School of Medicine, New Haven, CT, United States
- School of Medicine, University College Dublin, National University of Ireland, Dublin, Ireland
4
Gilson A, Safranek CW, Huang T, Socrates V, Chi L, Taylor RA, Chartash D. Authors' Reply to: Variability in Large Language Models' Responses to Medical Licensing and Certification Examinations. JMIR Med Educ 2023; 9:e50336. PMID: 37440299; PMCID: PMC10375396; DOI: 10.2196/50336.
Affiliation(s)
- Aidan Gilson
- Section for Biomedical Informatics and Data Science, Yale University School of Medicine, New Haven, CT, United States
- Department of Emergency Medicine, Yale University School of Medicine, New Haven, CT, United States
- Conrad W Safranek
- Section for Biomedical Informatics and Data Science, Yale University School of Medicine, New Haven, CT, United States
- Thomas Huang
- Department of Emergency Medicine, Yale University School of Medicine, New Haven, CT, United States
- Vimig Socrates
- Section for Biomedical Informatics and Data Science, Yale University School of Medicine, New Haven, CT, United States
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, United States
- Ling Chi
- Section for Biomedical Informatics and Data Science, Yale University School of Medicine, New Haven, CT, United States
- Richard Andrew Taylor
- Section for Biomedical Informatics and Data Science, Yale University School of Medicine, New Haven, CT, United States
- Department of Emergency Medicine, Yale University School of Medicine, New Haven, CT, United States
- David Chartash
- Section for Biomedical Informatics and Data Science, Yale University School of Medicine, New Haven, CT, United States
- School of Medicine, University College Dublin, National University of Ireland, Dublin, Ireland
5
Socrates V, Gilson A, Lopez K, Chi L, Taylor RA, Chartash D. Predicting relations between SOAP note sections: The value of incorporating a clinical information model. J Biomed Inform 2023; 141:104360. PMID: 37061014; PMCID: PMC10197152; DOI: 10.1016/j.jbi.2023.104360.
Abstract
Physician progress notes are frequently organized into Subjective, Objective, Assessment, and Plan (SOAP) sections. The Assessment section synthesizes information recorded in the Subjective and Objective sections, and the Plan section documents tests and treatments to narrow the differential diagnosis and manage symptoms. Classifying the relationship between the Assessment and Plan sections has been suggested to provide valuable insight into clinical reasoning. In this work, we use a novel human-in-the-loop pipeline to classify the relationships between the Assessment and Plan sections of SOAP notes as part of the n2c2 2022 Track 3 Challenge. In particular, we use a clinical information model constructed from both the entailment logic expected by the Challenge and the problem-oriented medical record. This information model is used to label named entities as primary and secondary problems/symptoms, events, and complications in all four SOAP sections. We iteratively train separate named entity recognition models and use them to annotate entities in all notes/sections. We fine-tune a downstream RoBERTa-large model to classify the Assessment-Plan relationship. We evaluate multiple language model architectures, preprocessing parameters, and methods of knowledge integration, achieving a maximum macro-F1 score of 82.31%. Our initial model achieved top-2 performance during the Challenge (macro-F1: 81.52%; competitors' macro-F1 range: 74.54%-82.12%). We improved our model by incorporating post-challenge annotations of the Subjective and Objective sections, outperforming the top model from the Challenge. We also used Shapley additive explanations to investigate the extent of the language model's clinical logic under the lens of our clinical information model. We find that the model often relies on shallow heuristics and nonspecific attention when making predictions, suggesting that language model knowledge integration requires further research.
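The macro-F1 metric reported above treats every relation label equally, regardless of frequency. A minimal sketch of how such a score is computed (the relation label names below are assumptions for illustration, not necessarily the Challenge's exact label set):

```python
def macro_f1(gold, pred, labels):
    """Unweighted mean of per-label F1 over a fixed label set."""
    f1s = []
    for lab in labels:
        tp = sum(g == lab and p == lab for g, p in zip(gold, pred))
        fp = sum(g != lab and p == lab for g, p in zip(gold, pred))
        fn = sum(g == lab and p != lab for g, p in zip(gold, pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

# Toy Assessment-Plan relation predictions (label names are illustrative):
gold = ["direct", "indirect", "neither", "direct"]
pred = ["direct", "neither", "neither", "direct"]
score = macro_f1(gold, pred, ["direct", "indirect", "neither"])
```

Because rare labels weigh as much as common ones, a model that ignores a minority relation class is penalized heavily, which is why the metric is a common choice for imbalanced relation-classification tasks like this one.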
Affiliation(s)
- Vimig Socrates
- Section for Biomedical Informatics and Data Science, Yale University School of Medicine, 300 George St, New Haven, 06511, USA; Department of Emergency Medicine, Yale University School of Medicine, 464 Congress Ave #260, New Haven, 06519, USA; Program of Computational Biology and Bioinformatics, Yale University, 300 George St, New Haven, 06511, USA.
- Aidan Gilson
- Department of Emergency Medicine, Yale University School of Medicine, 464 Congress Ave #260, New Haven, 06519, USA.
- Kevin Lopez
- Section for Biomedical Informatics and Data Science, Yale University School of Medicine, 300 George St, New Haven, 06511, USA; Department of Emergency Medicine, Yale University School of Medicine, 464 Congress Ave #260, New Haven, 06519, USA.
- Ling Chi
- Department of Emergency Medicine, Yale University School of Medicine, 464 Congress Ave #260, New Haven, 06519, USA.
- Richard Andrew Taylor
- Section for Biomedical Informatics and Data Science, Yale University School of Medicine, 300 George St, New Haven, 06511, USA; Department of Emergency Medicine, Yale University School of Medicine, 464 Congress Ave #260, New Haven, 06519, USA.
- David Chartash
- Section for Biomedical Informatics and Data Science, Yale University School of Medicine, 300 George St, New Haven, 06511, USA; School of Medicine, University College Dublin - National University of Ireland, Dublin, Health Sciences Centre, Belfield, Dublin 4, Ireland.
6
Gilson A, Safranek CW, Huang T, Socrates V, Chi L, Taylor RA, Chartash D. How Does ChatGPT Perform on the United States Medical Licensing Examination (USMLE)? The Implications of Large Language Models for Medical Education and Knowledge Assessment. JMIR Med Educ 2023; 9:e45312. PMID: 36753318; PMCID: PMC9947764; DOI: 10.2196/45312.
Abstract
BACKGROUND: Chat Generative Pre-trained Transformer (ChatGPT) is a 175-billion-parameter natural language processing model that can generate conversation-style responses to user input.
OBJECTIVE: This study aimed to evaluate the performance of ChatGPT on questions within the scope of the United States Medical Licensing Examination (USMLE) Step 1 and Step 2 exams, as well as to analyze responses for user interpretability.
METHODS: We used 2 sets of multiple-choice questions to evaluate ChatGPT's performance, each with questions pertaining to Step 1 and Step 2. The first set was derived from AMBOSS, a commonly used question bank for medical students, which also provides statistics on question difficulty and performance relative to the user base. The second set was the National Board of Medical Examiners (NBME) free 120 questions. ChatGPT's performance was compared to that of 2 other large language models, GPT-3 and InstructGPT. The text output of each ChatGPT response was evaluated across 3 qualitative metrics: logical justification of the answer selected, presence of information internal to the question, and presence of information external to the question.
RESULTS: Across the 4 data sets, AMBOSS-Step1, AMBOSS-Step2, NBME-Free-Step1, and NBME-Free-Step2, ChatGPT achieved accuracies of 44% (44/100), 42% (42/100), 64.4% (56/87), and 57.8% (59/102), respectively. ChatGPT outperformed InstructGPT by 8.15% on average across all data sets, and GPT-3 performed similarly to random chance. The model demonstrated a significant decrease in performance as question difficulty increased (P=.01) within the AMBOSS-Step1 data set. Logical justification for ChatGPT's answer selection was present in 100% of outputs on the NBME data sets. Information internal to the question was present in 96.8% (183/189) of all questions. The presence of information external to the question was 44.5% and 27% lower for incorrect answers relative to correct answers on the NBME-Free-Step1 (P<.001) and NBME-Free-Step2 (P=.001) data sets, respectively.
CONCLUSIONS: ChatGPT marks a significant improvement in natural language processing models on the task of medical question answering. By performing above a 60% threshold on the NBME-Free-Step1 data set, we show that the model achieves the equivalent of a passing score for a third-year medical student. Additionally, we highlight ChatGPT's capacity to provide logic and informational context in the majority of its answers. Taken together, these findings make a compelling case for the potential applications of ChatGPT as an interactive medical education tool to support learning.
Affiliation(s)
- Aidan Gilson
- Section for Biomedical Informatics and Data Science, Yale University School of Medicine, New Haven, CT, United States
- Department of Emergency Medicine, Yale University School of Medicine, New Haven, CT, United States
- Conrad W Safranek
- Section for Biomedical Informatics and Data Science, Yale University School of Medicine, New Haven, CT, United States
- Thomas Huang
- Department of Emergency Medicine, Yale University School of Medicine, New Haven, CT, United States
- Vimig Socrates
- Section for Biomedical Informatics and Data Science, Yale University School of Medicine, New Haven, CT, United States
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, United States
- Ling Chi
- Section for Biomedical Informatics and Data Science, Yale University School of Medicine, New Haven, CT, United States
- Richard Andrew Taylor
- Section for Biomedical Informatics and Data Science, Yale University School of Medicine, New Haven, CT, United States
- Department of Emergency Medicine, Yale University School of Medicine, New Haven, CT, United States
- David Chartash
- Section for Biomedical Informatics and Data Science, Yale University School of Medicine, New Haven, CT, United States
- School of Medicine, University College Dublin, National University of Ireland, Dublin, Ireland
7
Melnick ER, Ong SY, Fong A, Socrates V, Ratwani RM, Nath B, Simonov M, Salgia A, Williams B, Marchalik D, Goldstein R, Sinsky CA. Characterizing physician EHR use with vendor derived data: a feasibility study and cross-sectional analysis. J Am Med Inform Assoc 2021; 28:1383-1392. PMID: 33822970; PMCID: PMC8279798; DOI: 10.1093/jamia/ocab011.
Abstract
OBJECTIVE: To derive 7 proposed core electronic health record (EHR) use metrics across 2 healthcare systems with different EHR vendor product installations and examine factors associated with EHR time.
MATERIALS AND METHODS: A cross-sectional analysis of ambulatory physicians' EHR use across the Yale-New Haven and MedStar Health systems was performed for August 2019 using 7 proposed core EHR use metrics normalized to 8 hours of patient scheduled time.
RESULTS: Five of the 7 proposed metrics could be measured in a population of nonteaching, exclusively ambulatory physicians. Among 573 physicians (Yale-New Haven N = 290, MedStar N = 283) in the analysis, median EHR-Time8 was 5.23 hours. Gender, additional clinical hours scheduled, and certain medical specialties were associated with EHR-Time8 after adjusting for age and health system on multivariable analysis. For every 8 hours of scheduled patient time, the model predicted the following differences in EHR time (P < .001 unless otherwise indicated): female physicians +0.58 hours; each additional clinical hour scheduled per month -0.01 hours; practicing cardiology -1.30 hours; medical subspecialties -0.89 hours (except gastroenterology, P = .002); neurology/psychiatry -2.60 hours; obstetrics/gynecology -1.88 hours; pediatrics -1.05 hours (P = .001); sports/physical medicine and rehabilitation -3.25 hours; and surgical specialties -3.65 hours.
CONCLUSIONS: For every 8 hours of scheduled patient time, ambulatory physicians spend more than 5 hours on the EHR. Physician gender, specialty, and number of clinical hours practicing are associated with differences in EHR time. While audit logs remain a powerful tool for understanding physician EHR use, additional transparency, granularity, and standardization of vendor-derived EHR use data definitions are necessary to standardize EHR use measurement.
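The normalization described above, "EHR time per 8 hours of patient scheduled time", can be sketched as a simple rescaling. The formula below is an assumption based on the stated normalization, not taken from the paper's methods, and the example numbers are invented:

```python
def ehr_time8(ehr_hours, scheduled_hours):
    """Normalize observed EHR time to an 8-hour block of scheduled patient
    time (illustrative interpretation of the EHR-Time8 metric above)."""
    if scheduled_hours <= 0:
        raise ValueError("scheduled_hours must be positive")
    return ehr_hours * 8.0 / scheduled_hours

# A hypothetical physician logging 13 EHR hours across 20 scheduled hours:
t8 = ehr_time8(13.0, 20.0)
```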
Affiliation(s)
- Edward R Melnick
- Department of Emergency Medicine, Yale School of Medicine, New Haven, Connecticut, USA
- Shawn Y Ong
- Department of Emergency Medicine, Yale School of Medicine, New Haven, Connecticut, USA
- Allan Fong
- MedStar Health National Center for Human Factors in Healthcare, Washington, DC, USA
- Vimig Socrates
- Computational Biology and Bioinformatics, Yale School of Medicine, New Haven, Connecticut, USA
- Raj M Ratwani
- MedStar Health National Center for Human Factors in Healthcare, Washington, DC, USA
- Bidisha Nath
- Department of Emergency Medicine, Yale School of Medicine, New Haven, Connecticut, USA
- Michael Simonov
- Department of Emergency Medicine, Yale School of Medicine, New Haven, Connecticut, USA
- Anup Salgia
- Northeast Ohio Medical University and Cerner Corporation, Kansas City, Missouri, USA
- Brian Williams
- Northeast Medical Group, Yale-New Haven Health, Stratford, Connecticut, USA
- Richard Goldstein
- Northeast Medical Group, Yale-New Haven Health, Stratford, Connecticut, USA
- Christine A Sinsky
- Professional Satisfaction and Practice Sustainability, American Medical Association, Chicago, Illinois, USA
8
Socrates V, Gershon A, Sahoo SS. Computation of Brain Functional Connectivity Network Measures in Epilepsy: A Web-Based Platform for EEG Signal Data Processing and Analysis. Stud Health Technol Inform 2019; 264:1590-1591. PMID: 31438246; DOI: 10.3233/shti190549.
Abstract
Epilepsy is a serious neurological disorder, characterized by repeated seizures, that affects nearly 60 million individuals worldwide. Graph theoretic approaches have been developed to analyze the functional brain networks that underpin the epileptogenic network. We have developed a Web-based application that enables neuroscientists to process high-resolution stereotactic electroencephalogram (SEEG) signal data and compute various signal coupling measures through an intuitive user interface for the study of epilepsy seizure networks. Results of a systematic evaluation of this new application show that it scales with increasing volumes of signal data.
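One of the simplest signal coupling measures a platform like this might compute between two EEG channels is the Pearson correlation of their samples. A minimal sketch (the channel data below are invented; the platform described above supports multiple coupling measures, not necessarily this one):

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length signal traces,
    a basic pairwise coupling measure for functional connectivity."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Two toy channels; ch2 is ch1 shifted by a constant, so coupling is perfect.
ch1 = [0.1, 0.5, 0.2, 0.8, 0.4]
ch2 = [0.2, 0.6, 0.3, 0.9, 0.5]
r = pearson(ch1, ch2)
```

Computing such a measure for every channel pair yields an adjacency matrix, on which the graph-theoretic network measures mentioned above can then be evaluated.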
Affiliation(s)
- Vimig Socrates
- Department of Population & Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, USA
- Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, OH, USA
- Arthur Gershon
- Department of Population & Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, USA
- Satya S Sahoo
- Department of Population & Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, USA
- Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, OH, USA
9
Valdez J, Kim M, Rueschman M, Socrates V, Redline S, Sahoo SS. ProvCaRe Semantic Provenance Knowledgebase: Evaluating Scientific Reproducibility of Research Studies. AMIA Annu Symp Proc 2018; 2017:1705-1714. PMID: 29854241; PMCID: PMC5977728.
Abstract
Scientific reproducibility is critical for biomedical research, as it enables us to advance science by building on previous results, helps ensure the success of increasingly expensive drug trials, and allows funding agencies to make informed decisions. However, there is a growing "crisis" of reproducibility: a recent Nature survey of more than 1500 researchers found that 70% were unable to replicate results from other research groups and more than 50% were unable to reproduce their own results. In 2016, the National Institutes of Health (NIH) announced the "Rigor and Reproducibility" guidelines to support reproducibility in biomedical research. A key component of these guidelines is the recording and analysis of "provenance" information, which describes the origin or history of data and plays a central role in ensuring scientific reproducibility. As part of the NIH Big Data to Knowledge (BD2K)-funded data provenance project, we have developed a new informatics framework called Provenance for Clinical and Healthcare Research (ProvCaRe) to extract, model, and analyze provenance information from published literature describing research studies. Using sleep medicine research studies that have made their data available through the National Sleep Research Resource (NSRR), we developed an automated pipeline to identify and extract provenance metadata from published literature, which is made available for analysis in the ProvCaRe knowledgebase. NSRR is the largest repository of sleep data, covering over 40,000 studies involving 36,000 participants, and we used 75 published articles describing 6 research studies to populate the ProvCaRe knowledgebase. We evaluated the ProvCaRe knowledgebase, comprising 28,474 "provenance triples," using hypothesis-driven queries to identify and rank research studies based on the provenance information extracted from published articles.
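A "provenance triple" is a subject-predicate-object statement about how a study's data came to be, and a knowledgebase of such triples can be queried by pattern matching. A toy sketch in that spirit (the triples, predicate names, and query interface below are illustrative assumptions, not ProvCaRe's actual schema or API):

```python
# A tiny in-memory store of illustrative provenance triples.
triples = [
    ("Study1", "usedDataFrom", "NSRR"),
    ("Study1", "hasSampleSize", "500"),
    ("Study2", "usedDataFrom", "NSRR"),
]

def query(subject=None, predicate=None, obj=None):
    """Return all triples matching an optional (subject, predicate, object)
    pattern; None acts as a wildcard, as in triple-store pattern queries."""
    return [t for t in triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)]

# Hypothesis-driven query: which studies drew their data from NSRR?
nsrr_studies = {s for s, _, _ in query(predicate="usedDataFrom", obj="NSRR")}
```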
Affiliation(s)
- Joshua Valdez
- Division of Medical Informatics, School of Medicine, Case Western Reserve University, Cleveland, OH
- Matthew Kim
- Department of Medicine, Brigham and Women's Hospital and Beth Israel Deaconess Medical Center, Harvard Medical School, Harvard University, Boston, MA
- Michael Rueschman
- Department of Medicine, Brigham and Women's Hospital and Beth Israel Deaconess Medical Center, Harvard Medical School, Harvard University, Boston, MA
- Vimig Socrates
- Division of Medical Informatics, School of Medicine, Case Western Reserve University, Cleveland, OH
- Susan Redline
- Department of Medicine, Brigham and Women's Hospital and Beth Israel Deaconess Medical Center, Harvard Medical School, Harvard University, Boston, MA
- Satya S Sahoo
- Division of Medical Informatics, School of Medicine, Case Western Reserve University, Cleveland, OH