1
Chen X, Zhang W, Xu P, Zhao Z, Zheng Y, Shi D, He M. FFA-GPT: an automated pipeline for fundus fluorescein angiography interpretation and question-answer. NPJ Digit Med 2024; 7:111. [PMID: 38702471 PMCID: PMC11068733 DOI: 10.1038/s41746-024-01101-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2023] [Accepted: 04/04/2024] [Indexed: 05/06/2024] Open
Abstract
Fundus fluorescein angiography (FFA) is a crucial diagnostic tool for chorioretinal diseases, but its interpretation requires significant expertise and time. Prior studies have used Artificial Intelligence (AI)-based systems to assist FFA interpretation, but these systems lack user interaction and comprehensive evaluation by ophthalmologists. Here, we used large language models (LLMs) to develop an automated interpretation pipeline for both report generation and medical question-answering (QA) for FFA images. The pipeline comprises two parts: an image-text alignment module (Bootstrapping Language-Image Pre-training) for report generation and an LLM (Llama 2) for interactive QA. The model was developed using 654,343 FFA images with 9392 reports. It was evaluated both automatically, using language-based and classification-based metrics, and manually by three experienced ophthalmologists. The automatic evaluation of the generated reports demonstrated that the system can generate coherent and comprehensible free-text reports, achieving a BERTScore of 0.70 and F1 scores ranging from 0.64 to 0.82 for detecting top-5 retinal conditions. The manual evaluation revealed acceptable accuracy (68.3%, Kappa 0.746) and completeness (62.3%, Kappa 0.739) of the generated reports. The generated free-form answers were evaluated manually, with the majority meeting the ophthalmologists' criteria (error-free: 70.7%, complete: 84.0%, harmless: 93.7%, satisfied: 65.3%, Kappa: 0.762-0.834). This study introduces an innovative framework that combines multi-modal transformers and LLMs, enhancing ophthalmic image interpretation, and facilitating interactive communications during medical consultation.
Affiliation(s)
- Xiaolan Chen
  - School of Optometry, The Hong Kong Polytechnic University, Kowloon, Hong Kong, China
- Weiyi Zhang
  - School of Optometry, The Hong Kong Polytechnic University, Kowloon, Hong Kong, China
- Pusheng Xu
  - State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou, China
- Ziwei Zhao
  - School of Optometry, The Hong Kong Polytechnic University, Kowloon, Hong Kong, China
- Yingfeng Zheng
  - State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou, China
- Danli Shi
  - School of Optometry, The Hong Kong Polytechnic University, Kowloon, Hong Kong, China
  - Research Centre for SHARP Vision (RCSV), The Hong Kong Polytechnic University, Kowloon, Hong Kong, China
- Mingguang He
  - School of Optometry, The Hong Kong Polytechnic University, Kowloon, Hong Kong, China
  - Research Centre for SHARP Vision (RCSV), The Hong Kong Polytechnic University, Kowloon, Hong Kong, China
  - Centre for Eye and Vision Research (CEVR), 17W Hong Kong Science Park, Hong Kong, China
2
Yu P, Xu H, Hu X, Deng C. Leveraging Generative AI and Large Language Models: A Comprehensive Roadmap for Healthcare Integration. Healthcare (Basel) 2023; 11:2776. [PMID: 37893850 PMCID: PMC10606429 DOI: 10.3390/healthcare11202776] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 09/18/2023] [Revised: 10/13/2023] [Accepted: 10/17/2023] [Indexed: 10/29/2023] Open
Abstract
Generative artificial intelligence (AI) and large language models (LLMs), exemplified by ChatGPT, are promising for revolutionizing data and information management in healthcare and medicine. However, there is scant literature guiding their integration for non-AI professionals. This study conducts a scoping literature review to address the critical need for guidance on integrating generative AI and LLMs into healthcare and medical practices. It elucidates the distinct mechanisms underpinning these technologies, such as Reinforcement Learning from Human Feedback (RLHF), including few-shot learning and chain-of-thought reasoning, which differentiate them from traditional, rule-based AI systems. It requires an inclusive, collaborative co-design process that engages all pertinent stakeholders, including clinicians and consumers, to achieve these benefits. Although global research is examining both opportunities and challenges, including ethical and legal dimensions, LLMs offer promising advancements in healthcare by enhancing data management, information retrieval, and decision-making processes. Continued innovation in data acquisition, model fine-tuning, prompt strategy development, evaluation, and system implementation is imperative for realizing the full potential of these technologies. Organizations should proactively engage with these technologies to improve healthcare quality, safety, and efficiency, adhering to ethical and legal guidelines for responsible application.
Affiliation(s)
- Ping Yu
  - School of Computing and Information Technology, University of Wollongong, Wollongong, NSW 2522, Australia
- Hua Xu
  - Section of Biomedical Informatics and Data Science, Yale School of Medicine, 100 College Street, Fl 9, New Haven, CT 06510, USA
- Xia Hu
  - Department of Computer Science, Rice University, P.O. Box 1892, Houston, TX 77251-1892, USA
- Chao Deng
  - School of Medical, Indigenous and Health Sciences, University of Wollongong, Wollongong, NSW 2522, Australia
3
Singhal K, Azizi S, Tu T, Mahdavi SS, Wei J, Chung HW, Scales N, Tanwani A, Cole-Lewis H, Pfohl S, Payne P, Seneviratne M, Gamble P, Kelly C, Babiker A, Schärli N, Chowdhery A, Mansfield P, Demner-Fushman D, Agüera Y Arcas B, Webster D, Corrado GS, Matias Y, Chou K, Gottweis J, Tomasev N, Liu Y, Rajkomar A, Barral J, Semturs C, Karthikesalingam A, Natarajan V. Large language models encode clinical knowledge. Nature 2023; 620:172-180. [PMID: 37438534 PMCID: PMC10396962 DOI: 10.1038/s41586-023-06291-2] [Citation(s) in RCA: 256] [Impact Index Per Article: 256.0] [Received: 01/25/2023] [Accepted: 06/05/2023] [Indexed: 07/14/2023]
Abstract
Large language models (LLMs) have demonstrated impressive capabilities, but the bar for clinical applications is high. Attempts to assess the clinical knowledge of models typically rely on automated evaluations based on limited benchmarks. Here, to address these limitations, we present MultiMedQA, a benchmark combining six existing medical question answering datasets spanning professional medicine, research and consumer queries and a new dataset of medical questions searched online, HealthSearchQA. We propose a human evaluation framework for model answers along multiple axes including factuality, comprehension, reasoning, possible harm and bias. In addition, we evaluate Pathways Language Model (PaLM, a 540-billion parameter LLM) and its instruction-tuned variant, Flan-PaLM, on MultiMedQA. Using a combination of prompting strategies, Flan-PaLM achieves state-of-the-art accuracy on every MultiMedQA multiple-choice dataset (MedQA, MedMCQA, PubMedQA and Measuring Massive Multitask Language Understanding (MMLU) clinical topics), including 67.6% accuracy on MedQA (US Medical Licensing Exam-style questions), surpassing the prior state of the art by more than 17%. However, human evaluation reveals key gaps. To resolve this, we introduce instruction prompt tuning, a parameter-efficient approach for aligning LLMs to new domains using a few exemplars. The resulting model, Med-PaLM, performs encouragingly, but remains inferior to clinicians. We show that comprehension, knowledge recall and reasoning improve with model scale and instruction prompt tuning, suggesting the potential utility of LLMs in medicine. Our human evaluations reveal limitations of today's models, reinforcing the importance of both evaluation frameworks and method development in creating safe, helpful LLMs for clinical applications.
Affiliation(s)
- Tao Tu
  - Google Research, Mountain View, CA, USA
- Jason Wei
  - Google Research, Mountain View, CA, USA
- Yun Liu
  - Google Research, Mountain View, CA, USA
4
Adapa K, Ivester T, Shea C, Shultz B, DeWalt D, Pearsall M, Dangerfield C, Burgess E, Marks LB, Mazur LM. The Effect of a System-Level Tiered Huddle System on Reporting Patient Safety Events: An Interrupted Time Series Analysis. Jt Comm J Qual Patient Saf 2022; 48:642-652. [PMID: 36153293 DOI: 10.1016/j.jcjq.2022.08.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2022] [Revised: 08/10/2022] [Accepted: 08/15/2022] [Indexed: 12/30/2022]
Abstract
BACKGROUND The objective of this research was to evaluate the effect of implementing a system-level tiered huddle system (THS) on the reporting of patient safety events into the official event reporting system. METHODS A quasi-experimental study using interrupted time series was conducted to assess the impact and changes to trends in the reporting of patient safety events pre- (February-July 2020; six months) and post- (September 2020-February 2021; six months) THS implementation within one health care system (238 clinics and 4 hospitals). The severity of harm was analyzed in July 2021 using a modified Agency for Healthcare Research and Quality (AHRQ) harm score classification. The primary outcome measure was the number of patient safety events reported per month. Secondary outcomes included the number of patient safety events reported per month by each AHRQ harm score classification. RESULTS The system-level THS implementation led to a significant and immediate increase in the total number of patient safety events reported per month (777.73, 95% confidence interval [CI] 310.78-1,244.68, p = 0.004). Similar significant increases were seen for reported numbers of unsafe conditions, near misses, no-harm events that reached patients, and temporary harm (p < 0.05 for each). Reporting of events with permanent harm and deaths also increased but was not statistically significant, likely due to the small number of reported events involving actual harm. CONCLUSION These findings suggest that system-level THS implementation may increase reporting of patient safety events in the official event reporting system.
5
Malik MA, Motta-Calderon D, Piniella N, Garber A, Konieczny K, Lam A, Plombon S, Carr K, Yoon C, Griffin J, Lipsitz S, Schnipper JL, Bates DW, Dalal AK. A structured approach to EHR surveillance of diagnostic error in acute care: an exploratory analysis of two institutionally-defined case cohorts. Diagnosis (Berl) 2022; 9:446-457. [PMID: 35993878 PMCID: PMC9651987 DOI: 10.1515/dx-2022-0032] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 01/04/2022] [Accepted: 07/12/2022] [Indexed: 12/29/2022]
Abstract
OBJECTIVES To test a structured electronic health record (EHR) case review process to identify diagnostic errors (DE) and diagnostic process failures (DPFs) in acute care. METHODS We adapted validated tools (Safer Dx, Diagnostic Error Evaluation Research [DEER] Taxonomy) to assess the diagnostic process during the hospital encounter and categorized 13 postulated e-triggers. We created two test cohorts of all preventable cases (n=28) and an equal number of randomly sampled non-preventable cases (n=28) from 365 adult general medicine patients who expired and underwent our institution's mortality case review process. After excluding patients with a length of stay of more than one month, each case was reviewed by two blinded clinicians trained in our process and by an expert panel. Inter-rater reliability was assessed. We compared the frequency of DE contributing to death in both cohorts, as well as mean DPFs and e-triggers for DE positive and negative cases within each cohort. RESULTS Twenty-seven (96.4%) preventable and 24 (85.7%) non-preventable cases underwent our review process. Inter-rater reliability was moderate between individual reviewers (Cohen's kappa 0.41) and substantial with the expert panel (Cohen's kappa 0.74). The frequency of DE contributing to death was significantly higher for the preventable compared to the non-preventable cohort (56% vs. 17%, OR 6.25 [1.68, 23.27], p<0.01). Mean DPFs and e-triggers were significantly and non-significantly higher for DE positive compared to DE negative cases in each cohort, respectively. CONCLUSIONS We observed substantial agreement among final consensus and expert panel reviews using our structured EHR case review process. DEs contributing to death associated with DPFs were identified in institutionally designated preventable and non-preventable cases. While e-triggers may be useful for discriminating DE positive from DE negative cases, larger studies are required for validation. 
Our approach has potential to augment institutional mortality case review processes with respect to DE surveillance.
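The odds ratio and interval quoted in this abstract (56% vs. 17%, OR 6.25 [1.68, 23.27]) can be checked from the reported percentages. The following is a minimal Python sketch, not the authors' analysis. It assumes the underlying counts are 15 of 27 preventable and 4 of 24 non-preventable cases (inferred from the percentages; the abstract gives no raw counts) and assumes a Woolf (log-normal Wald) 95% interval, a method the abstract does not name. The function name `odds_ratio_ci` is hypothetical.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(a, b, c, d, level=0.95):
    """Odds ratio and Woolf (log-normal Wald) interval for a 2x2 table.

    a, b: cases with / without the outcome in group 1
    c, d: cases with / without the outcome in group 2
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    # Two-sided normal quantile, about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper


# Assumed counts, inferred from the reported 56% of 27 and 17% of 24.
or_, lo, hi = odds_ratio_ci(a=15, b=12, c=4, d=20)
print(f"OR = {or_:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

Under these assumed counts the sketch prints `OR = 6.25, 95% CI [1.68, 23.27]`, which agrees with the values in the abstract. The wide interval reflects the small cell counts, in particular the 4 diagnostic-error cases in the non-preventable cohort.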
Affiliation(s)
- Maria A. Malik
  - Division of General Internal Medicine, Brigham and Women’s Hospital, Boston, MA, USA
- Daniel Motta-Calderon
  - Division of General Internal Medicine, Brigham and Women’s Hospital, Boston, MA, USA
  - Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Nicholas Piniella
  - Division of General Internal Medicine, Brigham and Women’s Hospital, Boston, MA, USA
- Alison Garber
  - Division of General Internal Medicine, Brigham and Women’s Hospital, Boston, MA, USA
- Kaitlyn Konieczny
  - Division of General Internal Medicine, Brigham and Women’s Hospital, Boston, MA, USA
- Alyssa Lam
  - Division of General Internal Medicine, Brigham and Women’s Hospital, Boston, MA, USA
- Savanna Plombon
  - Division of General Internal Medicine, Brigham and Women’s Hospital, Boston, MA, USA
- Kevin Carr
  - Division of General Internal Medicine, Brigham and Women’s Hospital, Boston, MA, USA
- Catherine Yoon
  - Division of General Internal Medicine, Brigham and Women’s Hospital, Boston, MA, USA
- Stuart Lipsitz
  - Division of General Internal Medicine, Brigham and Women’s Hospital, Boston, MA, USA
  - Harvard Medical School, Boston, MA, USA
- Jeffrey L. Schnipper
  - Division of General Internal Medicine, Brigham and Women’s Hospital, Boston, MA, USA
  - Harvard Medical School, Boston, MA, USA
- David W. Bates
  - Division of General Internal Medicine, Brigham and Women’s Hospital, Boston, MA, USA
  - Harvard Medical School, Boston, MA, USA
- Anuj K. Dalal
  - Division of General Internal Medicine, Brigham and Women’s Hospital, Boston, MA, USA
  - Harvard Medical School, Boston, MA, USA
6
Bradford A, Shahid U, Schiff GD, Graber ML, Marinez A, DiStabile P, Timashenka A, Jalal H, Brady PJ, Singh H. Development and Usability Testing of the Agency for Healthcare Research and Quality Common Formats to Capture Diagnostic Safety Events. J Patient Saf 2022; 18:521-525. [PMID: 35443253 PMCID: PMC9391254 DOI: 10.1097/pts.0000000000001006] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/26/2022]
Abstract
OBJECTIVES A lack of consensus around definitions and reporting standards for diagnostic errors limits the extent to which healthcare organizations can aggregate, analyze, share, and learn from these events. In response to this problem, the Agency for Healthcare Research and Quality (AHRQ) began the development of the Common Formats for Event Reporting for Diagnostic Safety Events (CFER-DS). We conducted a usability assessment of the draft CFER-DS to inform future revision and implementation. METHODS We recruited a purposive sample of quality and safety personnel working in 8 U.S. healthcare organizations. Participants were invited to use the CFER-DS to simulate reporting for a minimum of 5 cases of diagnostic safety events and then provide written and verbal qualitative feedback. Analysis focused on participants' perceptions of content validity, ease of use, and potential for implementation. RESULTS Estimated completion time was 30 to 90 minutes per event. Participants shared generally positive feedback about content coverage and item clarity but identified reporter burden as a potential concern. Participants also identified opportunities to clarify several conceptual definitions, ensure applicability across different care settings, and develop guidance to operationalize use of CFER-DS. Findings led to refinement of content and supplementary materials to facilitate implementation. CONCLUSIONS Standardized definitions of diagnostic safety events and reporting standards for contextual information and contributing factors can help capture and analyze diagnostic safety events. In addition to usability testing, additional feedback from the field will ensure that AHRQ's CFER-DS is useful to a broad range of users for learning and safety improvement.
Affiliation(s)
- Andrea Bradford
  - From the Center for Innovations in Quality, Effectiveness and Safety, Michael E. DeBakey Veterans Affairs Medical Center and Baylor College of Medicine
  - Department of Medicine, Baylor College of Medicine, Houston, Texas
- Umber Shahid
  - From the Center for Innovations in Quality, Effectiveness and Safety, Michael E. DeBakey Veterans Affairs Medical Center and Baylor College of Medicine
  - Department of Medicine, Baylor College of Medicine, Houston, Texas
- Gordon D. Schiff
  - Center for Patient Safety Research and Practice, Brigham and Women’s Hospital
  - Harvard Medical School Center for Primary Care, Boston, Massachusetts
- Mark L. Graber
  - Society to Improve Diagnosis in Medicine, Chicago, Illinois
- Abigail Marinez
  - From the Center for Innovations in Quality, Effectiveness and Safety, Michael E. DeBakey Veterans Affairs Medical Center and Baylor College of Medicine
  - Department of Medicine, Baylor College of Medicine, Houston, Texas
- Paula DiStabile
  - Agency for Healthcare Research and Quality, Rockville, Maryland
- Hamid Jalal
  - Agency for Healthcare Research and Quality, Rockville, Maryland
- Hardeep Singh
  - From the Center for Innovations in Quality, Effectiveness and Safety, Michael E. DeBakey Veterans Affairs Medical Center and Baylor College of Medicine
  - Department of Medicine, Baylor College of Medicine, Houston, Texas
7
Barsky M, Olson APJ, Astik GJ. Classifying and Disclosing Medical Errors. Med Clin North Am 2022; 106:675-687. [PMID: 35725233 DOI: 10.1016/j.mcna.2022.02.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Abstract
Medical errors are an unfortunate but common occurrence in health care. It is important to understand what medical errors are and what types of harm can occur to patients. Along with recognition of the error, disclosure is an equally important part of the process. Clinicians should provide open and honest discussion about the events that occurred to patients along with feedback to institutions on ways to prevent such errors in the future.
Affiliation(s)
- Maria Barsky
  - Hospitalist Program, UC Irvine Medical Center, 101 The City Drive South, Suite 500, Orange, CA 92868, USA.
- Andrew P J Olson
  - Section of Hospital Medicine, Division of General Internal Medicine, Department of Medicine, University of Minnesota Medical School, 420 Delaware Street Southeast, MMC 741, Minneapolis, MN 55455, USA; Division of Pediatric Hospital Medicine, Department of Pediatrics, University of Minnesota Medical School, 420 Delaware Street Southeast, MMC 741, Minneapolis, MN 55455, USA. https://twitter.com/@andrewolsonmd
- Gopi J Astik
  - Division of Hospital Medicine, Northwestern University Feinberg School of Medicine, 251 East Huron Street Suite 16-738, Chicago, IL 60611, USA. https://twitter.com/@gopiastik
8
Marks A, Takahashi C, Anand P, Lau KHV. Two-Year Profile of Preventable Errors in Hospital-Based Neurology. Neurol Clin Pract 2022; 12:218-222. [DOI: 10.1212/cpj.0000000000001163] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2021] [Accepted: 01/26/2022] [Indexed: 11/15/2022]
Abstract
Background and Objectives: Medical errors are estimated to cause 7,000 deaths and cost 17-29 billion USD per year, but there is a lack of published real-world data on preventable errors, in particular in hospital-based neurology. We sought to characterize the profile of errors that occur on the inpatient neurology services at our institution in order to inform strategies on future error prevention. Methods: We reviewed all cases of preventable errors occurring on the inpatient neurology services from July 1, 2018 to June 30, 2020, logged in institutional error reporting systems and reviewed at departmental morbidity and mortality conferences (M&MC). Each case was characterized by primary category of error, level of harm as determined by the Agency for Healthcare Research & Quality (AHRQ) Common Format Harm Scale version 1.2, primary intervention, and recurrence within one year, with a final censoring date of June 30, 2021. Results: Of 72 cases, 43 (60%) were attributed to errors in clinical decision-making and 20 (28%) to systems or electronic health record-related errors. The majority of cases resulted in in-conference education on systems-based errors (29%) at departmental M&MCs followed by in-conference education on clinical neurology (25%). Among errors classified primarily as clinical, 28% were addressed via systems-based interventions including in-conference education on systems issues and changes in written protocol. In 23 cases (32%), a similar error recurred within one year of the presentation. In total, 7 cases (10%) resulted in a change in written protocol, none with recurrences. Discussion: Systems-based interventions may reduce both clinical and systems-based errors, and protocol changes are effective when feasible. Given the important goal of optimizing care for every patient, quality leaders should conduct continuous audits of preventable errors and quality improvement systems in their clinical areas.
9
Grabinski ZG, Babineau J, Jamal N, Silberman AP, Dufault J, Ford BL, Kessler DO. Reporting of Unsafe Conditions at an Academic Women and Children's Hospital. Jt Comm J Qual Patient Saf 2021; 47:731-738. [PMID: 34544657 DOI: 10.1016/j.jcjq.2021.08.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2021] [Revised: 08/03/2021] [Accepted: 08/04/2021] [Indexed: 10/20/2022]
Abstract
BACKGROUND Unsafe conditions (UCs) are circumstances that increase the probability of a patient safety event occurring. Each UC identified presents an opportunity to prevent a near miss or adverse patient event through proactive mitigation. The aim of this study was to describe the frequency, characteristics, contributing factors, and potential for harm of reported UCs. METHODS This is a retrospective descriptive analysis of UC incident reports voluntarily entered into an electronic medical event reporting system at a single tertiary care women and children's hospital. Reports were reviewed and categorized using a previously published classification scheme and a modified Healthcare Failure Mode and Effects Analysis (HFMEA). Reporter role, hospital location, and time to incident resolution were also described. RESULTS Between July 1, 2016, and June 30, 2019, 348 UCs were entered, representing 3.4% of all reports. Predominant categories of UCs were equipment (43.7%), medication (20.7%), and environmental safety (14.4%). A contributing factor was identified for >99.4% of all UCs, with 77.6% having more than one. Nurses (70.1%) submitted the highest numbers of UCs. The majority of UCs were of mild severity (79.9%) but had the potential to recur frequently (73.3%). CONCLUSION UCs represented a small proportion of all reported events across the hospital. Equipment and medication issues were important causes of UCs, and most UCs had one or more contributing factors. Though most UCs were of mild severity, they had a predicted potential to recur frequently, representing significant opportunities for improvement.
10
Lee K, Yoon K, Yoon B, Shin E. Differences in the perception of harm assessment among nurses in the patient safety classification system. PLoS One 2020; 15:e0243583. [PMID: 33284853 PMCID: PMC7721130 DOI: 10.1371/journal.pone.0243583] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/23/2020] [Accepted: 11/23/2020] [Indexed: 11/18/2022] Open
Abstract
Background Precise harm assessment by the medical staff is very important in a patient safety event reporting system, but there are differences in perception due to insufficiencies in education. Methods We developed a survey tool consisting of nine patient safety incident scenarios to investigate interrater agreement in harm score assignment among nurses. The survey tool was distributed to 287 nurses working at two hospitals. Results The overall kappa value for interrater agreement was k = 0.21 for harm and k = 0.28 for harm duration. In nine patient safety event scenarios, such as “mislabeled specimen” or “chest tube drain”, when the degree of harm was not clear, the assessments of harm and harm duration were somewhat dispersed. Conclusion For the quality of the patient safety incident reporting system, the accurate harm assessment of medical personnel is highly important; however, results in this study indicated that the assessment of the degree of harm by Korean nurses was not standardized. The reason for this variability could be due to the lack of education that takes harm assessment into account. Therefore, training in harm assessment and the development of programs to support this training are both necessary.
Affiliation(s)
- Kwangmi Lee
  - Department of Nursing, National Cancer Center, Goyang, Republic of Korea
- Kyeongsuk Yoon
  - Division of Nursing, Yonsei University Wonju Severance Christian Hospital, Wonju, Republic of Korea
- Byeongsook Yoon
  - Division of Nursing, Yonsei University Wonju Severance Christian Hospital, Wonju, Republic of Korea
- Eunhee Shin
  - Department of Nursing Science, Sangji University College of Health Sciences, Wonju, Republic of Korea
11
Haydar B, Baetzel A, Elliott A, MacEachern M, Kamal A, Christensen R. Adverse Events During Intrahospital Transport of Critically Ill Children: A Systematic Review. Anesth Analg 2020; 131:1135-1145. [PMID: 32925334 DOI: 10.1213/ane.0000000000004585] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 11/05/2022]
Abstract
Intrahospital transport of a critically ill patient is often required to achieve a diagnostic and/or therapeutic objective. However, clinicians who recommend a procedure that requires transport are often not fully aware of the risks of transport. Clinicians involved in the care of critically ill children may therefore benefit from a clear enumeration of adverse events that have occurred during transport, risk factors for those events, and guidance for event prevention. The objective of this review was to collect all published harm and adverse events that occurred in critically ill children in the context of transport within a medical center, as well as the incidence of each type of event. A secondary objective was to identify what interventions have been previously studied that reduce events and to collect recommendations for harm prevention from study authors. Ovid MEDLINE, Cochrane Central Register of Controlled Trials, Embase, and CINAHL were searched in January 2018 and again in December 2018. Terms indicating pediatric patients, intrahospital transport, critical illness, and adverse events were used. Titles and abstracts were screened and full text was reviewed for any article meeting inclusion criteria. If articles included both children and adults, incidence data were collected only if the number of pediatric patients could be ascertained. Of 471 full-text articles reviewed, 40 met inclusion criteria, of which 24 included only children, totaling 4104 patient transports. Heterogeneity was high, owing to a wide range of populations, settings, data collection methods, and outcomes. The incidence of adverse events varied widely between studies. Examples of harm included emergent tracheostomy, pneumothorax, and cardiac arrest requiring chest compressions. Respiratory and airway events were the most common type of adverse event. Hypothermia was common in infants. One transport-associated death was reported. 
When causation was assessed, most events were judged to have been preventable or potentially mitigated by improved double-checks and usage of checklists. Prospective studies demonstrated the superiority of mechanical ventilation over manual ventilation for intubated patients. Risk of adverse events during critical care transport appears to relate to the patient's underlying illness and degree of respiratory support. Recommendations for reducing these adverse events have frequently included the use of checklists. Other recommendations include optimization of the patient's physiological status before transport, training with transport equipment, double-checking of equipment before transport, and having experienced clinicians accompany the patient. All available recommendations for reducing transport-associated adverse events in included articles were collated and included.
Affiliation(s)
- Bishr Haydar
  - From the Department of Anesthesiology, University of Michigan Medical School, Ann Arbor, Michigan
- Anne Baetzel
  - From the Department of Anesthesiology, University of Michigan Medical School, Ann Arbor, Michigan
- Anila Elliott
  - From the Department of Anesthesiology, University of Michigan Medical School, Ann Arbor, Michigan
- Mark MacEachern
  - Taubman Health Sciences Library, University of Michigan, Ann Arbor, Michigan
- Afra Kamal
  - School of Public Health, University of Michigan, Ann Arbor, Michigan
- Robert Christensen
  - From the Department of Anesthesiology, University of Michigan Medical School, Ann Arbor, Michigan
| |
|
12
|
Preventing inpatient falls with injuries using integrative machine learning prediction: a cohort study. NPJ Digit Med 2019; 2:127. [PMID: 31872067 PMCID: PMC6908660 DOI: 10.1038/s41746-019-0200-3] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 05/19/2019] [Accepted: 11/14/2019] [Indexed: 12/13/2022] Open
Abstract
Patient falls during hospitalization can lead to severe injuries and remain one of the most vexing patient-safety problems facing hospitals. They lead to increased medical care costs, lengthened hospital stays, more litigation, and even death. Existing methods and technology to address this problem mostly focus on stratifying inpatients at risk, without predicting fall severity or injuries. Here, a retrospective cohort study was designed and performed to predict the severity of inpatient falls, based on a machine learning classifier integrating multi-view ensemble learning and a model-based missing data imputation method. As input, the demographic characteristics, diagnoses, procedural data, and bone density measurements of over two thousand inpatients who fell were retrieved from the HMH clinical data warehouse for two separate time periods. The predictive classifier, based on multi-view ensemble learning with missing values (MELMV), outperformed three other baseline models, achieving a cross-validated AUC of 0.713 (95% CI, 0.701–0.725) and an AUC of 0.808 (95% CI, 0.740–0.876) on the separate testing set. Our study shows the efficacy of an integrative machine learning-based classifier in dealing with multi-source patient data, which in this case delivers robust predictive performance on the severity of patient falls. The severe fall index provided by the MELMV classifier is calculated to identify inpatients who are at risk of having severe injuries if they fall, thus triggering additional steps of intervention to prevent a harmful fall, beyond the standard-of-care procedure for all high-risk fall patients.
|
13
|
Switaj TL, Cummings BM, Logan MS, Mort EA. Adopting RCA 2: The Interrater Reliability of Safety Assessment Codes. Am J Med Qual 2018; 34:152-157. [PMID: 30182723 DOI: 10.1177/1062860618793945] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Safety assessment codes (SACs) are one method to evaluate adverse events and determine the need for a root cause analysis. Few facilities currently use SACs, and there is no literature examining their interrater reliability. Two independent raters assigned frequency, actual harm, and potential harm ratings to a sample of patient safety reports. An actual and potential SAC were determined. Percent agreement and Cohen's κ were calculated. Substantial agreement existed for the actual SAC (κ = 0.626, P < .001), fair agreement for the potential SAC (κ = 0.266, P < .001), and low agreement for potential harm (κ = 0.171, P = .002). Although there is subjectivity in all aspects of assigning SACs, the greatest is in potential severity. This presents a problem when using the potential SAC and is in agreement with previous literature showing significant subjectivity in determining potential harm. An operational framework is needed to strengthen reliability.
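The two agreement statistics named in this abstract, percent agreement and Cohen's κ, can be illustrated with a minimal sketch. The function names and the example ratings below are hypothetical and are not taken from the study; the sketch only shows how κ corrects observed agreement for the agreement expected by chance.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Fraction of items on which the two raters gave the same rating."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: (observed - expected) / (1 - expected)."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: product of each rater's marginal proportions, per category.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings for four safety reports (illustration only).
rater_1 = ["high", "high", "low", "low"]
rater_2 = ["high", "low", "low", "low"]
agreement = percent_agreement(rater_1, rater_2)  # 0.75
kappa = cohens_kappa(rater_1, rater_2)           # 0.5
```

In this toy case the raters agree on 75% of reports, but half of that agreement is expected by chance, so κ is 0.5.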
|
14
|
Abstract
Purpose: Clinical provider peer review (CPPR) is a process for evaluating a patient's experience in encounters of care. It is part of ongoing professional practice evaluation and focused professional practice evaluation—important contributors to provider credentialing and privileging. Critical access hospitals are hindered in CPPR by having a limited number of providers, shortages of staff resources, and relationships among staff members that make unbiased review difficult. Small departments within larger institutions may face similar challenges. Methods: A CPPR process created at Mayo Clinic Health System is described. It involved a case review questionnaire built on the Institute of Medicine “Six Aims for Changing the Health Care System,” a standardized intervention algorithm and tracking tool. Outcomes: During 2007 through 2014, a total of 994 cases were reviewed; 31% led to provider dialog and education or intervention. Findings were applied to core measure processes with success rate going from 87% to 97%. Changes were adopted in end-of-life care, contributing to a 50% reduction in all-cause mortality rate. Conclusions: Providing peer review tools to a critical access hospital can keep peer review within a group with knowledge of the individual provider's practice and can make process improvement the everyday work of those involved.
|
15
|
Bhise V, Sittig DF, Vaghani V, Wei L, Baldwin J, Singh H. An electronic trigger based on care escalation to identify preventable adverse events in hospitalised patients. BMJ Qual Saf 2017; 27:241-246. [PMID: 28935832 PMCID: PMC5867429 DOI: 10.1136/bmjqs-2017-006975] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 05/31/2017] [Revised: 08/10/2017] [Accepted: 08/17/2017] [Indexed: 02/05/2023]
Abstract
Background: Methods to identify preventable adverse events typically have low yield and efficiency. We refined the methods of the Institute for Healthcare Improvement’s Global Trigger Tool (GTT) application and leveraged electronic health record (EHR) data to improve detection of preventable adverse events, including diagnostic errors. Methods: We queried the EHR data repository of a large health system to identify an ‘index hospitalization’ associated with care escalation (defined as transfer to the intensive care unit (ICU) or initiation of rapid response team (RRT) within 15 days of admission) between March 2010 and August 2015. To enrich the record review sample with unexpected events, we used EHR clinical data to modify the GTT algorithm and limited eligible patients to those at lower risk for care escalation based on younger age and presence of minimal comorbid conditions. We modified the GTT review methodology; two physicians independently reviewed eligible ‘e-trigger’ positive records to identify preventable diagnostic and care management events. Results: Of 88 428 hospitalisations, 887 were associated with care escalation (712 ICU transfers and 175 RRTs), of which 92 were flagged as trigger-positive and reviewed. Preventable adverse events were detected in 41 cases, yielding a trigger positive predictive value of 44.6% (reviewer agreement 79.35%; Cohen’s kappa 0.573). We identified 7 (7.6%) diagnostic errors and 34 (37.0%) care management-related events: 24 (26.1%) adverse drug events, 4 (4.3%) patient falls, 4 (4.3%) procedure-related complications and 2 (2.2%) hospital-associated infections. In most events (73.1%), there was potential for temporary harm. Conclusion: We developed an approach using an EHR data-based trigger and modified review process to efficiently identify hospitalised patients with preventable adverse events, including diagnostic errors. Such e-triggers can help overcome limitations of currently available methods to detect preventable harm in hospitalised patients.
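The percentages reported in this abstract follow directly from the counts it gives, with the 92 reviewed trigger-positive records as the denominator. A minimal sketch of that arithmetic is below; the helper name `percent_of` and the dictionary layout are assumptions made for illustration, while the counts are those stated in the abstract.

```python
def percent_of(count, total):
    """Share of `total` represented by `count`, as a percentage to one decimal."""
    return round(100 * count / total, 1)


REVIEWED = 92  # trigger-positive records reviewed, as stated in the abstract

# Event counts as stated in the abstract.
care_management = {
    "adverse drug events": 24,
    "patient falls": 4,
    "procedure-related complications": 4,
    "hospital-associated infections": 2,
}
diagnostic_errors = 7

care_management_total = sum(care_management.values())        # 34
preventable_total = diagnostic_errors + care_management_total  # 41

# Positive predictive value of the trigger: preventable events / reviewed records.
ppv = percent_of(preventable_total, REVIEWED)  # 44.6

# Per-category shares of the reviewed records.
shares = {name: percent_of(n, REVIEWED) for name, n in care_management.items()}
```

The subtotals are internally consistent: the four care management categories sum to 34, and adding the 7 diagnostic errors gives the 41 preventable events behind the 44.6% positive predictive value.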
Affiliation(s)
- Viraj Bhise: Center for Innovations in Quality, Effectiveness, and Safety (IQuESt), Michael E DeBakey Veterans Affairs Medical Center, Houston, Texas, USA; Department of Medicine, Baylor College of Medicine, Houston, Texas, USA
- Dean F Sittig: School of Biomedical Informatics, University of Texas Health Science Center, Houston, Texas, USA
- Viralkumar Vaghani: Center for Innovations in Quality, Effectiveness, and Safety (IQuESt), Michael E DeBakey Veterans Affairs Medical Center, Houston, Texas, USA; Department of Medicine, Baylor College of Medicine, Houston, Texas, USA
- Li Wei: Center for Innovations in Quality, Effectiveness, and Safety (IQuESt), Michael E DeBakey Veterans Affairs Medical Center, Houston, Texas, USA; Department of Medicine, Baylor College of Medicine, Houston, Texas, USA
- Jessica Baldwin: Center for Innovations in Quality, Effectiveness, and Safety (IQuESt), Michael E DeBakey Veterans Affairs Medical Center, Houston, Texas, USA; Department of Medicine, Baylor College of Medicine, Houston, Texas, USA
- Hardeep Singh: Center for Innovations in Quality, Effectiveness, and Safety (IQuESt), Michael E DeBakey Veterans Affairs Medical Center, Houston, Texas, USA; Department of Medicine, Baylor College of Medicine, Houston, Texas, USA
|
16
|
Bhise V, Meyer AND, Singh H, Wei L, Russo E, Al-Mutairi A, Murphy DR. Errors in Diagnosis of Spinal Epidural Abscesses in the Era of Electronic Health Records. Am J Med 2017; 130:975-981. [PMID: 28366427 DOI: 10.1016/j.amjmed.2017.03.009] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Received: 08/22/2016] [Revised: 01/06/2017] [Accepted: 03/02/2017] [Indexed: 12/20/2022]
Abstract
PURPOSE: With this study, we set out to identify missed opportunities in diagnosis of spinal epidural abscesses to outline areas for process improvement. METHODS: Using a large national clinical data repository, we identified all patients with a new diagnosis of spinal epidural abscess in the Department of Veterans Affairs (VA) during 2013. Two physicians independently conducted retrospective chart reviews on 250 randomly selected patients and evaluated their records for red flags (eg, unexplained weight loss, neurological deficits, and fever) 90 days prior to diagnosis. Diagnostic errors were defined as missed opportunities to evaluate red flags in a timely or appropriate manner. Reviewers gathered information about process breakdowns related to patient factors, the patient-provider encounter, test performance and interpretation, test follow-up and tracking, and the referral process. Reviewers also determined harm and time lag between red flags and definitive diagnoses. RESULTS: Of 250 patients, 119 had a new diagnosis of spinal epidural abscess, 66 (55.5%) of whom experienced diagnostic error. Median time to diagnosis in error cases was 12 days, compared with 4 days in cases without error (P < .01). Red flags that were frequently not evaluated in error cases included unexplained fever (n = 57; 86.4%), focal neurological deficits with progressive or disabling symptoms (n = 54; 81.8%), and active infection (n = 54; 81.8%). Most errors involved breakdowns during the patient-provider encounter (n = 60; 90.1%), including failures in information gathering/integration, and were associated with temporary harm (n = 43; 65.2%). CONCLUSION: Despite wide availability of clinical data, errors in diagnosis of spinal epidural abscesses are common and involve inadequate history, physical examination, and test ordering. Solutions should include renewed attention to basic clinical skills.
Affiliation(s)
- Viraj Bhise: Houston Veterans Affairs Center for Innovations in Quality, Effectiveness and Safety, Michael E. DeBakey Veterans Affairs Medical Center, Houston, Tex; Department of Medicine, Baylor College of Medicine, Houston, Tex
- Ashley N D Meyer: Houston Veterans Affairs Center for Innovations in Quality, Effectiveness and Safety, Michael E. DeBakey Veterans Affairs Medical Center, Houston, Tex; Department of Medicine, Baylor College of Medicine, Houston, Tex
- Hardeep Singh: Houston Veterans Affairs Center for Innovations in Quality, Effectiveness and Safety, Michael E. DeBakey Veterans Affairs Medical Center, Houston, Tex; Department of Medicine, Baylor College of Medicine, Houston, Tex
- Li Wei: Houston Veterans Affairs Center for Innovations in Quality, Effectiveness and Safety, Michael E. DeBakey Veterans Affairs Medical Center, Houston, Tex; Department of Medicine, Baylor College of Medicine, Houston, Tex
- Elise Russo: Houston Veterans Affairs Center for Innovations in Quality, Effectiveness and Safety, Michael E. DeBakey Veterans Affairs Medical Center, Houston, Tex; Department of Medicine, Baylor College of Medicine, Houston, Tex
- Aymer Al-Mutairi: Department of Medicine, Baylor College of Medicine, Houston, Tex
- Daniel R Murphy: Houston Veterans Affairs Center for Innovations in Quality, Effectiveness and Safety, Michael E. DeBakey Veterans Affairs Medical Center, Houston, Tex; Department of Medicine, Baylor College of Medicine, Houston, Tex
|
17
|
Grigg EB, Martin LD, Ross FJ, Roesler A, Rampersad SE, Haberkern C, Low DK, Carlin K, Martin LD. Assessing the Impact of the Anesthesia Medication Template on Medication Errors During Anesthesia. Anesth Analg 2017; 124:1617-1625. [DOI: 10.1213/ane.0000000000001823] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Indexed: 11/05/2022]
|
18
|
Abstract
OBJECTIVE The objective of this study was to identify modifiable factors that improve the reliability of ratings of severity of health care-associated harm in clinical practice improvement and research. METHODS A diverse group of clinicians rated 8 types of adverse events: blood product, device or medical/surgical supply, fall, health care-associated infection, medication, perinatal, pressure ulcer, surgery. We used a generalizability theory framework to estimate the impact of number of raters, rater experience, and rater provider type on reliability. RESULTS Pharmacists were slightly more precise and consistent in their ratings than either physicians or nurses. For example, to achieve high reliability of 0.83, 3 physicians could be replaced by 2 pharmacists without loss in precision of measurement. If only 1 rater was available for rating, ∼5% of the reviews for severe harm would have been incorrectly categorized. Reliability was greatly improved with 2 reviewers. CONCLUSIONS We identified factors that influence the reliability of clinician reviews of health care-associated harm. Our novel use of generalizability analyses improved our understanding of how differences affect reliability. This approach was useful in optimizing resource utilization when selecting raters to assess harm and may have similar applications in other settings in health care.
|
19
|
Interrater reliability of a near-miss risk index for incident learning systems in radiation oncology. Pract Radiat Oncol 2016; 6:429-435. [DOI: 10.1016/j.prro.2016.04.002] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 11/17/2015] [Revised: 03/22/2016] [Accepted: 04/09/2016] [Indexed: 11/22/2022]
|
20
|
Steelman VM, Williams TL, Szekendi MK, Halverson AL, Dintzis SM, Pavkovic S. Surgical Specimen Management: A Descriptive Study of 648 Adverse Events and Near Misses. Arch Pathol Lab Med 2016; 140:1390-1396. [PMID: 27610645 DOI: 10.5858/arpa.2016-0021-oa] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Indexed: 11/06/2022]
Abstract
CONTEXT - Surgical specimen adverse events can lead to delays in treatment or diagnosis, misdiagnosis, reoperation, inappropriate treatment, and anxiety or serious patient harm. OBJECTIVES - To describe the types and frequency of event reports associated with the management of surgical specimens, the contributing factors, and the level of harm associated with these events. DESIGN - A retrospective review was undertaken of surgical specimen adverse events and near misses voluntarily reported in the University HealthSystem Consortium Safety Intelligence Patient Safety Organization database by more than 50 health care facilities during a 3-year period (2011-2013). Event reports that involved surgical specimen management were reviewed for patients undergoing surgery during which tissue or fluid was sent to the pathology department. RESULTS - Six hundred forty-eight surgical specimen events were reported in all stages of the specimen management process, with the most common events reported during the prelaboratory phase and, specifically, with specimen labeling, collection/preservation, and transport. The most common contributing factors were failures in handoff communication, staff inattention, knowledge deficit, and environmental issues. Eight percent of the events (52 of 648) resulted in either the need for additional treatment or temporary or permanent harm to the patient. CONCLUSIONS - All phases of specimen handling and processing are vulnerable to errors. These results provide a starting point for health care organizations to conduct proactive risk analyses of specimen handling procedures and to design safer processes. Particular attention should be paid to effective communication and handoffs, consistent processes across care areas, and staff training. In addition, organizations should consider the use of technology-based identification and tracking systems.
Affiliation(s)
- Victoria M Steelman
- From the College of Nursing, University of Iowa, Iowa City (Dr Steelman); the Department of Safety Intelligence Patient Safety Organization (Ms Williams), the Research Institute (Dr Szekendi), and the Safety Intelligence Program (Dr Pavkovic), Vizient, Inc, Chicago, Illinois; the Department of Surgery, Northwestern University Feinberg School of Medicine, Chicago (Dr Halverson); and the Department of Anatomic Pathology, University of Washington Medical Center, Seattle (Dr Dintzis). Vizient, Inc, was formerly known as University HealthSystem Consortium
|
21
|
Rosen AK, Mull HJ. Identifying adverse events after outpatient surgery: improving measurement of patient safety. BMJ Qual Saf 2015; 25:3-5. [DOI: 10.1136/bmjqs-2015-004752] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 09/18/2015] [Accepted: 09/25/2015] [Indexed: 11/04/2022]
|