1
Kernberg A, Gold JA, Mohan V. Using ChatGPT-4 to Create Structured Medical Notes From Audio Recordings of Physician-Patient Encounters: Comparative Study. J Med Internet Res 2024; 26:e54419. [PMID: 38648636 DOI: 10.2196/54419] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2023] [Revised: 02/20/2024] [Accepted: 03/10/2024] [Indexed: 04/25/2024]
Abstract
BACKGROUND Medical documentation plays a crucial role in clinical practice, facilitating accurate patient management and communication among health care professionals. However, inaccuracies in medical notes can lead to miscommunication and diagnostic errors. Additionally, the demands of documentation contribute to physician burnout. Although intermediaries like medical scribes and speech recognition software have been used to ease this burden, they have limitations in terms of accuracy and addressing provider-specific metrics. The integration of ambient artificial intelligence (AI)-powered solutions offers a promising way to improve documentation while fitting seamlessly into existing workflows. OBJECTIVE This study aims to assess the accuracy and quality of Subjective, Objective, Assessment, and Plan (SOAP) notes generated by ChatGPT-4, an AI model, using established transcripts of History and Physical Examination as the gold standard. We seek to identify potential errors and evaluate the model's performance across different categories. METHODS We conducted simulated patient-provider encounters representing various ambulatory specialties and transcribed the audio files. Key reportable elements were identified, and ChatGPT-4 was used to generate SOAP notes based on these transcripts. Three versions of each note were created and compared to the gold standard via chart review; errors generated from the comparison were categorized as omissions, incorrect information, or additions. We compared the accuracy of data elements across versions, transcript length, and data categories. Additionally, we assessed note quality using the Physician Documentation Quality Instrument (PDQI) scoring system. RESULTS Although ChatGPT-4 consistently generated SOAP-style notes, there were, on average, 23.6 errors per clinical case, with errors of omission (86%) being the most common, followed by addition errors (10.5%) and inclusion of incorrect facts (3.2%). 
There was significant variance between replicates of the same case, with only 52.9% of data elements reported correctly across all 3 replicates. The accuracy of data elements varied across cases, with the highest accuracy observed in the "Objective" section. Consequently, the measure of note quality, assessed by PDQI, demonstrated intra- and intercase variance. Finally, the accuracy of ChatGPT-4 was inversely correlated to both the transcript length (P=.05) and the number of scorable data elements (P=.05). CONCLUSIONS Our study reveals substantial variability in errors, accuracy, and note quality generated by ChatGPT-4. Errors were not limited to specific sections, and the inconsistency in error types across replicates complicated predictability. Transcript length and data complexity were inversely correlated with note accuracy, raising concerns about the model's effectiveness in handling complex medical cases. The quality and reliability of clinical notes produced by ChatGPT-4 do not meet the standards required for clinical use. Although AI holds promise in health care, caution should be exercised before widespread adoption. Further research is needed to address accuracy, variability, and potential errors. ChatGPT-4, while valuable in various applications, should not be considered a safe alternative to human-generated clinical documentation at this time.
Affiliation(s)
- Annessa Kernberg
- Department of Medical Informatics and Clinical Epidemiology, Oregon Health and Sciences University, Portland, OR, United States
- Jeffrey A Gold
- Department of Medical Informatics and Clinical Epidemiology, Oregon Health and Sciences University, Portland, OR, United States
- Vishnu Mohan
- Department of Medical Informatics and Clinical Epidemiology, Oregon Health and Sciences University, Portland, OR, United States
2
Naqvi WM, Gabr M, Arora SP, Mishra GV, Pashine AA, Quazi Syed Z. Bridging, Mapping, and Addressing Research Gaps in Health Sciences: The Naqvi-Gabr Research Gap Framework. Cureus 2024; 16:e55827. [PMID: 38590484 PMCID: PMC10999783 DOI: 10.7759/cureus.55827] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2024] [Accepted: 03/08/2024] [Indexed: 04/10/2024]
Abstract
Innovations pertaining to the ever-evolving needs of the medical and healthcare sciences remain constant. This creates a gap between the rationalized needs of the study and the proposed research question. However, classifying, identifying, and addressing these research gaps require a systematic and precise structured map. Using the Medical Subject Heading (MeSH) terms "Research Gaps" AND "Healthcare" AND "Framework" in MEDLINE, Scopus, and CINAHL databases with the filters yielded no relevant literature. Therefore, this review aims to fill this practical and clinical knowledge gap by developing the Naqvi-Gabr Research Gap Framework through critical synthesis based on extensive research on medical and healthcare research gaps. Fourteen research gaps are distributed for allocation as per the healthcare delivery system approach: developing new treatments or prevention strategies, improving diagnostic tools and techniques, addressing health disparities, and improving access to healthcare services. This structured framework determines the strategic mapping of research gaps corresponding to the nature of the research. The identification and classification of the appropriate research gap led to precise and concise conclusions corresponding to the research process proposed in this study. Hence, the Naqvi-Gabr Research Gap Framework is a valuable tool for determining the potential application of gaps by researchers, policymakers, and other stakeholders with a productive address.
Affiliation(s)
- Waqar M Naqvi
- Faculty of Interdisciplinary Sciences, Jawaharlal Nehru Medical College, Datta Meghe Institute of Higher Education and Research, Wardha, IND
- Department of Physiotherapy, College of Health Sciences, Gulf Medical University, Ajman, ARE
- Mamdouh Gabr
- Department of Physiotherapy, College of Health Sciences, Gulf Medical University, Ajman, ARE
- Sakshi P Arora
- Faculty of Interdisciplinary Sciences, Jawaharlal Nehru Medical College, Datta Meghe Institute of Higher Education and Research, Wardha, IND
- Gaurav V Mishra
- Department of Radiodiagnosis, Jawaharlal Nehru Medical College, Datta Meghe Institute of Higher Education and Research, Wardha, IND
- Aishwarya A Pashine
- Department of Cardiorespiratory Physiotherapy, Career College Bhopal, Bhopal, IND
- Zahiruddin Quazi Syed
- Department of Community Medicine, Jawaharlal Nehru Medical College, Datta Meghe Institute of Higher Education and Research, Wardha, IND
3
Rotenstein LS, Holmgren AJ, Horn DM, Lipsitz S, Phillips R, Gitomer R, Bates DW. System-Level Factors and Time Spent on Electronic Health Records by Primary Care Physicians. JAMA Netw Open 2023; 6:e2344713. [PMID: 37991757 PMCID: PMC10665969 DOI: 10.1001/jamanetworkopen.2023.44713] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2023] [Accepted: 10/13/2023] [Indexed: 11/23/2023]
Abstract
Importance Primary care physicians (PCPs) spend the most time on the electronic health record (EHR) of any specialty. Thus, it is critical to understand what factors contribute to varying levels of PCP time spent on EHRs. Objective To characterize variation in EHR time across PCPs and primary care clinics, and to describe how specific PCP, patient panel, clinic, and team collaboration factors are associated with PCPs' time spent on EHRs. Design, Setting, and Participants This cross-sectional study included 307 PCPs practicing across 31 primary care clinics at Massachusetts General Hospital and Brigham and Women's Hospital during 2021. Data were analyzed from October 2022 to October 2023. Main Outcomes and Measures Total per-visit EHR time, total per-visit pajama time (ie, time spent on the EHR between 5:30 pm to 7:00 am and on weekends), and total per-visit time on the electronic inbox as measured by activity log data derived from an EHR database. Results The sample included 307 PCPs (183 [59.6%] female). On a per-visit basis, PCPs spent a median (IQR) of 36.2 (28.9-45.7) total minutes on the EHR, 6.2 (3.1-11.5) minutes of pajama time, and 7.8 (5.5-10.7) minutes on the electronic inbox. When comparing PCP time expenditure by clinic, median (IQR) total EHR time, median (IQR) pajama time, and median (IQR) electronic inbox time ranged from 23.5 (20.7-53.1) to 47.9 (30.6-70.7) minutes per visit, 1.7 (0.7-10.5) to 13.1 (7.7-28.2) minutes per visit, and 4.7 (4.1-5.2) to 10.8 (8.9-15.2) minutes per visit, respectively. In a multivariable model with an outcome of total per-visit EHR time per visit, an above median percentage of teamwork on orders was associated with 3.81 (95% CI, 0.49-7.13) minutes per visit fewer and having a clinic pharmacy technician was associated with 7.87 (95% CI, 2.03-13.72) minutes per visit fewer. Practicing in a community health center was associated with fewer minutes of total EHR time per visit (5.40 [95% CI, 0.06-10.74] minutes). 
Conclusions and Relevance There is substantial variation in EHR time among individual PCPs and PCPs within clinics. Organization-level factors, such as team collaboration on orders, support for medication refill functions, and practicing in a community health center, are associated with lower EHR time for PCPs. These findings highlight the importance of addressing EHR burden at a systems level.
Affiliation(s)
- Lisa S. Rotenstein
- Brigham and Women’s Hospital, Boston, Massachusetts
- University of California at San Francisco
- Daniel M. Horn
- Harvard Medical School, Boston, Massachusetts
- Massachusetts General Hospital, Boston
- Stuart Lipsitz
- Brigham and Women’s Hospital, Boston, Massachusetts
- Harvard Medical School, Boston, Massachusetts
- Russell Phillips
- Harvard Medical School, Boston, Massachusetts
- Harvard Center for Primary Care, Boston, Massachusetts
- Richard Gitomer
- Brigham and Women’s Hospital, Boston, Massachusetts
- Harvard Medical School, Boston, Massachusetts
- David W. Bates
- Brigham and Women’s Hospital, Boston, Massachusetts
- Harvard Medical School, Boston, Massachusetts
4
Palani S, Saeed I, Legler A, Sadej I, MacDonald C, Kirsh SR, Pizer SD, Shafer PR. Effect of a National VHA Medical Scribe Pilot on Provider Productivity, Wait Times, and Patient Satisfaction in Cardiology and Orthopedics. J Gen Intern Med 2023:10.1007/s11606-023-08114-6. [PMID: 37340268 DOI: 10.1007/s11606-023-08114-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2022] [Accepted: 02/23/2023] [Indexed: 06/22/2023]
Abstract
BACKGROUND Section 507 of the VA MISSION Act of 2018 mandated a 2-year pilot study of medical scribes in the Veterans Health Administration (VHA), with 12 VA Medical Centers randomly selected to receive scribes in their emergency departments or high wait time specialty clinics (cardiology and orthopedics). The pilot began on June 30, 2020, and ended on July 1, 2022. OBJECTIVE Our objective was to evaluate the impact of medical scribes on provider productivity, wait times, and patient satisfaction in cardiology and orthopedics, as mandated by the MISSION Act. DESIGN Cluster randomized trial, with intent-to-treat analysis using difference-in-differences regression. PATIENTS Veterans using 18 included VA Medical Centers (12 intervention and 6 comparison sites). INTERVENTION Randomization into MISSION 507 medical scribe pilot. MAIN MEASURES Provider productivity, wait times, and patient satisfaction per clinic-pay period. KEY RESULTS Randomization into the scribe pilot was associated with increases of 25.2 relative value units (RVUs) per full-time equivalent (FTE) (p < 0.001) and 8.5 visits per FTE (p = 0.002) in cardiology and increases of 17.3 RVUs per FTE (p = 0.001) and 12.5 visits per FTE (p = 0.001) in orthopedics. We found that the scribe pilot was associated with a decrease of 8.5 days in request to appointment day wait times (p < 0.001) in orthopedics, driven by a 5.7-day decrease in appointment made to appointment day wait times (p < 0.001), and observed no change in wait times in cardiology. We also observed no declines in patient satisfaction with randomization into the scribe pilot. CONCLUSIONS Given the potential improvements in productivity and wait times with no change in patient satisfaction, our results suggest that scribes may be a useful tool to improve access to VHA care. 
However, participation in the pilot by sites and providers was voluntary, which could have implications for scalability and what effects could be expected if scribes were introduced to the care process without buy-in. Cost was not considered in this analysis but is an important factor for future implementation. TRIAL REGISTRATION ClinicalTrials.gov Identifier: NCT04154462.
Affiliation(s)
- Sivagaminathan Palani
- Partnered Evidence-Based Policy Resource Center, VA Boston Healthcare System, Boston, MA, USA
- Department of Health Law, Policy, and Management, Boston University, Boston, MA, USA
- Iman Saeed
- Partnered Evidence-Based Policy Resource Center, VA Boston Healthcare System, Boston, MA, USA
- Department of Health Law, Policy, and Management, Boston University, Boston, MA, USA
- Aaron Legler
- Partnered Evidence-Based Policy Resource Center, VA Boston Healthcare System, Boston, MA, USA
- Izabela Sadej
- Partnered Evidence-Based Policy Resource Center, VA Boston Healthcare System, Boston, MA, USA
- Carol MacDonald
- Partnered Evidence-Based Policy Resource Center, VA Boston Healthcare System, Boston, MA, USA
- Susan R Kirsh
- Veterans Health Administration, Department of Veterans Affairs, Washington, DC, USA
- Steven D Pizer
- Partnered Evidence-Based Policy Resource Center, VA Boston Healthcare System, Boston, MA, USA
- Department of Health Law, Policy, and Management, Boston University, Boston, MA, USA
- Paul R Shafer
- Partnered Evidence-Based Policy Resource Center, VA Boston Healthcare System, Boston, MA, USA
- Department of Health Law, Policy, and Management, Boston University, Boston, MA, USA
5
Apathy NC, Rotenstein L, Bates DW, Holmgren AJ. Documentation dynamics: Note composition, burden, and physician efficiency. Health Serv Res 2023; 58:674-685. [PMID: 36342001 PMCID: PMC10154172 DOI: 10.1111/1475-6773.14097] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Indexed: 11/09/2022]
Abstract
OBJECTIVE To analyze how physician clinical note length and composition relate to electronic health record (EHR)-based measures of burden and efficiency that have been tied to burnout. DATA SOURCES AND STUDY SETTING Secondary EHR use metadata capturing physician-level measures from 203,728 US-based ambulatory physicians using the Epic Systems EHR between September 2020 and May 2021. STUDY DESIGN In this cross-sectional study, we analyzed physician clinical note length and note composition (e.g., content from manual or templated text). Our primary outcomes were three time-based measures of EHR burden (time writing EHR notes, time in the EHR after-hours, and EHR time on unscheduled days), and one measure of efficiency (percent of visits closed in the same day). We used multivariate regression to estimate the relationship between our outcomes and note length and composition. DATA EXTRACTION Physician-week measures of EHR usage were extracted from Epic's Signal platform used for measuring provider EHR efficiency. We calculated physician-level averages for our measures of interest and assigned physicians to overall note length deciles and note composition deciles from six sources, including templated text, manual text, and copy/paste text. PRINCIPAL FINDINGS Physicians in the top decile of note length demonstrated greater burden and lower efficiency than the median physician, spending 39% more time in the EHR after hours (p < 0.001) and closing 5.6 percentage points fewer visits on the same day (p < 0.001). Copy/paste demonstrated a similar dose/response relationship, with top-decile copy/paste users closing 6.8 percentage points fewer visits on the same day (p < 0.001) and spending more time in the EHR after hours and on days off (both p < 0.001). Templated text (e.g., Epic's SmartTools) demonstrated a non-linear relationship with burden and efficiency, with very low and very high levels of use associated with increased EHR burden and decreased efficiency. 
CONCLUSIONS "Efficiency tools" like copy/paste and templated text meant to reduce documentation burden and increase provider efficiency may have limited efficacy.
Affiliation(s)
- Nate C. Apathy
- National Center for Human Factors in Healthcare, MedStar Health Research Institute, Washington, District of Columbia, USA
- Center for Biomedical Informatics, Regenstrief Institute, Indianapolis, Indiana, USA
- Lisa Rotenstein
- Harvard Medical School, Boston, Massachusetts, USA
- Population Health Brigham & Women's Hospital, Boston, Massachusetts, USA
- David W. Bates
- Harvard Medical School, Boston, Massachusetts, USA
- Division of General Internal Medicine, Brigham & Women's Hospital, Boston, Massachusetts, USA
- Present address: Department of Health Policy and Management, Harvard School of Public Health, Boston, MA, USA
- A. Jay Holmgren
- Center for Clinical Informatics and Improvement Research, University of California – San Francisco School of Medicine, San Francisco, California, USA
6
Katta R, Strouphauer E, Ibraheim MK, Li-Wang J, Dao H. Practice Efficiency in Dermatology: Enhancing Quality of Care and Physician Well-Being. Cureus 2023; 15:e39195. [PMID: 37378213 PMCID: PMC10292050 DOI: 10.7759/cureus.39195] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/18/2023] [Indexed: 06/29/2023]
Abstract
A focus on improved efficiency can impact both patient care and physician well-being. Efficiency is one of the six domains of healthcare quality. It is also recognized as one of the three main pillars of professional fulfillment. Quality improvement measures in the area of efficiency are focused on reducing waste, specifically related to physicians' time, energy, and cognitive demands. Interventions and practices reported in the literature or communicated by dermatologists have documented efforts centered on patient care workflows, documentation, communication, and other areas. Team-based care models maximize the skill sets of other trained providers, while workflow changes encompassing process standardization, communication, and task automatization have improved patient safety and efficiency. Strategies to promote documentation efficiency have centered on eliminating extraneous documentation alongside the use of templates, text expander functionality, and dictation tools. The use of in-office or virtual scribes, when provided with adequate training and consistent feedback, has improved charting time, accuracy, and physician satisfaction. Although upfront investments in time and financial resources may be required, quality improvement in efficiency can benefit healthcare quality, patient safety, and physician satisfaction.
Affiliation(s)
- Rajani Katta
- Internal Medicine, Baylor College of Medicine, Houston, USA
- Dermatology, University of Texas Health Science Center at Houston, Houston, USA
- Harry Dao
- Dermatology, Loma Linda University Health, Loma Linda, USA
7
How Providers Can Optimize Effective and Safe Scribe Use: a Qualitative Study. J Gen Intern Med 2022:10.1007/s11606-022-07942-2. [PMID: 36385408 PMCID: PMC9668220 DOI: 10.1007/s11606-022-07942-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2022] [Accepted: 11/04/2022] [Indexed: 11/17/2022]
Abstract
BACKGROUND The use of electronic health records has generated an increase in after-hours and weekend work for providers. To alleviate this situation, the hiring of medical scribes has rapidly increased. Given the lack of scribe industry standards and the wide variance in how providers and scribes work together, it could potentially create new patient safety-related risks. OBJECTIVE The purpose of this paper was to identify how providers can optimize the effective and safe use of scribes. DESIGN The research team conducted a secondary analysis of qualitative data where we reanalyzed data from interview transcripts, field notes, and transcribed group discussions generated by four previous projects related to medical scribes. PARTICIPANTS Purposively selected participants included subject matter experts, providers, informaticians, medical scribes, medical assistants, administrators, social scientists, medical students, and qualitative researchers. APPROACH The team used NVivo12 to assist with the qualitative analysis. We used a template method followed by word queries to identify an optimum level of scribe utilization. We then used an inductive interpretive theme-generation process. KEY RESULTS We identified three themes: (1) communication aspects, (2) teamwork efforts, and (3) provider characteristics. Each theme contained specific practices so providers can use scribes safely and in a standardized way. CONCLUSION We utilized a secondary qualitative data analysis methodology to develop themes describing how providers can optimize their use of scribes. This new knowledge could increase provider efficiency and safety and be incorporated into further and future training tools for them.