51. Rodriguez DV, Lawrence K, Gonzalez J, Brandfield-Harvey B, Xu L, Tasneem S, Levine DL, Mann D. Leveraging Generative AI Tools to Support the Development of Digital Solutions in Health Care Research: Case Study. JMIR Hum Factors 2024;11:e52885. [PMID: 38446539] [PMCID: PMC10955400] [DOI: 10.2196/52885]
Abstract
BACKGROUND Generative artificial intelligence has the potential to revolutionize health technology product development by improving coding quality, efficiency, documentation, quality assessment and review, and troubleshooting. OBJECTIVE This paper explores the application of a commercially available generative artificial intelligence tool (ChatGPT) to the development of a digital health behavior change intervention designed to support patient engagement in a commercial digital diabetes prevention program. METHODS We examined the capacity, advantages, and limitations of ChatGPT to support digital product idea conceptualization, intervention content development, and the software engineering process, including software requirement generation, software design, and code production. In total, 11 evaluators, each with at least 10 years of experience in fields of study ranging from medicine and implementation science to computer science, participated in the output review process (ChatGPT vs human-generated output). All had familiarity or prior exposure to the original personalized automatic messaging system intervention. The evaluators rated the ChatGPT-produced outputs in terms of understandability, usability, novelty, relevance, completeness, and efficiency. RESULTS Most metrics received positive scores. We identified that ChatGPT can (1) support developers to achieve high-quality products faster and (2) facilitate nontechnical communication and system understanding between technical and nontechnical team members around the development goal of rapid and easy-to-build computational solutions for medical technologies. CONCLUSIONS ChatGPT can serve as a usable facilitator for researchers engaging in the software development life cycle, from product conceptualization to feature identification and user story development to code generation. TRIAL REGISTRATION ClinicalTrials.gov NCT04049500; https://clinicaltrials.gov/ct2/show/NCT04049500.
Affiliation(s)
- Danissa V Rodriguez
  - Department of Population Health, New York University Grossman School of Medicine, New York, NY, United States
- Katharine Lawrence
  - Department of Population Health, New York University Grossman School of Medicine, New York, NY, United States
  - Medical Center Information Technology, Department of Health Informatics, New York University Langone Health, New York, NY, United States
- Javier Gonzalez
  - Medical Center Information Technology, Department of Health Informatics, New York University Langone Health, New York, NY, United States
- Beatrix Brandfield-Harvey
  - Department of Population Health, New York University Grossman School of Medicine, New York, NY, United States
- Lynn Xu
  - Department of Population Health, New York University Grossman School of Medicine, New York, NY, United States
- Sumaiya Tasneem
  - Department of Population Health, New York University Grossman School of Medicine, New York, NY, United States
- Defne L Levine
  - Department of Population Health, New York University Grossman School of Medicine, New York, NY, United States
- Devin Mann
  - Department of Population Health, New York University Grossman School of Medicine, New York, NY, United States
  - Medical Center Information Technology, Department of Health Informatics, New York University Langone Health, New York, NY, United States
52. Mu Y, He D. The Potential Applications and Challenges of ChatGPT in the Medical Field. Int J Gen Med 2024;17:817-826. [PMID: 38476626] [PMCID: PMC10929156] [DOI: 10.2147/ijgm.s456659]
Abstract
ChatGPT, an AI-driven conversational large language model (LLM), has garnered significant scholarly attention since its inception, owing to its manifold applications in the realm of medical science. This study primarily examines the merits, limitations, anticipated developments, and practical applications of ChatGPT in clinical practice, healthcare, medical education, and medical research. It underscores the necessity for further research and development to enhance its performance and deployment. Moreover, future research avenues encompass ongoing enhancements and standardization of ChatGPT, mitigating its limitations, and exploring its integration and applicability in translational and personalized medicine. Reflecting the narrative nature of this review, a focused literature search was performed to identify relevant publications on ChatGPT's use in medicine. This process was aimed at gathering a broad spectrum of insights to provide a comprehensive overview of the current state and future prospects of ChatGPT in the medical domain. The objective is to aid healthcare professionals in understanding the groundbreaking advancements associated with the latest artificial intelligence tools, while also acknowledging the opportunities and challenges presented by ChatGPT.
Affiliation(s)
- Yonglin Mu
  - Department of Urology, Children’s Hospital of Chongqing Medical University, Chongqing, People’s Republic of China
- Dawei He
  - Department of Urology, Children’s Hospital of Chongqing Medical University, Chongqing, People’s Republic of China
53. Spotnitz M, Idnay B, Gordon ER, Shyu R, Zhang G, Liu C, Cimino JJ, Weng C. A Survey of Clinicians' Views of the Utility of Large Language Models. Appl Clin Inform 2024;15:306-312. [PMID: 38442909] [PMCID: PMC11023712] [DOI: 10.1055/a-2281-7092]
Abstract
OBJECTIVES Large language models (LLMs) like Generative pre-trained transformer (ChatGPT) are powerful algorithms that have been shown to produce human-like text from input data. Several potential clinical applications of this technology have been proposed and evaluated by biomedical informatics experts. However, few have surveyed health care providers for their opinions about whether the technology is fit for use. METHODS We distributed a validated mixed-methods survey to gauge practicing clinicians' comfort with LLMs for a breadth of tasks in clinical practice, research, and education, which were selected from the literature. RESULTS A total of 30 clinicians fully completed the survey. Of the 23 tasks, 16 were rated positively by more than 50% of the respondents. Based on our qualitative analysis, health care providers considered LLMs to have excellent synthesis skills and efficiency. However, our respondents had concerns that LLMs could generate false information and propagate training data bias. Our survey respondents were most comfortable with scenarios that allow LLMs to function in an assistive role, like a physician extender or trainee. CONCLUSION In a mixed-methods survey of clinicians about LLM use, health care providers were encouraging of having LLMs in health care for many tasks, and especially in assistive roles. There is a need for continued human-centered development of both LLMs and artificial intelligence in general.
Affiliation(s)
- Matthew Spotnitz
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, New York, United States
- Betina Idnay
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, New York, United States
- Emily R. Gordon
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, New York, United States
  - Department of Dermatology, Vagelos College of Physicians and Surgeons, Columbia University Irving Medical Center, New York, New York, United States
- Rebecca Shyu
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, New York, United States
- Gongbo Zhang
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, New York, United States
- Cong Liu
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, New York, United States
- James J. Cimino
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, New York, United States
  - Department of Biomedical Informatics and Data Science, Informatics Institute, Heersink School of Medicine, University of Alabama at Birmingham, Birmingham, Alabama, United States
- Chunhua Weng
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, New York, United States
54. Wei Q, Yao Z, Cui Y, Wei B, Jin Z, Xu X. Evaluation of ChatGPT-generated medical responses: A systematic review and meta-analysis. J Biomed Inform 2024;151:104620. [PMID: 38462064] [DOI: 10.1016/j.jbi.2024.104620]
Abstract
OBJECTIVE Large language models (LLMs) such as ChatGPT are increasingly explored in medical domains. However, the absence of standard guidelines for performance evaluation has led to methodological inconsistencies. This study aims to summarize the available evidence on evaluating ChatGPT's performance in answering medical questions and provide direction for future research. METHODS An extensive literature search was conducted on June 15, 2023, across ten medical databases. The keyword used was "ChatGPT," without restrictions on publication type, language, or date. Studies evaluating ChatGPT's performance in answering medical questions were included. Exclusions comprised review articles, comments, patents, non-medical evaluations of ChatGPT, and preprint studies. Data was extracted on general study characteristics, question sources, conversation processes, assessment metrics, and performance of ChatGPT. An evaluation framework for LLM in medical inquiries was proposed by integrating insights from selected literature. This study is registered with PROSPERO, CRD42023456327. RESULTS A total of 3520 articles were identified, of which 60 were reviewed and summarized in this paper and 17 were included in the meta-analysis. ChatGPT displayed an overall integrated accuracy of 56% (95% CI: 51%-60%, I² = 87%) in addressing medical queries. However, the studies varied in question resource, question-asking process, and evaluation metrics. As per our proposed evaluation framework, many studies failed to report methodological details, such as the date of inquiry, version of ChatGPT, and inter-rater consistency. CONCLUSION This review reveals ChatGPT's potential in addressing medical inquiries, but the heterogeneity of the study design and insufficient reporting might affect the results' reliability. Our proposed evaluation framework provides insights for the future study design and transparent reporting of LLM in responding to medical questions.
Affiliation(s)
- Qiuhong Wei
  - Big Data Center for Children's Medical Care, Children's Hospital of Chongqing Medical University, Chongqing, China
  - Children Nutrition Research Center, Children's Hospital of Chongqing Medical University, Chongqing, China
  - National Clinical Research Center for Child Health and Disorders, Ministry of Education Key Laboratory of Child Development and Disorders, China International Science and Technology Cooperation Base of Child Development and Critical Disorders, Chongqing Key Laboratory of Child Neurodevelopment and Cognitive Disorders, Chongqing, China
- Zhengxiong Yao
  - Department of Neurology, Children's Hospital of Chongqing Medical University, Chongqing, China
- Ying Cui
  - Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA
- Bo Wei
  - Department of Global Statistics and Data Science, BeiGene USA Inc., San Mateo, CA, USA
- Zhezhen Jin
  - Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY, USA
- Ximing Xu
  - Big Data Center for Children's Medical Care, Children's Hospital of Chongqing Medical University, Chongqing, China
55. Bužančić I, Belec D, Držaić M, Kummer I, Brkić J, Fialová D, Ortner Hadžiabdić M. Clinical decision-making in benzodiazepine deprescribing by healthcare providers vs. AI-assisted approach. Br J Clin Pharmacol 2024;90:662-674. [PMID: 37949663] [DOI: 10.1111/bcp.15963]
Abstract
AIMS The aim of this study was to compare the clinical decision-making for benzodiazepine deprescribing between a healthcare provider (HCP) and an artificial intelligence (AI) chatbot GPT4 (ChatGPT-4). METHODS We analysed real-world data from a Croatian cohort of community-dwelling benzodiazepine patients (n = 154) within the EuroAgeism H2020 ESR 7 project. HCPs evaluated the data using pre-established deprescribing criteria to assess benzodiazepine discontinuation potential. The research team devised and tested AI prompts to ensure consistency with HCP judgements. An independent researcher employed ChatGPT-4 with predetermined prompts to simulate clinical decisions for each patient case. Data derived from human-HCP and ChatGPT-4 decisions were compared for agreement rates and Cohen's kappa. RESULTS Both HCP and ChatGPT identified patients for benzodiazepine deprescribing (96.1% and 89.6%, respectively), showing an agreement rate of 95% (κ = .200, P = .012). Agreement on four deprescribing criteria ranged from 74.7% to 91.3% (lack of indication κ = .352, P < .001; prolonged use κ = .088, P = .280; safety concerns κ = .123, P = .006; incorrect dosage κ = .264, P = .001). Important limitations of GPT-4 responses were identified, including 22.1% ambiguous outputs, generic answers and inaccuracies, posing inappropriate decision-making risks. CONCLUSIONS While AI-HCP agreement is substantial, sole AI reliance poses a risk for unsuitable clinical decision-making. This study's findings reveal both strengths and areas for enhancement of ChatGPT-4 in the deprescribing recommendations within a real-world sample. Our study underscores the need for additional research on chatbot functionality in patient therapy decision-making, further fostering the advancement of AI for optimal performance.
Affiliation(s)
- Iva Bužančić
  - Center for Applied Pharmacy, Faculty of Pharmacy and Biochemistry, University of Zagreb, Zagreb, Croatia
  - City Pharmacy Zagreb, Zagreb, Croatia
- Dora Belec
  - Center for Applied Pharmacy, Faculty of Pharmacy and Biochemistry, University of Zagreb, Zagreb, Croatia
- Margita Držaić
  - Center for Applied Pharmacy, Faculty of Pharmacy and Biochemistry, University of Zagreb, Zagreb, Croatia
  - City Pharmacy Zagreb, Zagreb, Croatia
- Ingrid Kummer
  - Department of Social and Clinical Pharmacy, Faculty of Pharmacy in Hradec Králové, Charles University, Hradec Králové, Czech Republic
- Jovana Brkić
  - Department of Social and Clinical Pharmacy, Faculty of Pharmacy in Hradec Králové, Charles University, Hradec Králové, Czech Republic
  - Department of Social Pharmacy and Pharmaceutical Legislation, Faculty of Pharmacy, University of Belgrade, Belgrade, Serbia
- Daniela Fialová
  - Department of Social and Clinical Pharmacy, Faculty of Pharmacy in Hradec Králové, Charles University, Hradec Králové, Czech Republic
  - Department of Geriatrics and Gerontology, 1st Faculty of Medicine in Prague, Charles University, Prague, Czech Republic
- Maja Ortner Hadžiabdić
  - Center for Applied Pharmacy, Faculty of Pharmacy and Biochemistry, University of Zagreb, Zagreb, Croatia
56. Ge J, Buenaventura A, Berrean B, Purvis J, Fontil V, Lai JC, Pletcher MJ. Applying human-centered design to the construction of a cirrhosis management clinical decision support system. Hepatol Commun 2024;8:e0394. [PMID: 38407255] [PMCID: PMC10898661] [DOI: 10.1097/hc9.0000000000000394]
Abstract
BACKGROUND Electronic health record (EHR)-based clinical decision support is a scalable way to help standardize clinical care. Clinical decision support systems have not been extensively investigated in cirrhosis management. Human-centered design (HCD) is an approach that engages with potential users in intervention development. In this study, we applied HCD to design the features and interface for a clinical decision support system for cirrhosis management, called CirrhosisRx. METHODS We conducted technical feasibility assessments to construct a visual blueprint that outlines the basic features of the interface. We then convened collaborative-design workshops with generalist and specialist clinicians. We elicited current workflows for cirrhosis management, assessed gaps in existing EHR systems, evaluated potential features, and refined the design prototype for CirrhosisRx. At the conclusion of each workshop, we analyzed recordings and transcripts. RESULTS Workshop feedback showed that the aggregation of relevant clinical data into 6 cirrhosis decompensation domains (defined as common inpatient clinical scenarios) was the most important feature. Automatic inference of clinical events from EHR data, such as gastrointestinal bleeding from hemoglobin changes, was not accepted due to accuracy concerns. Visualizations for risk stratification scores were deemed not necessary. Lastly, the HCD co-design workshops allowed us to identify the target user population (generalists). CONCLUSIONS This is one of the first applications of HCD to design the features and interface for an electronic intervention for cirrhosis management. The HCD process altered features, modified the design interface, and likely improved CirrhosisRx's overall usability. The finalized design for CirrhosisRx proceeded to development and production and will be tested for effectiveness in a pragmatic randomized controlled trial. This work provides a model for the creation of other EHR-based interventions in hepatology care.
Affiliation(s)
- Jin Ge
  - Department of Medicine, Division of Gastroenterology and Hepatology, University of California—San Francisco, San Francisco, California, USA
- Ana Buenaventura
  - School of Medicine Technology Services, University of California—San Francisco, San Francisco, California, USA
- Beth Berrean
  - School of Medicine Technology Services, University of California—San Francisco, San Francisco, California, USA
- Jory Purvis
  - School of Medicine Technology Services, University of California—San Francisco, San Francisco, California, USA
- Valy Fontil
  - Family Health Centers, NYU-Langone Medical Center, Brooklyn, New York, USA
- Jennifer C. Lai
  - Department of Medicine, Division of Gastroenterology and Hepatology, University of California—San Francisco, San Francisco, California, USA
- Mark J. Pletcher
  - Department of Epidemiology and Biostatistics, University of California—San Francisco, San Francisco, California, USA
57. Chandra A, Chakraborty A. Exploring the role of large language models in radiation emergency response. J Radiol Prot 2024;44:011510. [PMID: 38324900] [DOI: 10.1088/1361-6498/ad270c]
Abstract
In recent times, the field of artificial intelligence (AI) has been transformed by the introduction of large language models (LLMs). These models, popularized by OpenAI's GPT-3, have demonstrated the emergent capabilities of AI in comprehending and producing text resembling human language, which has helped them transform several industries. But their role has yet to be explored in the nuclear industry, specifically in managing radiation emergencies. The present work explores LLMs' contextual awareness, natural language interaction, and their capacity to comprehend diverse queries in a radiation emergency response setting. In this study, we identify different user types and their specific LLM use-cases in radiation emergencies. Their possible interactions with ChatGPT, a popular LLM, have also been simulated and preliminary results are presented. Drawing on the insights gained from this exercise and to address concerns of reliability and misinformation, this study advocates for expert-guided and domain-specific LLMs trained on radiation safety protocols and historical data. This study aims to guide radiation emergency management practitioners and decision-makers in effectively incorporating LLMs into their decision support framework.
Affiliation(s)
- Anirudh Chandra
  - Radiation Safety Systems Division, Bhabha Atomic Research Centre, Mumbai 400085, India
- Abinash Chakraborty
  - Health Physics Division, Bhabha Atomic Research Centre, Mumbai 400085, India
58. Abi-Rafeh J, Xu HH, Kazan R, Tevlin R, Furnas H. Large Language Models and Artificial Intelligence: A Primer for Plastic Surgeons on the Demonstrated and Potential Applications, Promises, and Limitations of ChatGPT. Aesthet Surg J 2024;44:329-343. [PMID: 37562022] [DOI: 10.1093/asj/sjad260]
Abstract
BACKGROUND The rapidly evolving field of artificial intelligence (AI) holds great potential for plastic surgeons. ChatGPT, a recently released AI large language model (LLM), promises applications across many disciplines, including healthcare. OBJECTIVES The aim of this article was to provide a primer for plastic surgeons on AI, LLM, and ChatGPT, including an analysis of current demonstrated and proposed clinical applications. METHODS A systematic review was performed identifying medical and surgical literature on ChatGPT's proposed clinical applications. Variables assessed included applications investigated, command tasks provided, user input information, AI-emulated human skills, output validation, and reported limitations. RESULTS The analysis included 175 articles reporting on 13 plastic surgery applications and 116 additional clinical applications, categorized by field and purpose. Thirty-four applications within plastic surgery are thus proposed, with relevance to different target audiences, including attending plastic surgeons (n = 17, 50%), trainees/educators (n = 8, 24%), researchers/scholars (n = 7, 21%), and patients (n = 2, 6%). The 15 identified limitations of ChatGPT were categorized by training data, algorithm, and ethical considerations. CONCLUSIONS Widespread use of ChatGPT in plastic surgery will depend on rigorous research of proposed applications to validate performance and address limitations. This systematic review aims to guide research, development, and regulation to safely adopt AI in plastic surgery.
59. Stengel FC, Stienen MN, Ivanov M, Gandía-González ML, Raffa G, Ganau M, Whitfield P, Motov S. Can AI pass the written European Board Examination in Neurological Surgery? - Ethical and practical issues. Brain Spine 2024;4:102765. [PMID: 38510593] [PMCID: PMC10951784] [DOI: 10.1016/j.bas.2024.102765]
Abstract
Introduction Artificial intelligence (AI)-based large language models (LLMs) contain enormous potential in education and training. Recent publications demonstrated that they are able to outperform participants in written medical exams. Research question We aimed to explore the accuracy of AI in the written part of the EANS board exam. Material and methods Eighty-six representative single best answer (SBA) questions, included at least ten times in prior EANS board exams, were selected by the current EANS board exam committee. The questions' content was classified as 75 text-based (TB) and 11 image-based (IB) and their structure as 50 interpretation-weighted, 30 theory-based and 6 true-or-false. Questions were tested with ChatGPT 3.5, Bing and Bard. The AI and participant results were statistically analyzed through ANOVA tests with Stata SE 15 (StataCorp, College Station, TX). P-values of <0.05 were considered statistically significant. Results The Bard LLM achieved the highest accuracy with 62% correct questions overall and 69% excluding IB, outperforming human exam participants 59% (p = 0.67) and 59% (p = 0.42), respectively. All LLMs scored highest in theory-based questions, excluding IB questions (ChatGPT: 79%; Bing: 83%; Bard: 86%) and significantly better than the human exam participants (60%; p = 0.03). AI could not answer any IB question correctly. Discussion and conclusion AI passed the written EANS board exam based on representative SBA questions and achieved results close to or even better than the human exam participants. Our results raise several ethical and practical implications, which may impact the current concept for the written EANS board exam.
Affiliation(s)
- Felix C. Stengel
  - Department of Neurosurgery & Spine Center of Eastern Switzerland, Kantonsspital St. Gallen & Medical School of St. Gallen, St. Gallen, Switzerland
- Martin N. Stienen
  - Department of Neurosurgery & Spine Center of Eastern Switzerland, Kantonsspital St. Gallen & Medical School of St. Gallen, St. Gallen, Switzerland
- Marcel Ivanov
  - Royal Hallamshire Hospital, Sheffield, United Kingdom
- Giovanni Raffa
  - Division of Neurosurgery, BIOMORF Department, University of Messina, Messina, Italy
- Mario Ganau
  - Oxford University Hospitals NHS Foundation Trust, Oxford, United Kingdom
- Stefan Motov
  - Department of Neurosurgery & Spine Center of Eastern Switzerland, Kantonsspital St. Gallen & Medical School of St. Gallen, St. Gallen, Switzerland
60. Abdullahi T, Singh R, Eickhoff C. Learning to Make Rare and Complex Diagnoses With Generative AI Assistance: Qualitative Study of Popular Large Language Models. JMIR Med Educ 2024;10:e51391. [PMID: 38349725] [PMCID: PMC10900078] [DOI: 10.2196/51391]
Abstract
BACKGROUND Patients with rare and complex diseases often experience delayed diagnoses and misdiagnoses because comprehensive knowledge about these diseases is limited to only a few medical experts. In this context, large language models (LLMs) have emerged as powerful knowledge aggregation tools with applications in clinical decision support and education domains. OBJECTIVE This study aims to explore the potential of 3 popular LLMs, namely Bard (Google LLC), ChatGPT-3.5 (OpenAI), and GPT-4 (OpenAI), in medical education to enhance the diagnosis of rare and complex diseases while investigating the impact of prompt engineering on their performance. METHODS We conducted experiments on publicly available complex and rare cases to achieve these objectives. We implemented various prompt strategies to evaluate the performance of these models using both open-ended and multiple-choice prompts. In addition, we used a majority voting strategy to leverage diverse reasoning paths within language models, aiming to enhance their reliability. Furthermore, we compared their performance with the performance of human respondents and MedAlpaca, a generative LLM specifically designed for medical tasks. RESULTS Notably, all LLMs outperformed the average human consensus and MedAlpaca, with a minimum margin of 5% and 13%, respectively, across all 30 cases from the diagnostic case challenge collection. On the frequently misdiagnosed cases category, Bard tied with MedAlpaca but surpassed the human average consensus by 14%, whereas GPT-4 and ChatGPT-3.5 outperformed MedAlpaca and the human respondents on the moderately often misdiagnosed cases category with minimum accuracy scores of 28% and 11%, respectively. The majority voting strategy, particularly with GPT-4, demonstrated the highest overall score across all cases from the diagnostic complex case collection, surpassing that of other LLMs. On the Medical Information Mart for Intensive Care-III data sets, Bard and GPT-4 achieved the highest diagnostic accuracy scores, with multiple-choice prompts scoring 93%, whereas ChatGPT-3.5 and MedAlpaca scored 73% and 47%, respectively. Furthermore, our results demonstrate that there is no one-size-fits-all prompting approach for improving the performance of LLMs and that a single strategy does not universally apply to all LLMs. CONCLUSIONS Our findings shed light on the diagnostic capabilities of LLMs and the challenges associated with identifying an optimal prompting strategy that aligns with each language model's characteristics and specific task requirements. The significance of prompt engineering is highlighted, providing valuable insights for researchers and practitioners who use these language models for medical training. Furthermore, this study represents a crucial step toward understanding how LLMs can enhance diagnostic reasoning in rare and complex medical cases, paving the way for developing effective educational tools and accurate diagnostic aids to improve patient care and outcomes.
Affiliation(s)
- Tassallah Abdullahi
  - Department of Computer Science, Brown University, Providence, RI, United States
- Ritambhara Singh
  - Department of Computer Science, Brown University, Providence, RI, United States
  - Center for Computational Molecular Biology, Brown University, Providence, RI, United States
61. Wang Z, Zhang Z, Traverso A, Dekker A, Qian L, Sun P. Assessing the role of GPT-4 in thyroid ultrasound diagnosis and treatment recommendations: enhancing interpretability with a chain of thought approach. Quant Imaging Med Surg 2024;14:1602-1615. [PMID: 38415150] [PMCID: PMC10895085] [DOI: 10.21037/qims-23-1180]
Abstract
Background As artificial intelligence (AI) becomes increasingly prevalent in the medical field, the effectiveness of AI-generated medical reports in disease diagnosis remains to be evaluated. ChatGPT is a large language model developed by OpenAI with a notable capacity for text abstraction and comprehension. This study aimed to explore the capabilities, limitations, and potential of Generative Pre-trained Transformer (GPT)-4 in analyzing thyroid cancer ultrasound reports, providing diagnoses, and recommending treatment plans. Methods Using 109 diverse thyroid cancer cases, we evaluated GPT-4's performance by comparing its generated reports to those from doctors with various levels of experience. We also conducted a Turing Test and a consistency analysis. To enhance the interpretability of the model, we applied the Chain of Thought (CoT) method to deconstruct the decision-making chain of the GPT model. Results GPT-4 demonstrated proficiency in report structuring, professional terminology, and clarity of expression, but showed limitations in diagnostic accuracy. In addition, our consistency analysis highlighted certain discrepancies in the AI's performance. The CoT method effectively enhanced the interpretability of the AI's decision-making process. Conclusions GPT-4 exhibits potential as a supplementary tool in healthcare, especially for generating thyroid gland diagnostic reports. Our proposed online platform, "ThyroAIGuide", alongside the CoT method, underscores the potential of AI to augment diagnostic processes, elevate healthcare accessibility, and advance patient education. However, the journey towards fully integrating AI into healthcare is ongoing, requiring continuous research, development, and careful monitoring by medical professionals to ensure patient safety and quality of care.
Affiliation(s)
- Zhixiang Wang
- Department of Ultrasound, Beijing Friendship Hospital, Capital Medical University, Beijing, China
- Department of Radiation Oncology (Maastro), GROW-School for Oncology, Maastricht University Medical Centre+, Maastricht, The Netherlands
- Zhen Zhang
- Department of Radiation Oncology (Maastro), GROW-School for Oncology, Maastricht University Medical Centre+, Maastricht, The Netherlands
- Alberto Traverso
- Department of Radiation Oncology (Maastro), GROW-School for Oncology, Maastricht University Medical Centre+, Maastricht, The Netherlands
- Andre Dekker
- Department of Radiation Oncology (Maastro), GROW-School for Oncology, Maastricht University Medical Centre+, Maastricht, The Netherlands
- Linxue Qian
- Department of Ultrasound, Beijing Friendship Hospital, Capital Medical University, Beijing, China
- Pengfei Sun
- Department of Ultrasound, Beijing Friendship Hospital, Capital Medical University, Beijing, China
62
Segal S, Saha AK, Khanna AK. Appropriateness of Answers to Common Preanesthesia Patient Questions Composed by the Large Language Model GPT-4 Compared to Human Authors. Anesthesiology 2024; 140:333-335. [PMID: 38193737 DOI: 10.1097/aln.0000000000004824] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/10/2024]
Affiliation(s)
- Scott Segal
- Wake Forest University School of Medicine, Atrium Health Wake Forest Baptist Medical Center, Winston-Salem, North Carolina (S.S.).
63
Liao Z, Wang J, Shi Z, Lu L, Tabata H. Revolutionary Potential of ChatGPT in Constructing Intelligent Clinical Decision Support Systems. Ann Biomed Eng 2024; 52:125-129. [PMID: 37332008 DOI: 10.1007/s10439-023-03288-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/08/2023] [Accepted: 06/13/2023] [Indexed: 06/20/2023]
Abstract
Recently, Chatbot Generative Pre-trained Transformer (ChatGPT) has been recognized as a promising clinical decision support system (CDSS) in the medical field owing to its advanced text analysis capabilities and interactive design. However, ChatGPT primarily focuses on learning text semantics rather than learning complex data structures and conducting real-time data analysis, which typically necessitate the development of intelligent CDSS employing specialized machine learning algorithms. Although ChatGPT cannot directly execute specific algorithms, it aids in algorithm design for intelligent CDSS at the textual level. In this study, besides discussing the types of CDSS and their relationship with ChatGPT, we mainly investigate the benefits and drawbacks of employing ChatGPT as an auxiliary design tool for intelligent CDSS. Our findings indicate that by collaborating with human expertise, ChatGPT has the potential to revolutionize the development of robust and effective intelligent CDSS.
Affiliation(s)
- Zhiqiang Liao
- Department of Electrical Engineering and Information Systems, Graduate School of Engineering, The University of Tokyo, Tokyo, 113-8656, Japan.
- Jian Wang
- Department of Orthopaedics, Qilu Hospital of Shandong University, Jinan, 250012, People's Republic of China
- Zhuozheng Shi
- Department of Bioengineering, Graduate School of Engineering, The University of Tokyo, Tokyo, 113-8656, Japan
- Lintao Lu
- Department of Orthopaedics, Qilu Hospital of Shandong University, Jinan, 250012, People's Republic of China.
- Department of Orthopaedics, Qilu Hospital of Shandong University Dezhou Hospital, Dezhou, 253000, People's Republic of China.
- Hitoshi Tabata
- Department of Electrical Engineering and Information Systems, Graduate School of Engineering, The University of Tokyo, Tokyo, 113-8656, Japan
- Department of Bioengineering, Graduate School of Engineering, The University of Tokyo, Tokyo, 113-8656, Japan
64
Jin X, Frock A, Nagaraja S, Wallqvist A, Reifman J. AI algorithm for personalized resource allocation and treatment of hemorrhage casualties. Front Physiol 2024; 15:1327948. [PMID: 38332989 PMCID: PMC10851938 DOI: 10.3389/fphys.2024.1327948] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/25/2023] [Accepted: 01/09/2024] [Indexed: 02/10/2024] Open
Abstract
A deep neural network-based artificial intelligence (AI) model was assessed for its utility in predicting vital signs of hemorrhage patients and optimizing the management of fluid resuscitation in mass casualties. With the use of a cardio-respiratory computational model to generate synthetic data of hemorrhage casualties, an application was created where a limited data stream (the initial 10 min of vital-sign monitoring) could be used to predict the outcomes of different fluid resuscitation allocations 60 min into the future. The predicted outcomes were then used to select the optimal resuscitation allocation for various simulated mass-casualty scenarios. This allowed the assessment of the potential benefits of using an allocation method based on personalized predictions of future vital signs versus a static population-based method that only uses currently available vital-sign information. The theoretical benefits of this approach included up to 46% additional casualties restored to healthy vital signs and a 119% increase in fluid-utilization efficiency. Although the study is not immune from limitations associated with synthetic data under specific assumptions, the work demonstrated the potential for incorporating neural network-based AI technologies in hemorrhage detection and treatment. The simulated injury and treatment scenarios used delineated possible benefits and opportunities available for using AI in pre-hospital trauma care. The greatest benefit of this technology lies in its ability to provide personalized interventions that optimize clinical outcomes under resource-limited conditions, such as in civilian or military mass-casualty events, involving moderate and severe hemorrhage.
Affiliation(s)
- Xin Jin
- Department of Defense Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, United States Army Medical Research and Development Command, Fort Detrick, MD, United States
- The Henry M. Jackson Foundation for the Advancement of Military Medicine Inc., Bethesda, MD, United States
- Andrew Frock
- Department of Defense Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, United States Army Medical Research and Development Command, Fort Detrick, MD, United States
- The Henry M. Jackson Foundation for the Advancement of Military Medicine Inc., Bethesda, MD, United States
- Sridevi Nagaraja
- Department of Defense Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, United States Army Medical Research and Development Command, Fort Detrick, MD, United States
- The Henry M. Jackson Foundation for the Advancement of Military Medicine Inc., Bethesda, MD, United States
- Anders Wallqvist
- Department of Defense Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, United States Army Medical Research and Development Command, Fort Detrick, MD, United States
- Jaques Reifman
- Department of Defense Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, United States Army Medical Research and Development Command, Fort Detrick, MD, United States
65
Liu X, Wu J, Shao A, Shen W, Ye P, Wang Y, Ye J, Jin K, Yang J. Uncovering Language Disparity of ChatGPT on Retinal Vascular Disease Classification: Cross-Sectional Study. J Med Internet Res 2024; 26:e51926. [PMID: 38252483 PMCID: PMC10845019 DOI: 10.2196/51926] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/17/2023] [Revised: 10/07/2023] [Accepted: 11/30/2023] [Indexed: 01/23/2024] Open
Abstract
BACKGROUND Benefiting from rich knowledge and the exceptional ability to understand text, large language models like ChatGPT have shown great potential in English clinical environments. However, the performance of ChatGPT in non-English clinical settings, as well as its reasoning, have not been explored in depth. OBJECTIVE This study aimed to evaluate ChatGPT's diagnostic performance and inference abilities for retinal vascular diseases in a non-English clinical environment. METHODS In this cross-sectional study, we collected 1226 fundus fluorescein angiography reports and corresponding diagnoses written in Chinese and tested ChatGPT with 4 prompting strategies (direct diagnosis or diagnosis with a step-by-step reasoning process and in Chinese or English). RESULTS Compared with ChatGPT using Chinese prompts for direct diagnosis that achieved an F1-score of 70.47%, ChatGPT using English prompts for direct diagnosis achieved the best diagnostic performance (80.05%), which was inferior to ophthalmologists (89.35%) but close to ophthalmologist interns (82.69%). As for its inference abilities, although ChatGPT can derive a reasoning process with a low error rate (0.4 per report) for both Chinese and English prompts, ophthalmologists identified that the latter brought more reasoning steps with less incompleteness (44.31%), misinformation (1.96%), and hallucinations (0.59%) (all P<.001). Also, analysis of the robustness of ChatGPT with different language prompts indicated significant differences in the recall (P=.03) and F1-score (P=.04) between Chinese and English prompts. In short, when prompted in English, ChatGPT exhibited enhanced diagnostic and inference capabilities for retinal vascular disease classification based on Chinese fundus fluorescein angiography reports. 
CONCLUSIONS ChatGPT can serve as a helpful medical assistant to provide diagnosis in non-English clinical environments, but there are still performance gaps, language disparities, and errors compared to professionals, which demonstrate the potential limitations and the need to continually explore more robust large language models in ophthalmology practice.
Affiliation(s)
- Xiaocong Liu
- Eye Center, The Second Affiliated Hospital, Zhejiang University, Zhejiang, China
- School of Public Health, Zhejiang University School of Medicine, Zhejiang, China
- Jiageng Wu
- School of Public Health, Zhejiang University School of Medicine, Zhejiang, China
- An Shao
- Eye Center, The Second Affiliated Hospital, Zhejiang University, Zhejiang, China
- Wenyue Shen
- Eye Center, The Second Affiliated Hospital, Zhejiang University, Zhejiang, China
- Panpan Ye
- Eye Center, The Second Affiliated Hospital, Zhejiang University, Zhejiang, China
- Yao Wang
- Eye Center, The Second Affiliated Hospital, Zhejiang University, Zhejiang, China
- Juan Ye
- Eye Center, The Second Affiliated Hospital, Zhejiang University, Zhejiang, China
- Kai Jin
- Eye Center, The Second Affiliated Hospital, Zhejiang University, Zhejiang, China
- Jie Yang
- School of Public Health, Zhejiang University School of Medicine, Zhejiang, China
66
dos Santos ML, Victória VNG. Critical evaluation of applications of artificial intelligence based linguistic models in Occupational Health. Rev Bras Med Trab 2024; 22:e20231241. [PMID: 39165532 PMCID: PMC11333049 DOI: 10.47626/1679-4435-2023-1241] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/04/2023] [Accepted: 10/30/2023] [Indexed: 08/22/2024] Open
Abstract
This article explores the impact and potential applications of large language models in Occupational Medicine. Large language models have the ability to provide support for medical decision-making, patient screening, summarization and creation of technical, scientific, and legal documents, training and education for doctors and occupational health teams, as well as patient education, potentially leading to lower costs, reduced time expenditure, and a lower incidence of human errors. Despite promising results and a wide range of applications, large language models also have significant limitations in terms of their accuracy, the risk of generating false information, and incorrect recommendations. Various ethical aspects that have not been well elucidated by the medical and academic communities should also be considered, and the lack of regulation by government entities can create areas of legal uncertainty regarding their use in Occupational Medicine and in the legal environment. Significant future improvements can be expected in these models in the coming years, and further studies on the applications of large language models in Occupational Medicine should be encouraged.
Affiliation(s)
- Mateus Lins dos Santos
- 6ª Vara, Justiça Federal em Sergipe, Itabaiana, SE, Brazil
- 9ª Vara, Justiça Federal em Sergipe, Propriá, SE, Brazil
67
Harrington L. ChatGPT Is Trending: Trust but Verify. AACN Adv Crit Care 2023; 34:280-286. [PMID: 37619604 DOI: 10.4037/aacnacc2023129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 08/26/2023]
Affiliation(s)
- Linda Harrington
- Linda Harrington is an Independent Consultant, Health Informatics and Digital Strategy, and Adjunct Faculty at Texas Christian University, 2800 South University Drive, Fort Worth, TX 76109
68
Zawiah M, Al-Ashwal FY, Gharaibeh L, Abu Farha R, Alzoubi KH, Abu Hammour K, Qasim QA, Abrah F. ChatGPT and Clinical Training: Perception, Concerns, and Practice of Pharm-D Students. J Multidiscip Healthc 2023; 16:4099-4110. [PMID: 38116306 PMCID: PMC10729768 DOI: 10.2147/jmdh.s439223] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/09/2023] [Accepted: 12/04/2023] [Indexed: 12/21/2023] Open
Abstract
Background The emergence of Chat-Generative Pre-trained Transformer (ChatGPT) by OpenAI has revolutionized AI technology, demonstrating significant potential in healthcare and pharmaceutical education, yet its real-world applicability in clinical training warrants further investigation. Methods A cross-sectional study was conducted between April and May 2023 to assess PharmD students' perceptions, concerns, and experiences regarding the integration of ChatGPT into clinical pharmacy education. The study utilized a convenience sampling method through online platforms and involved a questionnaire with sections on demographics, perceived benefits, concerns, and experience with ChatGPT. Statistical analysis was performed using SPSS, including descriptive and inferential analyses. Results The findings of the study involving 211 PharmD students revealed that the majority of participants were male (77.3%) and had prior experience with artificial intelligence (68.2%). Over two-thirds were aware of ChatGPT. Most students (n=139, 65.9%) perceived potential benefits in using ChatGPT for various clinical tasks, with concerns including over-reliance, accuracy, and ethical considerations. Adoption of ChatGPT in clinical training varied, with some students not using it at all, while others utilized it for tasks like evaluating drug-drug interactions and developing care plans. Previous users tended to have higher perceived benefits and lower concerns, but the differences were not statistically significant. Conclusion Utilizing ChatGPT in clinical training offers opportunities, but students' lack of trust in it for clinical decisions highlights the need for collaborative human-ChatGPT decision-making. It should complement healthcare professionals' expertise and be used strategically to compensate for human limitations. Further research is essential to optimize ChatGPT's effective integration.
Affiliation(s)
- Mohammed Zawiah
- Department of Clinical Pharmacy, College of Pharmacy, Northern Border University, Rafha, 91911, Saudi Arabia
- Department of Pharmacy Practice, College of Clinical Pharmacy, Hodeidah University, Al Hodeidah, Yemen
- Fahmi Y Al-Ashwal
- Department of Clinical Pharmacy, College of Pharmacy, Al-Ayen University, Thi-Qar, Iraq
- Lobna Gharaibeh
- Pharmacological and Diagnostic Research Center, Faculty of Pharmacy, Al-Ahliyya Amman University, Amman, Jordan
- Rana Abu Farha
- Clinical Pharmacy and Therapeutics Department, Faculty of Pharmacy, Applied Science Private University, Amman, Jordan
- Karem H Alzoubi
- Department of Pharmacy Practice and Pharmacotherapeutics, University of Sharjah, Sharjah, 27272, United Arab Emirates
- Department of Clinical Pharmacy, Faculty of Pharmacy, Jordan University of Science and Technology, Irbid, 22110, Jordan
- Khawla Abu Hammour
- Department of Clinical Pharmacy and Biopharmaceutics, Faculty of Pharmacy, University of Jordan, Amman, Jordan
- Qutaiba A Qasim
- Department of Clinical Pharmacy, College of Pharmacy, Al-Ayen University, Thi-Qar, Iraq
- Fahd Abrah
- Discipline of Social and Administrative Pharmacy, School of Pharmaceutical Sciences, Universiti Sains Malaysia, Penang, Malaysia
69
Miao J, Thongprayoon C, Suppadungsuk S, Garcia Valencia OA, Qureshi F, Cheungpasitporn W. Innovating Personalized Nephrology Care: Exploring the Potential Utilization of ChatGPT. J Pers Med 2023; 13:1681. [PMID: 38138908 PMCID: PMC10744377 DOI: 10.3390/jpm13121681] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/01/2023] [Revised: 12/02/2023] [Accepted: 12/02/2023] [Indexed: 12/24/2023] Open
Abstract
The rapid advancement of artificial intelligence (AI) technologies, particularly machine learning, has brought substantial progress to the field of nephrology, enabling significant improvements in the management of kidney diseases. ChatGPT, a revolutionary language model developed by OpenAI, is a versatile AI model designed to engage in meaningful and informative conversations. Its applications in healthcare have been notable, with demonstrated proficiency in various medical knowledge assessments. However, ChatGPT's performance varies across different medical subfields, posing challenges in nephrology-related queries. At present, comprehensive reviews regarding ChatGPT's potential applications in nephrology remain lacking despite the surge of interest in its role in various domains. This article seeks to fill this gap by presenting an overview of the integration of ChatGPT in nephrology. It discusses the potential benefits of ChatGPT in nephrology, encompassing dataset management, diagnostics, treatment planning, and patient communication and education, as well as medical research and education. It also explores ethical and legal concerns regarding the utilization of AI in medical practice. The continuous development of AI models like ChatGPT holds promise for the healthcare realm but also underscores the necessity of thorough evaluation and validation before implementing AI in real-world medical scenarios. This review serves as a valuable resource for nephrologists and healthcare professionals interested in fully utilizing the potential of AI in innovating personalized nephrology care.
Affiliation(s)
- Jing Miao
- Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA; (J.M.); (C.T.); (S.S.); (O.A.G.V.); (F.Q.)
- Charat Thongprayoon
- Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA; (J.M.); (C.T.); (S.S.); (O.A.G.V.); (F.Q.)
- Supawadee Suppadungsuk
- Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA; (J.M.); (C.T.); (S.S.); (O.A.G.V.); (F.Q.)
- Chakri Naruebodindra Medical Institute, Faculty of Medicine Ramathibodi Hospital, Mahidol University, Samut Prakan 10540, Thailand
- Oscar A. Garcia Valencia
- Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA; (J.M.); (C.T.); (S.S.); (O.A.G.V.); (F.Q.)
- Fawad Qureshi
- Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA; (J.M.); (C.T.); (S.S.); (O.A.G.V.); (F.Q.)
- Wisit Cheungpasitporn
- Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA; (J.M.); (C.T.); (S.S.); (O.A.G.V.); (F.Q.)
70
Wang G, Liu Q, Chen G, Xia B, Zeng D, Chen G, Guo C. AI's deep dive into complex pediatric inguinal hernia issues: a challenge to traditional guidelines? Hernia 2023; 27:1587-1599. [PMID: 37843604 DOI: 10.1007/s10029-023-02900-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/04/2023] [Accepted: 09/19/2023] [Indexed: 10/17/2023]
Abstract
OBJECTIVE This study utilized ChatGPT, an artificial intelligence program based on large language models, to explore controversial issues in pediatric inguinal hernia surgery and compare its responses with the guidelines of the European Association of Pediatric Surgeons (EUPSA). METHODS Six contentious issues raised by EUPSA were submitted to ChatGPT 4.0 for analysis, for which two independent responses were generated for each issue. These generated answers were subsequently compared with systematic reviews and guidelines. To ensure content accuracy and reliability, a content analysis was conducted, and expert evaluations were solicited for validation. Content analysis evaluated the consistency or discrepancy between ChatGPT 4.0's responses and the guidelines. An expert scoring method assessed the quality, reliability, and applicability of responses. The TF-IDF model tested the stability and consistency of the two responses. RESULTS The responses generated by ChatGPT 4.0 were mostly consistent with the guidelines. However, some differences and contradictions were noted. The average quality score was 3.33, reliability score was 2.75, and applicability score was 3.46 (out of 5). The average similarity between the two responses was 0.72 (out of 1). Content analysis and expert ratings yielded consistent conclusions, enhancing the credibility of our research. CONCLUSION ChatGPT can provide valuable responses to clinical questions, but it has limitations and requires further improvement. It is recommended to combine ChatGPT with other reliable data sources to improve clinical practice and decision-making.
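The TF-IDF consistency check described in this abstract (pairwise similarity between two independently generated responses to the same question) can be sketched in a few lines of plain Python. The whitespace tokenization and the smoothed-IDF weighting below are illustrative assumptions, not the authors' actual pipeline.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """TF-IDF weight dicts for a small corpus of token lists (smoothed IDF)."""
    n = len(docs)
    df = Counter()                      # document frequency of each term
    for doc in docs:
        df.update(set(doc))
    vecs = []
    for doc in docs:
        tf = Counter(doc)
        vecs.append({t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1)
                     for t, c in tf.items()})
    return vecs

def cosine(a, b):
    """Cosine similarity between two sparse weight dicts, in [0, 1]."""
    num = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return num / (na * nb) if na and nb else 0.0

# Two hypothetical responses to the same contentious question:
r1 = "laparoscopic repair is recommended for most cases".split()
r2 = "laparoscopic repair is recommended in selected cases".split()
v1, v2 = tfidf_vectors([r1, r2])
similarity = cosine(v1, v2)  # a value in [0, 1]
```

Averaging such pairwise similarities over the six questions would correspond to the reported mean of 0.72 (out of 1).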
Affiliation(s)
- G Wang
- Department of Pediatrics, Women's and Children's Hospital, Chongqing Medical University, 120 Longshan Rd., Chongqing, 401147, People's Republic of China
- Department of Pediatrics, Children's Hospital, Chongqing Medical University, Chongqing, People's Republic of China
- Department of Pediatric General Surgery, Chongqing Maternal and Child Health Hospital, Chongqing Medical University, Chongqing, People's Republic of China
- Q Liu
- Department of Pediatrics, Women's and Children's Hospital, Chongqing Medical University, 120 Longshan Rd., Chongqing, 401147, People's Republic of China
- Department of Fetus and Pediatrics, Chongqing Health Center for Women and Children, Chongqing, People's Republic of China
- G Chen
- Department of Pediatrics, Women's and Children's Hospital, Chongqing Medical University, 120 Longshan Rd., Chongqing, 401147, People's Republic of China
- Department of Fetus and Pediatrics, Chongqing Health Center for Women and Children, Chongqing, People's Republic of China
- B Xia
- Department of Pediatrics, Women's and Children's Hospital, Chongqing Medical University, 120 Longshan Rd., Chongqing, 401147, People's Republic of China
- Department of Fetus and Pediatrics, Chongqing Health Center for Women and Children, Chongqing, People's Republic of China
- D Zeng
- Department of Pediatrics, Women's and Children's Hospital, Chongqing Medical University, 120 Longshan Rd., Chongqing, 401147, People's Republic of China
- Department of Fetus and Pediatrics, Chongqing Health Center for Women and Children, Chongqing, People's Republic of China
- G Chen
- Department of Pediatrics, Women's and Children's Hospital, Chongqing Medical University, 120 Longshan Rd., Chongqing, 401147, People's Republic of China.
- Department of Fetus and Pediatrics, Chongqing Health Center for Women and Children, Chongqing, People's Republic of China.
- Department of Pediatric General Surgery, Chongqing Maternal and Child Health Hospital, Chongqing Medical University, Chongqing, People's Republic of China.
- Department of Obstetrics and Gynecology, Chongqing Health Center for Women and Children, Women and Children's Hospital of Chongqing Medical University, 120 Longshan Rd., Chongqing, 401147, People's Republic of China.
- C Guo
- Department of Pediatrics, Women's and Children's Hospital, Chongqing Medical University, 120 Longshan Rd., Chongqing, 401147, People's Republic of China.
- Department of Fetus and Pediatrics, Chongqing Health Center for Women and Children, Chongqing, People's Republic of China.
- Department of Pediatric General Surgery, Chongqing Maternal and Child Health Hospital, Chongqing Medical University, Chongqing, People's Republic of China.
71
Levartovsky A, Ben-Horin S, Kopylov U, Klang E, Barash Y. Towards AI-Augmented Clinical Decision-Making: An Examination of ChatGPT's Utility in Acute Ulcerative Colitis Presentations. Am J Gastroenterol 2023; 118:2283-2289. [PMID: 37611254 DOI: 10.14309/ajg.0000000000002483] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 06/01/2023] [Accepted: 08/04/2023] [Indexed: 08/25/2023]
Abstract
This study explores the potential of OpenAI's ChatGPT as a decision support tool for acute ulcerative colitis presentations in the setting of an emergency department. We assessed ChatGPT's performance in determining disease severity using Truelove and Witts criteria and the necessity of hospitalization for patients with ulcerative colitis, comparing results with those of expert gastroenterologists. Of 20 cases, ChatGPT's assessments were found to be 80% consistent with gastroenterologist evaluations and indicated a high degree of reliability. This suggests that ChatGPT could serve as a clinical decision support tool in assessing acute ulcerative colitis, acting as an adjunct to clinical judgment.
Affiliation(s)
- Asaf Levartovsky
- Department of Gastroenterology, Sheba Medical Center, Affiliated to Tel Aviv University, Tel Aviv, Israel
- Shomron Ben-Horin
- Department of Gastroenterology, Sheba Medical Center, Affiliated to Tel Aviv University, Tel Aviv, Israel
- Uri Kopylov
- Department of Gastroenterology, Sheba Medical Center, Affiliated to Tel Aviv University, Tel Aviv, Israel
- Eyal Klang
- Department of Diagnostic Imaging, Sheba Medical Center, Affiliated to Tel Aviv University, Tel Aviv, Israel
- DeepVision Lab, Sheba Medical Center, Affiliated to Tel Aviv University, Tel Aviv, Israel
- Yiftach Barash
- Department of Diagnostic Imaging, Sheba Medical Center, Affiliated to Tel Aviv University, Tel Aviv, Israel
- DeepVision Lab, Sheba Medical Center, Affiliated to Tel Aviv University, Tel Aviv, Israel
72
Benary M, Wang XD, Schmidt M, Soll D, Hilfenhaus G, Nassir M, Sigler C, Knödler M, Keller U, Beule D, Keilholz U, Leser U, Rieke DT. Leveraging Large Language Models for Decision Support in Personalized Oncology. JAMA Netw Open 2023; 6:e2343689. [PMID: 37976064 PMCID: PMC10656647 DOI: 10.1001/jamanetworkopen.2023.43689] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 08/23/2023] [Accepted: 10/04/2023] [Indexed: 11/19/2023] Open
Abstract
Importance Clinical interpretation of complex biomarkers for precision oncology currently requires manual investigations of previous studies and databases. Conversational large language models (LLMs) might be beneficial as automated tools for assisting clinical decision-making. Objective To assess the performance of 4 recent LLMs and define their role as support tools for precision oncology. Design, Setting, and Participants This diagnostic study examined 10 fictional cases of patients with advanced cancer with genetic alterations. Each case was submitted to 4 different LLMs (ChatGPT, Galactica, Perplexity, and BioMedLM) and 1 expert physician to identify personalized treatment options in 2023. Treatment options were masked and presented to a molecular tumor board (MTB), whose members rated the likelihood of a treatment option coming from an LLM on a scale from 0 to 10 (0, extremely unlikely; 10, extremely likely) and decided whether the treatment option was clinically useful. Main Outcomes and Measures Number of treatment options, precision, recall, F1 score of LLMs compared with human experts, recognizability, and usefulness of recommendations. Results For 10 fictional cancer patients (4 with lung cancer, 6 with other; median [IQR] 3.5 [3.0-4.8] molecular alterations per patient), a median (IQR) number of 4.0 (4.0-4.0) compared with 3.0 (3.0-5.0), 7.5 (4.3-9.8), 11.5 (7.8-13.0), and 13.0 (11.3-21.5) treatment options each was identified by the human expert and 4 LLMs, respectively. When considering the expert as a criterion standard, LLM-proposed treatment options reached F1 scores of 0.04, 0.17, 0.14, and 0.19 across all patients combined. Combining treatment options from different LLMs allowed a precision of 0.29 and a recall of 0.29 for an F1 score of 0.29. LLM-generated treatment options were recognized as AI-generated with a median (IQR) 7.5 (5.3-9.0) points in contrast to 2.0 (1.0-3.0) points for manually annotated cases.
A crucial reason for identifying AI-generated treatment options was insufficient accompanying evidence. For each patient, at least 1 LLM generated a treatment option that was considered helpful by MTB members. Two unique useful treatment options (including 1 unique treatment strategy) were identified only by LLM. Conclusions and Relevance In this diagnostic study, treatment options of LLMs in precision oncology did not reach the quality and credibility of human experts; however, they generated helpful ideas that might have complemented established procedures. Considering technological progress, LLMs could play an increasingly important role in assisting with screening and selecting relevant biomedical literature to support evidence-based, personalized treatment decisions.
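The combined figures reported above (precision 0.29, recall 0.29, F1 score 0.29) are internally consistent, since F1 is the harmonic mean of precision and recall and the harmonic mean of two equal values is that value. A minimal sketch of the calculation (the function name is illustrative, not from the study):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Combined-LLM figures reported in the abstract:
# equal precision and recall give an identical F1 score.
print(round(f1_score(0.29, 0.29), 2))  # prints 0.29
```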
Affiliation(s)
- Manuela Benary
- Charité Comprehensive Cancer Center, Charité–Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Core Unit Bioinformatics, Berlin Institute of Health at Charité–Universitätsmedizin Berlin, Charitéplatz 1, Berlin, Germany
- Xing David Wang
- Knowledge Management in Bioinformatics, Humboldt-Universität zu Berlin, Berlin, Germany
- Max Schmidt
- Charité Comprehensive Cancer Center, Charité–Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Department of Hematology, Oncology and Cancer Immunology, Campus Benjamin Franklin, Charité–Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Dominik Soll
- Charité Comprehensive Cancer Center, Charité–Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Department of Endocrinology and Metabolic Diseases, Charité–Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Georg Hilfenhaus
- Charité Comprehensive Cancer Center, Charité–Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Department of Hematology, Oncology and Cancer Immunology, Campus Charité Mitte, Charité–Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Mani Nassir
- Charité Comprehensive Cancer Center, Charité–Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Department of Hematology, Oncology and Cancer Immunology, Campus Charité Mitte, Charité–Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Christian Sigler
- Charité Comprehensive Cancer Center, Charité–Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Maren Knödler
- Charité Comprehensive Cancer Center, Charité–Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Ulrich Keller
- Department of Hematology, Oncology and Cancer Immunology, Campus Benjamin Franklin, Charité–Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- German Cancer Consortium and German Cancer Research Center, Partner Site Berlin, Germany
- Dieter Beule
- Core Unit Bioinformatics, Berlin Institute of Health at Charité–Universitätsmedizin Berlin, Charitéplatz 1, Berlin, Germany
- Ulrich Keilholz
- Charité Comprehensive Cancer Center, Charité–Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- German Cancer Consortium and German Cancer Research Center, Partner Site Berlin, Germany
- Ulf Leser
- Knowledge Management in Bioinformatics, Humboldt-Universität zu Berlin, Berlin, Germany
- Damian T. Rieke
- Charité Comprehensive Cancer Center, Charité–Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Department of Hematology, Oncology and Cancer Immunology, Campus Benjamin Franklin, Charité–Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- German Cancer Consortium and German Cancer Research Center, Partner Site Berlin, Germany
73
Waters MR, Aneja S, Hong JC. Unlocking the Power of ChatGPT, Artificial Intelligence, and Large Language Models: Practical Suggestions for Radiation Oncologists. Pract Radiat Oncol 2023; 13:e484-e490. [PMID: 37598727 DOI: 10.1016/j.prro.2023.06.011] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/21/2023] [Revised: 06/28/2023] [Accepted: 06/29/2023] [Indexed: 08/22/2023]
Abstract
Recent advances in artificial intelligence (AI), such as generative AI and large language models (LLMs), have generated significant excitement about the potential of AI to revolutionize our lives, work, and interaction with technology. This article explores the practical applications of LLMs, particularly ChatGPT, in the field of radiation oncology. We offer a guide on how radiation oncologists can interact with LLMs like ChatGPT in their routine clinical and administrative tasks, highlighting potential use cases of the present and future. We also highlight limitations and ethical considerations, including the current state of LLMs in decision making, protection of sensitive data, and the important role of human review of AI-generated content.
Affiliation(s)
- Michael R Waters
- Department of Radiation Oncology, Washington University School of Medicine, St. Louis, Missouri
- Sanjay Aneja
- Department of Radiation Oncology, Yale School of Medicine, New Haven, Connecticut
- Julian C Hong
- Department of Radiation Oncology and Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, California.
74
Yu P, Xu H, Hu X, Deng C. Leveraging Generative AI and Large Language Models: A Comprehensive Roadmap for Healthcare Integration. Healthcare (Basel) 2023; 11:2776. [PMID: 37893850 PMCID: PMC10606429 DOI: 10.3390/healthcare11202776] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/18/2023] [Revised: 10/13/2023] [Accepted: 10/17/2023] [Indexed: 10/29/2023] Open
Abstract
Generative artificial intelligence (AI) and large language models (LLMs), exemplified by ChatGPT, are promising for revolutionizing data and information management in healthcare and medicine. However, there is scant literature guiding their integration for non-AI professionals. This study conducts a scoping literature review to address the critical need for guidance on integrating generative AI and LLMs into healthcare and medical practices. It elucidates the distinct mechanisms underpinning these technologies, such as reinforcement learning from human feedback (RLHF), few-shot learning, and chain-of-thought reasoning, which differentiate them from traditional, rule-based AI systems. Realizing these benefits requires an inclusive, collaborative co-design process that engages all pertinent stakeholders, including clinicians and consumers. Although global research is examining both opportunities and challenges, including ethical and legal dimensions, LLMs offer promising advancements in healthcare by enhancing data management, information retrieval, and decision-making processes. Continued innovation in data acquisition, model fine-tuning, prompt strategy development, evaluation, and system implementation is imperative for realizing the full potential of these technologies. Organizations should proactively engage with these technologies to improve healthcare quality, safety, and efficiency, adhering to ethical and legal guidelines for responsible application.
Affiliation(s)
- Ping Yu
- School of Computing and Information Technology, University of Wollongong, Wollongong, NSW 2522, Australia
- Hua Xu
- Section of Biomedical Informatics and Data Science, Yale School of Medicine, 100 College Street, Fl 9, New Haven, CT 06510, USA;
- Xia Hu
- Department of Computer Science, Rice University, P.O. Box 1892, Houston, TX 77251-1892, USA;
- Chao Deng
- School of Medical, Indigenous and Health Sciences, University of Wollongong, Wollongong, NSW 2522, Australia;
75
Draschl A, Hauer G, Fischerauer SF, Kogler A, Leitner L, Andreou D, Leithner A, Sadoghi P. Are ChatGPT's Free-Text Responses on Periprosthetic Joint Infections of the Hip and Knee Reliable and Useful? J Clin Med 2023; 12:6655. [PMID: 37892793 PMCID: PMC10607052 DOI: 10.3390/jcm12206655] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/28/2023] [Revised: 10/12/2023] [Accepted: 10/17/2023] [Indexed: 10/29/2023] Open
Abstract
BACKGROUND This study aimed to evaluate ChatGPT's performance on questions about periprosthetic joint infections (PJI) of the hip and knee. METHODS Twenty-seven questions from the 2018 International Consensus Meeting on Musculoskeletal Infection were selected for response generation. The free-text responses were evaluated by three orthopedic surgeons using a five-point Likert scale. Inter-rater reliability (IRR) was assessed via Fleiss' kappa (FK). RESULTS Overall, near-perfect IRR was found for disagreement on the presence of factual errors (FK: 0.880, 95% CI [0.724, 1.035], p < 0.001) and agreement on information completeness (FK: 0.848, 95% CI [0.699, 0.996], p < 0.001). Substantial IRR was observed for disagreement on misleading information (FK: 0.743, 95% CI [0.601, 0.886], p < 0.001) and agreement on suitability for patients (FK: 0.627, 95% CI [0.478, 0.776], p < 0.001). Moderate IRR was observed for agreement on "up-to-dateness" (FK: 0.584, 95% CI [0.434, 0.734], p < 0.001) and suitability for orthopedic surgeons (FK: 0.505, 95% CI [0.383, 0.628], p < 0.001). Question- and subtopic-specific analysis revealed diverse IRR levels ranging from near-perfect to poor. CONCLUSIONS ChatGPT's free-text responses to complex orthopedic questions were predominantly reliable and useful for orthopedic surgeons and patients. Given variations in performance by question and subtopic, consulting additional sources and exercising careful interpretation should be emphasized for reliable medical decision-making.
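The qualitative labels used in this abstract (near-perfect, substantial, moderate) match the conventional Landis and Koch interpretation bands for kappa statistics. A minimal sketch mapping a Fleiss' kappa value to its conventional label (the function name, and the assumption that these exact bands were applied, are illustrative rather than stated by the study):

```python
def interpret_kappa(kappa: float) -> str:
    """Map a kappa value to the conventional Landis & Koch agreement label."""
    if kappa < 0:
        return "poor"  # worse than chance agreement
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"  # 0.81-1.00; called "near-perfect" in the abstract

# Fleiss' kappa values reported in the abstract
for fk in (0.880, 0.743, 0.627, 0.584, 0.505):
    print(fk, interpret_kappa(fk))
```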
Affiliation(s)
- Alexander Draschl
- Department of Orthopedics and Trauma, Medical University of Graz, Auenbruggerplatz 5, 8036 Graz, Austria
- Division of Plastic, Aesthetic and Reconstructive Surgery, Department of Surgery, Medical University of Graz, Auenbruggerplatz 29/4, 8036 Graz, Austria
- Georg Hauer
- Department of Orthopedics and Trauma, Medical University of Graz, Auenbruggerplatz 5, 8036 Graz, Austria
- Stefan Franz Fischerauer
- Department of Orthopedics and Trauma, Medical University of Graz, Auenbruggerplatz 5, 8036 Graz, Austria
- Angelika Kogler
- Department of Orthopedics and Trauma, Medical University of Graz, Auenbruggerplatz 5, 8036 Graz, Austria
- Department of Dermatology and Venereology, Medical University of Graz, Auenbruggerplatz 8, 8036 Graz, Austria
- Lukas Leitner
- Department of Orthopedics and Trauma, Medical University of Graz, Auenbruggerplatz 5, 8036 Graz, Austria
- Dimosthenis Andreou
- Department of Orthopedics and Trauma, Medical University of Graz, Auenbruggerplatz 5, 8036 Graz, Austria
- Andreas Leithner
- Department of Orthopedics and Trauma, Medical University of Graz, Auenbruggerplatz 5, 8036 Graz, Austria
- Patrick Sadoghi
- Department of Orthopedics and Trauma, Medical University of Graz, Auenbruggerplatz 5, 8036 Graz, Austria
76
Wosny M, Strasser LM, Hastings J. Experience of Health Care Professionals Using Digital Tools in the Hospital: Qualitative Systematic Review. JMIR Hum Factors 2023; 10:e50357. [PMID: 37847535 PMCID: PMC10618886 DOI: 10.2196/50357] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/29/2023] [Revised: 08/25/2023] [Accepted: 08/25/2023] [Indexed: 10/18/2023] Open
Abstract
BACKGROUND The digitalization of health care has many potential benefits, but it may also negatively impact health care professionals' well-being. Burnout can, in part, result from inefficient work processes related to the suboptimal implementation and use of health information technologies. Although strategies to reduce stress and mitigate clinician burnout typically involve individual-based interventions, emerging evidence suggests that improving the experience of using health information technologies can have a notable impact. OBJECTIVE The aim of this systematic review was to collect evidence of the benefits and challenges associated with the use of digital tools in hospital settings with a particular focus on the experiences of health care professionals using these tools. METHODS We conducted a systematic literature review following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines to explore the experience of health care professionals with digital tools in hospital settings. Using a rigorous selection process to ensure the methodological quality and validity of the study results, we included qualitative studies with distinct data that described the experiences of physicians and nurses. A panel of 3 independent researchers performed iterative data analysis and identified thematic constructs. RESULTS Of the 1175 unique primary studies, we identified 17 (1.45%) publications that focused on health care professionals' experiences with various digital tools in their day-to-day practice. Of the 17 studies, 10 (59%) focused on clinical decision support tools, followed by 6 (35%) studies focusing on electronic health records and 1 (6%) on a remote patient-monitoring tool. We propose a theoretical framework for understanding the complex interplay between the use of digital tools, experience, and outcomes. 
We identified 6 constructs that encompass the positive and negative experiences of health care professionals when using digital tools, along with moderators and outcomes. Positive experiences included feeling confident, responsible, and satisfied, whereas negative experiences included frustration, feeling overwhelmed, and feeling frightened. Positive moderators that may reinforce the use of digital tools included sufficient training and adequate workflow integration, whereas negative moderators comprised unfavorable social structures and the lack of training. Positive outcomes included improved patient care and increased workflow efficiency, whereas negative outcomes included increased workload, increased safety risks, and issues with information quality. CONCLUSIONS Although positive and negative outcomes and moderators that may affect the use of digital tools were commonly reported, the experiences of health care professionals, such as their thoughts and emotions, were less frequently discussed. On the basis of this finding, this study highlights the need for further research specifically targeting experiences as an important mediator of clinician well-being. It also emphasizes the importance of considering differences in the nature of specific tools as well as the profession and role of individual users. TRIAL REGISTRATION PROSPERO CRD42023393883; https://tinyurl.com/2htpzzxj.
Affiliation(s)
- Marie Wosny
- School of Medicine, University of St Gallen (HSG), St Gallen, Switzerland
- Janna Hastings
- School of Medicine, University of St Gallen (HSG), St Gallen, Switzerland
- Institute for Implementation Science in Health Care, University of Zurich (UZH), Zurich, Switzerland
77
Goodman RS, Patrinely JR, Stone CA, Zimmerman E, Donald RR, Chang SS, Berkowitz ST, Finn AP, Jahangir E, Scoville EA, Reese TS, Friedman DL, Bastarache JA, van der Heijden YF, Wright JJ, Ye F, Carter N, Alexander MR, Choe JH, Chastain CA, Zic JA, Horst SN, Turker I, Agarwal R, Osmundson E, Idrees K, Kiernan CM, Padmanabhan C, Bailey CE, Schlegel CE, Chambless LB, Gibson MK, Osterman TJ, Wheless LE, Johnson DB. Accuracy and Reliability of Chatbot Responses to Physician Questions. JAMA Netw Open 2023; 6:e2336483. [PMID: 37782499 PMCID: PMC10546234 DOI: 10.1001/jamanetworkopen.2023.36483] [Citation(s) in RCA: 58] [Impact Index Per Article: 58.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 05/16/2023] [Accepted: 08/22/2023] [Indexed: 10/03/2023] Open
Abstract
Importance Natural language processing tools, such as ChatGPT (generative pretrained transformer, hereafter referred to as chatbot), have the potential to radically enhance the accessibility of medical information for health professionals and patients. Assessing the safety and efficacy of these tools in answering physician-generated questions is critical to determining their suitability in clinical settings, facilitating complex decision-making, and optimizing health care efficiency. Objective To assess the accuracy and comprehensiveness of chatbot-generated responses to physician-developed medical queries, highlighting the reliability and limitations of artificial intelligence-generated medical information. Design, Setting, and Participants Thirty-three physicians across 17 specialties generated 284 medical questions that they subjectively classified as easy, medium, or hard with either binary (yes or no) or descriptive answers. The physicians then graded the chatbot-generated answers to these questions for accuracy (6-point Likert scale with 1 being completely incorrect and 6 being completely correct) and completeness (3-point Likert scale, with 1 being incomplete and 3 being complete plus additional context). Scores were summarized with descriptive statistics and compared using the Mann-Whitney U test or the Kruskal-Wallis test. The study (including data analysis) was conducted from January to May 2023. Main Outcomes and Measures Accuracy, completeness, and consistency over time and between 2 different versions (GPT-3.5 and GPT-4) of chatbot-generated medical responses. Results Across all questions (n = 284) generated by 33 physicians (31 faculty members and 2 recent graduates from residency or fellowship programs) across 17 specialties, the median accuracy score was 5.5 (IQR, 4.0-6.0) (between almost completely and completely correct) with a mean (SD) score of 4.8 (1.6) (between mostly and almost completely correct).
The median completeness score was 3.0 (IQR, 2.0-3.0) (complete and comprehensive) with a mean (SD) score of 2.5 (0.7). For questions rated easy, medium, and hard, the median accuracy scores were 6.0 (IQR, 5.0-6.0), 5.5 (IQR, 5.0-6.0), and 5.0 (IQR, 4.0-6.0), respectively (mean [SD] scores were 5.0 [1.5], 4.7 [1.7], and 4.6 [1.6], respectively; P = .05). Accuracy scores for binary and descriptive questions were similar (median score, 6.0 [IQR, 4.0-6.0] vs 5.0 [IQR, 3.4-6.0]; mean [SD] score, 4.9 [1.6] vs 4.7 [1.6]; P = .07). Of 36 questions with scores of 1.0 to 2.0, 34 were requeried or regraded 8 to 17 days later with substantial improvement (median score 2.0 [IQR, 1.0-3.0] vs 4.0 [IQR, 2.0-5.3]; P < .01). A subset of questions, regardless of initial scores (version 3.5), were regenerated and rescored using version 4 with improvement (mean accuracy [SD] score, 5.2 [1.5] vs 5.7 [0.8]; median score, 6.0 [IQR, 5.0-6.0] for original and 6.0 [IQR, 6.0-6.0] for rescored; P = .002). Conclusions and Relevance In this cross-sectional study, chatbot generated largely accurate information to diverse medical queries as judged by academic physician specialists with improvement over time, although it had important limitations. Further research and model development are needed to correct inaccuracies and for validation.
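Scores throughout this abstract are summarized as median (IQR), the standard summary for ordinal Likert-scale data. A minimal sketch of how such a summary can be computed with the Python standard library (the example ratings are hypothetical, not the study's data):

```python
import statistics

def likert_summary(scores):
    """Return (median, (Q1, Q3)) for a list of ordinal scores."""
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return statistics.median(scores), (q1, q3)

# Hypothetical 6-point accuracy ratings for one set of questions
ratings = [6, 5, 6, 4, 3, 6, 5, 2]
median, (q1, q3) = likert_summary(ratings)
print(f"median {median} (IQR, {q1}-{q3})")  # prints: median 5.0 (IQR, 3.75-6.0)
```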
Affiliation(s)
- J. Randall Patrinely
- Department of Dermatology, Vanderbilt University Medical Center, Nashville, Tennessee
- Cosby A. Stone
- Department of Allergy, Pulmonology, and Critical Care, Vanderbilt University Medical Center, Nashville, Tennessee
- Eli Zimmerman
- Department of Neurology, Vanderbilt University Medical Center, Nashville, Tennessee
- Rebecca R. Donald
- Department of Anesthesiology, Vanderbilt University Medical Center, Nashville, Tennessee
- Sam S. Chang
- Department of Urology, Vanderbilt University Medical Center, Nashville, Tennessee
- Sean T. Berkowitz
- Vanderbilt Eye Institute, Department of Ophthalmology, Vanderbilt University Medical Center, Nashville, Tennessee
- Avni P. Finn
- Vanderbilt Eye Institute, Department of Ophthalmology, Vanderbilt University Medical Center, Nashville, Tennessee
- Eiman Jahangir
- Department of Cardiovascular Medicine, Vanderbilt University Medical Center, Nashville, Tennessee
- Elizabeth A. Scoville
- Department of Gastroenterology, Hepatology, and Nutrition, Vanderbilt University Medical Center, Nashville, Tennessee
- Tyler S. Reese
- Department of Rheumatology and Immunology, Vanderbilt University Medical Center, Nashville, Tennessee
- Debra L. Friedman
- Department of Pediatric Hematology/Oncology, Vanderbilt University Medical Center, Nashville, Tennessee
- Julie A. Bastarache
- Department of Allergy, Pulmonology, and Critical Care, Vanderbilt University Medical Center, Nashville, Tennessee
- Yuri F. van der Heijden
- Department of Infectious Disease, Vanderbilt University Medical Center, Nashville, Tennessee
- Jordan J. Wright
- Department of Diabetes, Endocrinology, and Metabolism, Vanderbilt University Medical Center, Nashville, Tennessee
- Fei Ye
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee
- Nicholas Carter
- Division of Trauma and Surgical Critical Care, University of Miami Miller School of Medicine, Miami, Florida
- Matthew R. Alexander
- Department of Cardiovascular Medicine and Clinical Pharmacology, Vanderbilt University Medical Center, Nashville, Tennessee
- Jennifer H. Choe
- Department of Hematology/Oncology, Vanderbilt University Medical Center, Nashville, Tennessee
- Cody A. Chastain
- Department of Infectious Disease, Vanderbilt University Medical Center, Nashville, Tennessee
- John A. Zic
- Department of Dermatology, Vanderbilt University Medical Center, Nashville, Tennessee
- Sara N. Horst
- Department of Gastroenterology, Hepatology, and Nutrition, Vanderbilt University Medical Center, Nashville, Tennessee
- Isik Turker
- Department of Cardiology, Washington University School of Medicine in St Louis, St Louis, Missouri
- Rajiv Agarwal
- Department of Hematology/Oncology, Vanderbilt University Medical Center, Nashville, Tennessee
- Evan Osmundson
- Department of Radiation Oncology, Vanderbilt University Medical Center, Nashville, Tennessee
- Kamran Idrees
- Department of Surgical Oncology & Endocrine Surgery, Vanderbilt University Medical Center, Nashville, Tennessee
- Colleen M. Kiernan
- Department of Surgical Oncology & Endocrine Surgery, Vanderbilt University Medical Center, Nashville, Tennessee
- Chandrasekhar Padmanabhan
- Department of Surgical Oncology & Endocrine Surgery, Vanderbilt University Medical Center, Nashville, Tennessee
- Christina E. Bailey
- Department of Surgical Oncology & Endocrine Surgery, Vanderbilt University Medical Center, Nashville, Tennessee
- Cameron E. Schlegel
- Department of Surgical Oncology & Endocrine Surgery, Vanderbilt University Medical Center, Nashville, Tennessee
- Lola B. Chambless
- Department of Neurological Surgery, Vanderbilt University Medical Center, Nashville, Tennessee
- Michael K. Gibson
- Department of Hematology/Oncology, Vanderbilt University Medical Center, Nashville, Tennessee
- Travis J. Osterman
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee
- Lee E. Wheless
- Department of Dermatology, Vanderbilt University Medical Center, Nashville, Tennessee
- Douglas B. Johnson
- Department of Hematology/Oncology, Vanderbilt University Medical Center, Nashville, Tennessee
78
Ong J, Hariprasad SM, Chhablani J. ChatGPT and GPT-4 in Ophthalmology: Applications of Large Language Model Artificial Intelligence in Retina. Ophthalmic Surg Lasers Imaging Retina 2023; 54:557-562. [PMID: 37847163 DOI: 10.3928/23258160-20230926-01] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/18/2023]
79
Eguia H, Sanz García JF. [Artificial intelligence, ChatGPT and primary care]. Semergen 2023; 49:102069. [PMID: 37647848 DOI: 10.1016/j.semerg.2023.102069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/12/2023] [Accepted: 07/13/2023] [Indexed: 09/01/2023]
Affiliation(s)
- Hans Eguia
- Rudkøbing Lægehuset, Denmark; member of the SEMERGEN working group on new technologies; member of DSAM, Denmark.
- Javier Francisco Sanz García
- Conselleria de Sanitat Valenciana; coordinator of the SEMERGEN working group on new technologies. https://twitter.com/@javikin84
80
Liu J, Zheng J, Cai X, Wu D, Yin C. A descriptive study based on the comparison of ChatGPT and evidence-based neurosurgeons. iScience 2023; 26:107590. [PMID: 37705958 PMCID: PMC10495632 DOI: 10.1016/j.isci.2023.107590] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/31/2023] [Revised: 06/21/2023] [Accepted: 08/04/2023] [Indexed: 09/15/2023] Open
Abstract
ChatGPT is an artificial intelligence product developed by OpenAI. This study aims to investigate whether ChatGPT can respond in accordance with evidence-based medicine in neurosurgery. We generated 50 neurosurgical questions covering neurosurgical diseases. Each question was posed three times to GPT-3.5 and GPT-4.0. We also recruited three neurosurgeons with high, middle, and low seniority to respond to questions. The results were analyzed regarding ChatGPT's overall performance score, mean scores by the items' specialty classification, and question type. In conclusion, GPT-3.5's ability to respond in accordance with evidence-based medicine was comparable to that of neurosurgeons with low seniority, and GPT-4.0's ability was comparable to that of neurosurgeons with high seniority. Although ChatGPT is yet to be comparable to a neurosurgeon with high seniority, future upgrades could enhance its performance and abilities.
Affiliation(s)
- Jiayu Liu
- Department of Neurosurgery, the First Medical Centre, Chinese PLA General Hospital, Beijing 100853, China
- Jiqi Zheng
- School of Health Humanities, Peking University, Beijing 100191, China
- Xintian Cai
- Department of Graduate School, Xinjiang Medical University, Urumqi 830001, China
- Dongdong Wu
- Department of Information, Daping Hospital, Army Medical University, Chongqing 400042, China
- Chengliang Yin
- Faculty of Medicine, Macau University of Science and Technology, Macau 999078, China
81
Unleashing the Potential of ChatGPT. ARTIFICIAL INTELLIGENCE APPLICATIONS USING CHATGPT IN EDUCATION 2023:84-92. [DOI: 10.4018/978-1-6684-9300-7.ch008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 07/02/2024]
Abstract
ChatGPT's extensive and strong capabilities make it useful for various applications, such as scholarly research, content creation, and language translation. Natural language processing is revolutionized by its capacity to produce human-like writing and deliver pertinent information quickly and accurately. Artificial intelligence aids in class material preparation can improve the efficiency, accessibility, engagement, personalization, and collaboration of the teaching and learning process. As technology evolves, it will likely become an even more essential academic tool. Academics can benefit greatly from ChatGPT's ability to provide content suited to particular subject matter or writing styles. It enables individuals to provide instructional materials that are more precise, pertinent, and consistent with their own preferred writing style. The accessibility of ChatGPT is a major advantage for academics, as it allows them to create class materials quickly and conveniently from anywhere with an internet connection.
82
Khlaif ZN, Mousa A, Hattab MK, Itmazi J, Hassan AA, Sanmugam M, Ayyoub A. The Potential and Concerns of Using AI in Scientific Research: ChatGPT Performance Evaluation. JMIR MEDICAL EDUCATION 2023; 9:e47049. [PMID: 37707884 PMCID: PMC10636627 DOI: 10.2196/47049] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/06/2023] [Revised: 06/04/2023] [Accepted: 07/21/2023] [Indexed: 09/15/2023]
Abstract
BACKGROUND Artificial intelligence (AI) has many applications in daily life, including health care; criminal, civil, business, and liability law; and education. One aspect of AI that has gained significant attention is natural language processing (NLP), the ability of computers to understand and generate human language. OBJECTIVE This study aims to examine the potential for, and concerns about, using AI in scientific research. For this purpose, research articles were generated with ChatGPT, the quality of the resulting reports was analyzed, and the application's impact on the research framework, data analysis, and literature review was assessed. The study also explored concerns around ownership and the integrity of research when using AI-generated text. METHODS A total of 4 articles were generated using ChatGPT and then evaluated by 23 reviewers, using an evaluation form the researchers developed to assess the quality of the generated articles. Additionally, 50 abstracts were generated using ChatGPT and their quality was evaluated. Quantitative ratings were analyzed with ANOVA, and the reviewers' qualitative feedback was examined through thematic analysis. RESULTS When given detailed prompts and the context of the study, ChatGPT generated high-quality research that could be published in high-impact journals. However, ChatGPT had only a minor impact on developing the research framework and data analysis, and the literature review was the primary area needing improvement. Reviewers also expressed concerns around ownership and the integrity of research when using AI-generated text. Nonetheless, ChatGPT has strong potential to increase human productivity in research and can be used in academic writing. CONCLUSIONS AI-generated text has the potential to improve the quality of high-impact research articles. The findings suggest that decision makers and researchers should focus more on the methodology of the research, including research design, development of research tools, and in-depth data analysis, to draw strong theoretical and practical implications in the era of AI. The practical implications of this study extend to fields such as medical education, where such tools could help deliver materials that develop basic competencies for both medical students and faculty members.
Affiliation(s)
- Zuheir N Khlaif
- Faculty of Humanities and Educational Sciences, An-Najah National University, Nablus, Occupied Palestinian Territory
- Allam Mousa
- Artificial Intelligence and Virtual Reality Research Center, Department of Electrical and Computer Engineering, An-Najah National University, Nablus, Occupied Palestinian Territory
- Muayad Kamal Hattab
- Faculty of Law and Political Sciences, An-Najah National University, Nablus, Occupied Palestinian Territory
- Jamil Itmazi
- Department of Information Technology, College of Engineering and Information Technology, Palestine Ahliya University, Bethlehem, Occupied Palestinian Territory
- Amjad A Hassan
- Faculty of Law and Political Sciences, An-Najah National University, Nablus, Occupied Palestinian Territory
- Mageswaran Sanmugam
- Centre for Instructional Technology and Multimedia, Universiti Sains Malaysia, Penang, Malaysia
- Abedalkarim Ayyoub
- Faculty of Humanities and Educational Sciences, An-Najah National University, Nablus, Occupied Palestinian Territory
83
Huang Y, Gomaa A, Semrau S, Haderlein M, Lettmaier S, Weissmann T, Grigo J, Tkhayat HB, Frey B, Gaipl U, Distel L, Maier A, Fietkau R, Bert C, Putz F. Benchmarking ChatGPT-4 on a radiation oncology in-training exam and Red Journal Gray Zone cases: potentials and challenges for ai-assisted medical education and decision making in radiation oncology. Front Oncol 2023; 13:1265024. [PMID: 37790756 PMCID: PMC10543650 DOI: 10.3389/fonc.2023.1265024] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/21/2023] [Accepted: 08/23/2023] [Indexed: 10/05/2023] Open
Abstract
Purpose The potential of large language models in medicine for education and decision-making has been demonstrated by their decent scores on medical exams such as the United States Medical Licensing Examination (USMLE) and the MedQA exam. This work aims to evaluate the performance of ChatGPT-4 in the specialized field of radiation oncology. Methods The 38th American College of Radiology (ACR) radiation oncology in-training (TXIT) exam and the 2022 Red Journal Gray Zone cases were used to benchmark the performance of ChatGPT-4. The TXIT exam contains 300 questions covering various topics in radiation oncology; the 2022 Gray Zone collection contains 15 complex clinical cases. Results On the TXIT exam, ChatGPT-3.5 and ChatGPT-4 achieved scores of 62.05% and 78.77%, respectively, highlighting the advantage of the newer ChatGPT-4 model. Based on the TXIT exam, ChatGPT-4's strong and weak areas in radiation oncology can be identified to some extent. Specifically, per the ACR knowledge domains, ChatGPT-4 demonstrates better knowledge of statistics, CNS & eye, pediatrics, biology, and physics than of bone & soft tissue and gynecology. Regarding clinical care paths, ChatGPT-4 performs better in diagnosis, prognosis, and toxicity than in brachytherapy and dosimetry, and it lacks proficiency in the in-depth details of clinical trials. For the Gray Zone cases, ChatGPT-4 was able to suggest a personalized treatment approach for each case with high correctness and comprehensiveness; importantly, for many cases it provided novel treatment aspects that were not suggested by any of the human experts. Conclusion Both evaluations demonstrate the potential of ChatGPT-4 in medical education for the general public and cancer patients, as well as its potential to aid clinical decision-making, while acknowledging its limitations in certain domains. Owing to the risk of hallucination, content generated by models such as ChatGPT must be verified for accuracy.
Affiliation(s)
- Yixing Huang
- Department of Radiation Oncology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Comprehensive Cancer Center Erlangen-EMN (CCC ER-EMN), Erlangen, Germany
- Ahmed Gomaa
- Department of Radiation Oncology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Comprehensive Cancer Center Erlangen-EMN (CCC ER-EMN), Erlangen, Germany
- Sabine Semrau
- Department of Radiation Oncology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Comprehensive Cancer Center Erlangen-EMN (CCC ER-EMN), Erlangen, Germany
- Marlen Haderlein
- Department of Radiation Oncology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Comprehensive Cancer Center Erlangen-EMN (CCC ER-EMN), Erlangen, Germany
- Sebastian Lettmaier
- Department of Radiation Oncology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Comprehensive Cancer Center Erlangen-EMN (CCC ER-EMN), Erlangen, Germany
- Thomas Weissmann
- Department of Radiation Oncology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Comprehensive Cancer Center Erlangen-EMN (CCC ER-EMN), Erlangen, Germany
- Johanna Grigo
- Department of Radiation Oncology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Comprehensive Cancer Center Erlangen-EMN (CCC ER-EMN), Erlangen, Germany
- Hassen Ben Tkhayat
- Department of Radiation Oncology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Benjamin Frey
- Department of Radiation Oncology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Comprehensive Cancer Center Erlangen-EMN (CCC ER-EMN), Erlangen, Germany
- Udo Gaipl
- Department of Radiation Oncology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Comprehensive Cancer Center Erlangen-EMN (CCC ER-EMN), Erlangen, Germany
- Luitpold Distel
- Department of Radiation Oncology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Comprehensive Cancer Center Erlangen-EMN (CCC ER-EMN), Erlangen, Germany
- Andreas Maier
- Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Rainer Fietkau
- Department of Radiation Oncology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Comprehensive Cancer Center Erlangen-EMN (CCC ER-EMN), Erlangen, Germany
- Christoph Bert
- Department of Radiation Oncology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Comprehensive Cancer Center Erlangen-EMN (CCC ER-EMN), Erlangen, Germany
- Florian Putz
- Department of Radiation Oncology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Comprehensive Cancer Center Erlangen-EMN (CCC ER-EMN), Erlangen, Germany
84
Pugliese G, Maccari A, Felisati E, Felisati G, Giudici L, Rapolla C, Pisani A, Saibene AM. Are artificial intelligence large language models a reliable tool for difficult differential diagnosis? An a posteriori analysis of a peculiar case of necrotizing otitis externa. Clin Case Rep 2023; 11:e7933. [PMID: 37736475 PMCID: PMC10509342 DOI: 10.1002/ccr3.7933] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/22/2023] [Revised: 08/30/2023] [Accepted: 09/12/2023] [Indexed: 09/23/2023] Open
Abstract
Key Clinical Message Large language models have made artificial intelligence readily available to the general public and potentially have a role in healthcare; however, their use in difficult differential diagnosis is still limited, as demonstrated by a case of necrotizing otitis externa. Abstract This case report presents a peculiar case of necrotizing otitis externa (NOE) with skull base involvement that proved diagnostically challenging. The initial presentation and imaging of the 78-year-old patient suggested a neoplastic rhinopharyngeal lesion, and only after several unsuccessful biopsies was the patient transferred to our unit. Upon re-evaluation of the clinical picture, a clinical hypothesis of NOE with skull base erosion was made and confirmed by identifying Pseudomonas aeruginosa in biopsy specimens of skull base bone and external auditory canal skin. Once the diagnosis was confirmed, the patient was treated with culture-oriented long-term antibiotics, with complete resolution of the disease. Given the complex clinical presentation, we chose to submit this NOE case a posteriori to two large language models (LLMs) to test their ability to handle difficult differential diagnoses. LLMs are easily approachable artificial intelligence tools that enable human-like interaction with the user, relying on large information databases to analyze queries. The LLMs of choice were ChatGPT-3 and ChatGPT-4, and they were asked to analyze the case when provided with only objective clinical and imaging data.
Affiliation(s)
- Giorgia Pugliese
- Otolaryngology Unit, Santi Paolo e Carlo Hospital, Milan, Italy
- Department of Health Sciences, Università degli Studi di Milano, Milan, Italy
- Alberto Maccari
- Otolaryngology Unit, Santi Paolo e Carlo Hospital, Milan, Italy
- Department of Health Sciences, Università degli Studi di Milano, Milan, Italy
- Elena Felisati
- Otolaryngology Unit, Santi Paolo e Carlo Hospital, Milan, Italy
- Department of Health Sciences, Università degli Studi di Milano, Milan, Italy
- Giovanni Felisati
- Otolaryngology Unit, Santi Paolo e Carlo Hospital, Milan, Italy
- Department of Health Sciences, Università degli Studi di Milano, Milan, Italy
- Leonardo Giudici
- Otolaryngology Unit, Santi Paolo e Carlo Hospital, Milan, Italy
- Department of Health Sciences, Università degli Studi di Milano, Milan, Italy
- Chiara Rapolla
- Otolaryngology Unit, Santi Paolo e Carlo Hospital, Milan, Italy
- Department of Health Sciences, Università degli Studi di Milano, Milan, Italy
- Antonia Pisani
- Otolaryngology Unit, Santi Paolo e Carlo Hospital, Milan, Italy
- Department of Health Sciences, Università degli Studi di Milano, Milan, Italy
- Alberto Maria Saibene
- Otolaryngology Unit, Santi Paolo e Carlo Hospital, Milan, Italy
- Department of Health Sciences, Università degli Studi di Milano, Milan, Italy
85
Lu Y, Wu H, Qi S, Cheng K. Artificial Intelligence in Intensive Care Medicine: Toward a ChatGPT/GPT-4 Way? Ann Biomed Eng 2023; 51:1898-1903. [PMID: 37179277 PMCID: PMC10182840 DOI: 10.1007/s10439-023-03234-w] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/05/2023] [Accepted: 05/08/2023] [Indexed: 05/15/2023]
Abstract
Although intensive care medicine (ICM) is a relatively young discipline, it has rapidly developed into a full-fledged and highly specialized specialty covering several fields of medicine. The COVID-19 pandemic led to a surge in intensive care unit demand and also brought unprecedented development opportunities to this area, as multiple new technologies such as artificial intelligence (AI) and machine learning (ML) were gradually applied in the field. In this study, through an online survey, we summarize the potential uses of ChatGPT/GPT-4 in ICM, ranging from knowledge augmentation, device management, clinical decision-making support, and early warning systems to the establishment of an intensive care unit (ICU) database.
Affiliation(s)
- Yanqiu Lu
- Department of Intensive Care Unit, The Second Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan, China
- Haiyang Wu
- Department of Graduate School, Tianjin Medical University, Tianjin, China
- Duke Molecular Physiology Institute, Duke University School of Medicine, Durham, NC, USA
- Shaoyan Qi
- Department of Intensive Care Unit, The Second Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan, China
- Kunming Cheng
- Department of Intensive Care Unit, The Second Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan, China
86
Ge J, Fontil V, Ackerman S, Pletcher MJ, Lai JC. Clinical decision support and electronic interventions to improve care quality in chronic liver diseases and cirrhosis. Hepatology 2023:01515467-990000000-00546. [PMID: 37611253 PMCID: PMC10998693 DOI: 10.1097/hep.0000000000000583] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 06/06/2023] [Accepted: 07/17/2023] [Indexed: 08/25/2023]
Abstract
Significant quality gaps exist in the management of chronic liver diseases and cirrhosis. Clinical decision support systems (information-driven tools based in and launched from the electronic health record) are attractive and potentially scalable prospective interventions that could help standardize clinical care in hepatology. Yet clinical decision support systems have had a mixed record in clinical medicine due to issues with interoperability and compatibility with clinical workflows. In this review, we discuss the conceptual origins of clinical decision support systems, existing applications in liver diseases, issues and challenges with implementation, and emerging strategies to improve their integration in hepatology care.
Affiliation(s)
- Jin Ge
- Department of Medicine, Division of Gastroenterology and Hepatology, University of California – San Francisco, San Francisco, California, USA
- Valy Fontil
- Department of Medicine, NYU Grossman School of Medicine and Family Health Centers at NYU-Langone Medical Center, Brooklyn, New York, USA
- Sara Ackerman
- Department of Social and Behavioral Sciences, University of California – San Francisco, San Francisco, California, USA
- Mark J. Pletcher
- Department of Epidemiology and Biostatistics, University of California – San Francisco, San Francisco, California, USA
- Jennifer C. Lai
- Department of Medicine, Division of Gastroenterology and Hepatology, University of California – San Francisco, San Francisco, California, USA
87
Liu S, McCoy AB, Wright AP, Carew B, Genkins JZ, Huang SS, Peterson JF, Steitz B, Wright A. Leveraging Large Language Models for Generating Responses to Patient Messages. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.07.14.23292669. [PMID: 37503263 PMCID: PMC10370222 DOI: 10.1101/2023.07.14.23292669] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 07/29/2023]
Abstract
Objective This study aimed to develop and assess the performance of fine-tuned large language models for generating responses to patient messages sent via an electronic health record patient portal. Methods Using a dataset of messages and responses extracted from the patient portal at a large academic medical center, we developed a model (CLAIR-Short) based on a pre-trained large language model (LLaMA-65B). In addition, we used the OpenAI API to update physician responses from an open-source dataset into a format with informative paragraphs that offered patient education while emphasizing empathy and professionalism. By combining this dataset with the first, we further fine-tuned our model (CLAIR-Long). To evaluate the fine-tuned models, we used ten representative patient portal questions in primary care to generate responses and asked primary care physicians to review the responses generated by our models and ChatGPT, rating them for empathy, responsiveness, accuracy, and usefulness. Results The dataset consisted of 499,794 pairs of patient messages and corresponding responses from the patient portal, plus 5,000 patient messages with ChatGPT-updated responses from an online platform. Four primary care physicians participated in the survey. CLAIR-Short was able to generate concise responses similar to providers' responses. CLAIR-Long responses provided more patient educational content than CLAIR-Short responses and were rated similarly to ChatGPT's responses, receiving positive evaluations for responsiveness, empathy, and accuracy and a neutral rating for usefulness. Conclusion Leveraging large language models to generate responses to patient messages demonstrates significant potential for facilitating communication between patients and primary care providers.
88
Wei WQ, Yan C, Grabowska M, Dickson A, Li B, Wen Z, Roden D, Stein C, Embí P, Peterson J, Feng Q, Malin B. Leveraging Generative AI to Prioritize Drug Repurposing Candidates: Validating Identified Candidates for Alzheimer's Disease in Real-World Clinical Datasets. RESEARCH SQUARE 2023:rs.3.rs-3125859. [PMID: 37503019 PMCID: PMC10371084 DOI: 10.21203/rs.3.rs-3125859/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 07/29/2023]
Abstract
Drug repurposing represents an attractive alternative to the costly and time-consuming process of new drug development, particularly for serious, widespread conditions with limited effective treatments, such as Alzheimer's disease (AD). Emerging generative artificial intelligence (GAI) technologies like ChatGPT offer the promise of expediting the review and summary of scientific knowledge. To examine the feasibility of using GAI for identifying drug repurposing candidates, we iteratively tasked ChatGPT with proposing the twenty most promising drugs for repurposing in AD, and tested the top ten for risk of incident AD in exposed and unexposed individuals over age 65 in two large clinical datasets: 1) Vanderbilt University Medical Center and 2) the All of Us Research Program. Among the candidates suggested by ChatGPT, metformin, simvastatin, and losartan were associated with lower AD risk in meta-analysis. These findings suggest GAI technologies can assimilate scientific insights from an extensive Internet-based search space, helping to prioritize drug repurposing candidates and facilitate the treatment of diseases.
Affiliation(s)
- Chao Yan
- Vanderbilt University Medical Center
- Dan Roden
- Vanderbilt University Medical Center
- C Stein
- Vanderbilt University Medical Center
89
Yan C, Grabowska ME, Dickson AL, Li B, Wen Z, Roden DM, Stein CM, Embí PJ, Peterson JF, Feng Q, Malin BA, Wei WQ. Leveraging Generative AI to Prioritize Drug Repurposing Candidates: Validating Identified Candidates for Alzheimer's Disease in Real-World Clinical Datasets. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.07.07.23292388. [PMID: 37461512 PMCID: PMC10350158 DOI: 10.1101/2023.07.07.23292388] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/09/2023]
Abstract
Drug repurposing represents an attractive alternative to the costly and time-consuming process of new drug development, particularly for serious, widespread conditions with limited effective treatments, such as Alzheimer's disease (AD). Emerging generative artificial intelligence (GAI) technologies like ChatGPT offer the promise of expediting the review and summary of scientific knowledge. To examine the feasibility of using GAI for identifying drug repurposing candidates, we iteratively tasked ChatGPT with proposing the twenty most promising drugs for repurposing in AD, and tested the top ten for risk of incident AD in exposed and unexposed individuals over age 65 in two large clinical datasets: 1) Vanderbilt University Medical Center and 2) the All of Us Research Program. Among the candidates suggested by ChatGPT, metformin, simvastatin, and losartan were associated with lower AD risk in meta-analysis. These findings suggest GAI technologies can assimilate scientific insights from an extensive Internet-based search space, helping to prioritize drug repurposing candidates and facilitate the treatment of diseases.
90
Gupta NK, Doyle DM, D'Amico RS. Response to "Large language model artificial intelligence: the current state and future of ChatGPT in neuro-oncology publishing". J Neurooncol 2023; 163:731-733. [PMID: 37440100 DOI: 10.1007/s11060-023-04396-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/07/2023] [Accepted: 07/08/2023] [Indexed: 07/14/2023]
Affiliation(s)
- Nithin K Gupta
- Campbell University School of Osteopathic Medicine, Lillington, NC, USA.
| | - David M Doyle
- Central Michigan University College of Medicine, Mount Pleasant, MI, USA
| | | |
91
Bakken S. AI in health: keeping the human in the loop. J Am Med Inform Assoc 2023; 30:1225-1226. [PMID: 37337923 PMCID: PMC10280340 DOI: 10.1093/jamia/ocad091] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/10/2023] [Accepted: 05/10/2023] [Indexed: 06/21/2023] Open
Affiliation(s)
- Suzanne Bakken
- School of Nursing, Department of Biomedical Informatics, and Data Science Institute, Columbia University, New York, New York, USA
92
Agrawal A. Laying an equitable data foundation for foundation models. THE LANCET REGIONAL HEALTH. SOUTHEAST ASIA 2023; 13:100221. [PMID: 37383558 PMCID: PMC10305916 DOI: 10.1016/j.lansea.2023.100221] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Download PDF] [Subscribe] [Scholar Register] [Received: 05/05/2023] [Accepted: 05/09/2023] [Indexed: 06/30/2023]
93
Sorin V, Barash Y, Konen E, Klang E. Large language models for oncological applications. J Cancer Res Clin Oncol 2023:10.1007/s00432-023-04824-w. [PMID: 37160626 DOI: 10.1007/s00432-023-04824-w] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/26/2023] [Accepted: 04/28/2023] [Indexed: 05/11/2023]
Abstract
Large language models such as ChatGPT have gained public and scientific attention. These models may support oncologists in their work. Oncologists should be familiar with large language models to harness their potential while being aware of potential dangers and limitations.
Affiliation(s)
- Vera Sorin
- Department of Diagnostic Imaging, Chaim Sheba Medical Center, Ramat Gan, Israel
- DeepVision Lab, Chaim Sheba Medical Center, Tel Hashomer, Ramat Gan, Israel
- Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel
- Yiftach Barash
- Department of Diagnostic Imaging, Chaim Sheba Medical Center, Ramat Gan, Israel
- DeepVision Lab, Chaim Sheba Medical Center, Tel Hashomer, Ramat Gan, Israel
- Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel
- Eli Konen
- Department of Diagnostic Imaging, Chaim Sheba Medical Center, Ramat Gan, Israel
- Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel
- Eyal Klang
- Department of Diagnostic Imaging, Chaim Sheba Medical Center, Ramat Gan, Israel
- DeepVision Lab, Chaim Sheba Medical Center, Tel Hashomer, Ramat Gan, Israel
- Sami Sagol AI Hub, ARC, Chaim Sheba Medical Center, Ramat Gan, Israel
- Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel