51
Değerli Yİ, Özata Değerli MN. Using ChatGPT as a tool during occupational therapy intervention: A case report in mild cognitive impairment. Assist Technol 2024:1-10. [PMID: 39446069] [DOI: 10.1080/10400435.2024.2416495]
Abstract
This case report examined the impact of computer-programmed assistive technology, developed using ChatGPT as a tool for designing an occupational therapy intervention, on a client's independence in activities of daily living. A 66-year-old female client with mild cognitive impairment consulted an occupational therapist due to difficulties with activities of daily living. The occupational therapist developed two activity-assistance computer programs using ChatGPT as a resource. The client did not interact directly with ChatGPT; instead, the occupational therapist used the technology to design and implement the intervention. The computer-programmed assistive technology-based occupational therapy intervention lasted eight weeks. The occupational therapist trained the client to use these programs in the clinical setting and at home. As a result of the intervention, the client's performance and independence in daily activities improved. These results emphasize that ChatGPT may help occupational therapists design simple computer-programmed assistive technology interventions without requiring additional professional input.
Affiliation(s)
- Yusuf İslam Değerli
- Kızılcahamam Vocational School of Health Services, Ankara University, Ankara, Turkey
52
Shalong W, Yi Z, Bin Z, Ganglei L, Jinyu Z, Yanwen Z, Zequn Z, Lianwen Y, Feng R. Enhancing self-directed learning with custom GPT AI facilitation among medical students: A randomized controlled trial. Med Teach 2024:1-8. [PMID: 39425996] [DOI: 10.1080/0142159x.2024.2413023]
Abstract
OBJECTIVE This study aims to assess the impact of LearnGuide, a specialized ChatGPT tool designed to support self-directed learning among medical students. MATERIALS AND METHODS In this 14-week randomized controlled trial (ClinicalTrials.gov NCT06276049), 103 medical students were assigned to either an intervention group, which received 12 weeks of problem-based training with LearnGuide support, or a control group, which received identical training without AI assistance. Primary and secondary outcomes, including Self-Directed Learning Scale scores at 6 and 12 weeks, Cornell Critical Thinking Test Level Z scores, and Global Flow Scores, were evaluated with a 14-week follow-up. Mann-Whitney U tests were used for statistical comparisons between the groups. RESULTS At 6 weeks, the intervention group showed a marginally higher median Self-Directed Learning Scale score, which further improved by 12 weeks (4.15 [95% CI, 0.82 to 7.48]; p = 0.01) and was sustained at the 14-week follow-up. Additionally, this group demonstrated notable improvements in the Cornell Critical Thinking Test Score at 12 weeks (7.11 [95% CI, 4.50 to 9.72]; p < 0.001), which persisted into the 14-week follow-up. The group also experienced enhancements in the Global Flow Score from 6 weeks, maintaining superiority over the control group through 12 weeks. CONCLUSIONS LearnGuide significantly enhanced self-directed learning, critical thinking, and flow experiences in medical students, highlighting the crucial role of AI tools in advancing medical education.
Affiliation(s)
- Wang Shalong
- Department of General Surgery, The Second Xiangya Hospital of Central South University, Changsha, China
- Zuo Yi
- School of Information Technology and Management, Hunan University of Finance and Economics, Changsha, China
- Zou Bin
- Department of General Surgery, The Affiliated Changsha Central Hospital Hengyang Medical School, University of South China, Changsha, China
- Liu Ganglei
- Department of General Surgery, The Second Xiangya Hospital of Central South University, Changsha, China
- Zhou Jinyu
- Department of General Surgery, The Second Xiangya Hospital of Central South University, Changsha, China
- Zheng Yanwen
- Department of General Surgery, The Second Xiangya Hospital of Central South University, Changsha, China
- Zhang Zequn
- Department of General Surgery, The Second Xiangya Hospital of Central South University, Changsha, China
- Yuan Lianwen
- Department of General Surgery, The Second Xiangya Hospital of Central South University, Changsha, China
- Ren Feng
- Department of General Surgery, The Second Xiangya Hospital of Central South University, Changsha, China
53
Goktas P, Grzybowski A. Assessing the Impact of ChatGPT in Dermatology: A Comprehensive Rapid Review. J Clin Med 2024; 13:5909. [PMID: 39407969] [PMCID: PMC11477344] [DOI: 10.3390/jcm13195909]
Abstract
Background/Objectives: The use of artificial intelligence (AI) in dermatology is expanding rapidly, with ChatGPT, a large language model (LLM) from OpenAI, showing promise in patient education, clinical decision-making, and teledermatology. Despite its potential, the ethical, clinical, and practical implications of its application remain insufficiently explored. This study aims to evaluate the effectiveness, challenges, and future prospects of ChatGPT in dermatology, focusing on clinical applications, patient interactions, and medical writing. ChatGPT was selected due to its broad adoption, extensive validation, and strong performance in dermatology-related tasks. Methods: A thorough literature review was conducted, focusing on publications related to ChatGPT and dermatology. The search included articles in English from November 2022 to August 2024, as this period captures the most recent developments following the launch of ChatGPT in November 2022, ensuring that the review includes the latest advancements and discussions on its role in dermatology. Studies were chosen based on their relevance to clinical applications, patient interactions, and ethical issues. Descriptive metrics, such as average accuracy scores and reliability percentages, were used to summarize study characteristics, and key findings were analyzed. Results: ChatGPT has shown significant potential in passing dermatology specialty exams and providing reliable responses to patient queries, especially for common dermatological conditions. However, it faces limitations in diagnosing complex cases like cutaneous neoplasms, and concerns about the accuracy and completeness of its information persist. Ethical issues, including data privacy, algorithmic bias, and the need for transparent guidelines, were identified as critical challenges. 
Conclusions: While ChatGPT has the potential to significantly enhance dermatological practice, particularly in patient education and teledermatology, its integration must be cautious, addressing ethical concerns and complementing, rather than replacing, dermatologist expertise. Future research should refine ChatGPT's diagnostic capabilities, mitigate biases, and develop comprehensive clinical guidelines.
Affiliation(s)
- Polat Goktas
- UCD School of Computer Science, University College Dublin, D04 V1W8 Dublin, Ireland
- Andrzej Grzybowski
- Department of Ophthalmology, University of Warmia and Mazury, 10-719 Olsztyn, Poland
- Institute for Research in Ophthalmology, Foundation for Ophthalmology Development, 61-553 Poznan, Poland
54
Huo B, Marfo N, Sylla P, Calabrese E, Kumar S, Slater BJ, Walsh DS, Vosburg W. Clinical artificial intelligence: teaching a large language model to generate recommendations that align with guidelines for the surgical management of GERD. Surg Endosc 2024; 38:5668-5677. [PMID: 39134725] [DOI: 10.1007/s00464-024-11155-5]
Abstract
BACKGROUND Large Language Models (LLMs) provide clinical guidance with inconsistent accuracy due to limitations of their training datasets. LLMs are "teachable" through customization. We compared the ability of the generic ChatGPT-4 model and a customized version of ChatGPT-4 to provide recommendations for the surgical management of gastroesophageal reflux disease (GERD) to both surgeons and patients. METHODS Sixty patient cases were developed using eligibility criteria from the Society of American Gastrointestinal and Endoscopic Surgeons (SAGES) & United European Gastroenterology (UEG)-European Association of Endoscopic Surgery (EAES) guidelines for the surgical management of GERD. Standardized prompts were engineered for physicians as the end user, with separate layperson prompts for patients. A customized GPT, called the GERD Tool for Surgery (GTS), was developed to generate recommendations based on the guidelines. Both the GTS and generic ChatGPT-4 were queried on July 21, 2024. Model performance was evaluated by comparing responses to SAGES & UEG-EAES guideline recommendations. Outcome data were presented using descriptive statistics, including counts and percentages. RESULTS The GTS provided accurate recommendations for the surgical management of GERD for 60/60 (100.0%) surgeon inquiries and 40/40 (100.0%) patient inquiries based on guideline recommendations. The generic ChatGPT-4 model generated accurate guidance for 40/60 (66.7%) surgeon inquiries and 19/40 (47.5%) patient inquiries. The GTS produced recommendations based on the 2021 SAGES & UEG-EAES guidelines on the surgical management of GERD, while the generic ChatGPT-4 model generated guidance without citing evidence to support its recommendations. CONCLUSION ChatGPT-4 can be customized to overcome limitations of its training dataset and provide recommendations for the surgical management of GERD with reliable accuracy and consistency. Training LLMs in this way can help integrate this efficient technology into the creation of robust and accurate information for both surgeons and patients. Prospective data are needed to assess its effectiveness in a pragmatic clinical environment.
Affiliation(s)
- Bright Huo
- Division of General Surgery, Department of Surgery, McMaster University, Hamilton, ON, Canada
- Nana Marfo
- Ross University School of Medicine, Miramar, FL, USA
- Patricia Sylla
- Division of Colon and Rectal Surgery, Department of Surgery, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Sunjay Kumar
- Department of General Surgery, Thomas Jefferson University Hospital, Philadelphia, PA, USA
- Danielle S Walsh
- Department of Surgery, University of Kentucky, Lexington, KY, USA
- Wesley Vosburg
- Department of Surgery, Mount Auburn Hospital, Harvard Medical School, Cambridge, MA, USA
55
Shahrul AI, Syed Mohamed AMF. A Comparative Evaluation of Statistical Product and Service Solutions (SPSS) and ChatGPT-4 in Statistical Analyses. Cureus 2024; 16:e72581. [PMID: 39610603] [PMCID: PMC11602406] [DOI: 10.7759/cureus.72581]
Abstract
BACKGROUND The objective of this study was to assess the accuracy of Chat Generative Pre-trained Transformer 4.0 (ChatGPT-4; OpenAI, San Francisco, CA) compared to Statistical Product and Service Solutions (SPSS; IBM SPSS Statistics for Windows, Armonk, NY) in performing statistical analyses commonly used in medical and dental research. METHODS The datasets were analysed using SPSS (version 26) and ChatGPT-4. Statistical tests included the independent t-test, paired t-test, ANOVA, chi-square test, Wilcoxon signed-rank test, Mann-Whitney U test, Pearson and Spearman correlation, regression analysis, kappa statistic, intraclass correlation coefficient (ICC), Bland-Altman analysis, and sensitivity and specificity analysis. Descriptive statistics were used to report results, and differences between the two tools were noted. RESULTS SPSS and ChatGPT-4 produced identical results for the independent-samples t-test, paired t-test, and simple linear regression. In one-way ANOVA, both tools provided consistent F-values, but post hoc analysis revealed discrepancies in mean differences and confidence intervals. The Pearson chi-square and Wilcoxon signed-rank tests showed variations in p-values and Z-values. The Mann-Whitney U test showed differences in interquartile range (IQR), U values, and Z-values. Pearson and Spearman correlations were consistent, with IQR differences for Spearman. Sensitivity, specificity, and area under the curve (AUC) analyses were consistent, though differences in standard errors and confidence intervals were observed. CONCLUSION ChatGPT-4 produced accurate results for several statistical tests, matching SPSS in simpler analyses. However, discrepancies in post hoc analyses, confidence intervals, and more complex tests indicate that careful validation is required when using ChatGPT-4 for detailed statistical work. Researchers should exercise caution and cross-validate results with established tools such as SPSS.
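As a quick illustration of the cross-validation the authors recommend, the same tests can be re-run in an open statistical library and compared against SPSS or ChatGPT-4 output. The data below are invented for illustration; this is a sketch of the approach, not the study's datasets:

```python
# Illustrative sketch: cross-validating two of the tests from the study
# (independent t-test, Mann-Whitney U) with an established library.
# The two sample groups are hypothetical, not taken from the paper.
from scipy import stats

group_a = [12.1, 11.8, 13.0, 12.5, 12.9, 11.5, 12.2, 13.1]
group_b = [10.9, 11.2, 10.5, 11.8, 10.7, 11.0, 11.4, 10.8]

# Independent-samples t-test: one of the tests where both tools agreed.
t_stat, t_p = stats.ttest_ind(group_a, group_b)

# Mann-Whitney U test: one of the tests where discrepancies appeared,
# so an independent recomputation is especially useful here.
u_stat, u_p = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")

print(f"t = {t_stat:.3f}, p = {t_p:.4f}")
print(f"U = {u_stat:.1f}, p = {u_p:.4f}")
```

Running the same data through SPSS and through a library like this gives a third, scriptable point of comparison when the two tools disagree.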
Affiliation(s)
- Al Imran Shahrul
- Department of Family Oral Health, Faculty of Dentistry, Universiti Kebangsaan Malaysia, Kuala Lumpur, MYS
56
Barua M. Assessing the Performance of ChatGPT in Answering Patients' Questions Regarding Congenital Bicuspid Aortic Valve. Cureus 2024; 16:e72293. [PMID: 39583462] [PMCID: PMC11585396] [DOI: 10.7759/cureus.72293]
Abstract
AIM Artificial intelligence (AI) models such as ChatGPT are widely used in academia as well as by the general public. In medicine, the information that professionals and patients obtain from AI tools offers significant advantages while also raising valid concerns about the validity and adequacy of that information for healthcare delivery and utilization. It is therefore important to vet these AI tools through the prism of practicing physicians. METHODS To demonstrate both the utility and the potential concerns of using ChatGPT to gather medical information, a set of questions was posed to the chatbot regarding a hypothetical patient with a congenital bicuspid aortic valve (BAV), and the answers were recorded and reviewed against three criteria: (i) readability/technicality; (ii) adequacy/completeness; and (iii) accuracy/authenticity. RESULTS While ChatGPT provided detailed information about the clinical picture, treatment, and outcomes of BAV, the information was generic and brief, and its utility was limited by the lack of specific information based on an individual patient's clinical status. The authenticity of the information could not be verified due to a lack of citations. Further, the human aspects that would normally emerge in nuanced doctor-patient communication were missing from the ChatGPT output. CONCLUSION Although the performance of AI in medical care is expected to grow, imperfections and ethical concerns remain major challenges to relying on information from chatbots alone, without adequate communication with health providers, despite the technology's numerous advantages to society.
Affiliation(s)
- Mousumi Barua
- Internal Medicine, School of Public Health and Health Professions, University at Buffalo, Buffalo, USA
57
Pap IA, Oniga S. eHealth Assistant AI Chatbot Using a Large Language Model to Provide Personalized Answers through Secure Decentralized Communication. Sensors (Basel) 2024; 24:6140. [PMID: 39338885] [PMCID: PMC11436070] [DOI: 10.3390/s24186140]
Abstract
In this paper, we present the implementation of an artificial intelligence health assistant designed to complement a previously built eHealth data acquisition system, helping both patients and medical staff. The assistant allows users to query medical information in a smarter, more natural way, respecting patient privacy and using secure communications through a chat-style interface based on the Matrix decentralized open protocol. Assistant responses are constructed locally by an interchangeable large language model (LLM) that can form rich and complete answers as most human medical staff would. The LLM is given restricted access to patient information and other related resources through various methods so that it can respond correctly based on specific patient data. The Matrix protocol allows deployments to run in an open federation; hence, the system can be easily scaled.
Affiliation(s)
- Iuliu Alexandru Pap
- Department of Electric, Electronic and Computer Engineering, Technical University of Cluj-Napoca, North University Center of Baia Mare, 430083 Baia Mare, Romania
- Stefan Oniga
- Department of Electric, Electronic and Computer Engineering, Technical University of Cluj-Napoca, North University Center of Baia Mare, 430083 Baia Mare, Romania
- Department of IT Systems and Networks, Faculty of Informatics, University of Debrecen, 4032 Debrecen, Hungary
58
Si Y, Yang Y, Wang X, Zu J, Chen X, Fan X, An R, Gong S. Quality and Accountability of ChatGPT in Health Care in Low- and Middle-Income Countries: Simulated Patient Study. J Med Internet Res 2024; 26:e56121. [PMID: 39250188] [PMCID: PMC11420570] [DOI: 10.2196/56121]
Abstract
Using simulated patients to mimic 9 established noncommunicable and infectious diseases, we assessed ChatGPT's performance in treatment recommendations for common diseases in low- and middle-income countries. ChatGPT had a high level of accuracy in both correct diagnoses (20/27, 74%) and medication prescriptions (22/27, 82%) but a concerning level of unnecessary or harmful medications (23/27, 85%) even with correct diagnoses. ChatGPT performed better in managing noncommunicable diseases than infectious ones. These results highlight the need for cautious AI integration in health care systems to ensure quality and safety.
Affiliation(s)
- Yafei Si
- UNSW Business School and CEPAR, The University of New South Wales, Kensington, Australia
- Yuyi Yang
- Division of Computational and Data Sciences, Washington University in St Louis, St Louis, MO, United States
- Xi Wang
- Brown School, Washington University in St Louis, St Louis, MO, United States
- Jiaqi Zu
- Global Health Research Center, Duke Kunshan University, Kunshan, China
- Xi Chen
- Department of Health Policy and Management, Yale University, New Haven, CT, United States
- Department of Economics, Yale University, New Haven, CT, United States
- Xiaojing Fan
- School of Public Policy and Administration, Xi'an Jiaotong University, Xi'an, China
- Ruopeng An
- Brown School, Washington University in St Louis, St Louis, MO, United States
- Silver School of Social Work, New York University, New York, NY, United States
- Sen Gong
- Centre for International Studies on Development and Governance, Zhejiang University, Hangzhou, China
59
Halaseh FF, Yang JS, Danza CN, Halaseh R, Spiegelman L. ChatGPT's Role in Improving Education Among Patients Seeking Emergency Medical Treatment. West J Emerg Med 2024; 25:845-855. [PMID: 39319818] [PMCID: PMC11418867] [DOI: 10.5811/westjem.18650]
Abstract
Providing appropriate patient education during a medical encounter remains an important area for improvement across healthcare settings. Personalized resources can offer an impactful way to improve patient understanding and satisfaction during or after a healthcare visit. ChatGPT is a novel chatbot (a computer program designed to simulate conversation with humans) that has the potential to assist with care-related questions, clarify discharge instructions, help triage the urgency of medical problems, and potentially improve patient-clinician communication. However, due to its training methodology, ChatGPT has inherent limitations, including technical restrictions, risk of misinformation, lack of input standardization, and privacy concerns. Medicolegal liability also remains an open question for physicians interacting with this technology. Nonetheless, careful utilization of ChatGPT in clinical medicine has the potential to supplement patient education in important ways.
Affiliation(s)
- Faris F. Halaseh
- University of California, Irvine, School of Medicine, Irvine, California
- Justin S. Yang
- University of California, Irvine, School of Medicine, Irvine, California
- Clifford N. Danza
- University of California, Irvine, School of Medicine, Irvine, California
- Rami Halaseh
- Kaiser Permanente San Francisco, Department of Internal Medicine, San Francisco, California
- Lindsey Spiegelman
- University of California, Irvine, Department of Emergency Medicine, Irvine, California
60
Hsueh JY, Nethala D, Singh S, Hyman JA, Gelikman DG, Linehan WM, Ball MW. Exploring the Feasibility of GPT-4 as a Data Extraction Tool for Renal Surgery Operative Notes. Urol Pract 2024; 11:782-789. [PMID: 38913566] [PMCID: PMC11335444] [DOI: 10.1097/upj.0000000000000599]
Abstract
INTRODUCTION GPT-4 is a large language model with potential for multiple applications in urology. Our study sought to evaluate GPT-4's performance in data extraction from renal surgery operative notes. METHODS GPT-4 was queried to extract information on laterality, surgery, approach, estimated blood loss, and ischemia time from deidentified operative notes. Match rates were determined by the number of "matched" data points between GPT-4 and human-curated extraction. Accuracy rates were calculated after manually reviewing "not matched" data points. Cohen's kappa and the intraclass correlation coefficient were used to evaluate interrater agreement/reliability. RESULTS Our cohort consisted of 1498 renal surgeries from 2003 to 2023. Match rates were high for laterality (94.4%), surgery (92.5%), and approach (89.4%), but lower for estimated blood loss (77.1%) and ischemia time (25.6%). GPT-4 was more accurate for estimated blood loss (90.3% vs 85.5% human curated) and similarly accurate for laterality (95.2% vs 95.3% human curated). Human-curated accuracy rates were higher for surgery (99.3% vs 93% GPT-4), approach (97.9% vs 90.8% GPT-4), and ischemia time (95.6% vs 30.7% GPT-4). Cohen's kappa was 0.96 for laterality, 0.83 for approach, and 0.71 for surgery. The intraclass correlation coefficient was 0.62 for estimated blood loss and 0.09 for ischemia time. CONCLUSIONS Match and accuracy rates were higher for categorical variables. GPT-4 data extraction was particularly error prone for variables with heterogeneous documentation styles. The role of a standard operative template to aid data extraction will be explored in the future. GPT-4 can be utilized as a helpful and efficient data extraction tool with manual feedback.
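For context, Cohen's kappa (the agreement statistic reported above for laterality, approach, and surgery) can be computed directly from paired labels. The sketch below uses invented laterality labels for two "raters" (model vs. human), not the study's data:

```python
# Illustrative sketch: unweighted Cohen's kappa for two label sequences,
# e.g. GPT-4 extraction vs. human curation of a categorical field.
# The label lists below are hypothetical examples.

def cohens_kappa(rater1, rater2):
    """Unweighted Cohen's kappa: chance-corrected agreement between two raters."""
    assert len(rater1) == len(rater2)
    n = len(rater1)
    labels = sorted(set(rater1) | set(rater2))
    # Observed proportion of exact agreements
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Expected agreement if the two raters labeled independently
    p_e = sum((rater1.count(lab) / n) * (rater2.count(lab) / n) for lab in labels)
    return (p_o - p_e) / (1 - p_e)

gpt4  = ["left", "right", "left", "left",  "right", "left", "right", "right"]
human = ["left", "right", "left", "right", "right", "left", "right", "right"]
print(round(cohens_kappa(gpt4, human), 3))
```

With 7/8 raw agreement and balanced label frequencies, kappa comes out well below the raw agreement rate, which is why the abstract reports kappa rather than simple percent agreement for categorical fields.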
Affiliation(s)
- Jessica Y. Hsueh
- Urologic Oncology Branch, National Cancer Institute, National Institutes of Health, Bethesda, MD
- Daniel Nethala
- Urologic Oncology Branch, National Cancer Institute, National Institutes of Health, Bethesda, MD
- Shiva Singh
- Radiology and Imaging Services, Clinical Center, National Institutes of Health, Bethesda, MD
- Jason A. Hyman
- Urologic Oncology Branch, National Cancer Institute, National Institutes of Health, Bethesda, MD
- David G. Gelikman
- Molecular Imaging Branch, National Cancer Institute, National Institutes of Health, Bethesda, MD
- W. Marston Linehan
- Urologic Oncology Branch, National Cancer Institute, National Institutes of Health, Bethesda, MD
- Mark W. Ball
- Urologic Oncology Branch, National Cancer Institute, National Institutes of Health, Bethesda, MD
61
Song Z, Xu Y, He Y, Wang Y. A commentary on 'Application and challenges of ChatGPT in interventional surgery'. Int J Surg 2024; 110:5961-5962. [PMID: 38814338] [PMCID: PMC11392129] [DOI: 10.1097/js9.0000000000001757]
Affiliation(s)
- Zhiwei Song
- Department of Neurology, Fujian Provincial Hospital, Shengli Clinical Medical College of Fujian Medical University
- Yiya Xu
- Department of Neurology, Fujian Provincial Hospital, Shengli Clinical Medical College of Fujian Medical University
- Yingchao He
- Department of Neurology, Fujian Provincial Hospital, Shengli Clinical Medical College of Fujian Medical University
- Yinzhou Wang
- Department of Neurology, Fujian Provincial Hospital, Shengli Clinical Medical College of Fujian Medical University
- Fujian Key Laboratory of Medical Analysis, Fujian Academy of Medical Sciences, Fuzhou, Fujian, People's Republic of China
62
Zhang H, Wang X, Su S. Re: ChatGPT encounters multiple opportunities and challenges in neurosurgery. Int J Surg 2024; 110:6030-6031. [PMID: 38874486] [PMCID: PMC11392181] [DOI: 10.1097/js9.0000000000001825]
Affiliation(s)
- Hongyu Zhang
- Department of Neurosurgery, The Fourth Affiliated Hospital of Harbin Medical University, Harbin
- Xuefeng Wang
- Department of Neurosurgery, The Fourth Affiliated Hospital of Harbin Medical University, Harbin
- Shu Su
- Department of Epidemiology and Biostatistics, The Second Affiliated Hospital of Chongqing Medical University, Chongqing, People's Republic of China
63
Malik S, Kharel H, Dahiya DS, Ali H, Blaney H, Singh A, Dhar J, Perisetti A, Facciorusso A, Chandan S, Mohan BP. Assessing ChatGPT4 with and without retrieval-augmented generation in anticoagulation management for gastrointestinal procedures. Ann Gastroenterol 2024; 37:514-526. [PMID: 39238788] [PMCID: PMC11372545] [DOI: 10.20524/aog.2024.0907]
Abstract
Background In view of the growing complexity of managing anticoagulation for patients undergoing gastrointestinal (GI) procedures, this study evaluated ChatGPT-4's ability to provide accurate medical guidance, comparing it with a prior artificial intelligence (AI) model (ChatGPT-3.5) and with a retrieval-augmented generation (RAG)-supported model (ChatGPT4-RAG). Methods Thirty-six anticoagulation-related questions, based on professional guidelines, were answered by ChatGPT-4. Nine gastroenterologists assessed these responses for accuracy and relevance. ChatGPT-4's performance was also compared to that of ChatGPT-3.5 and ChatGPT4-RAG. Additionally, a survey was conducted to understand gastroenterologists' perceptions of ChatGPT-4. Results ChatGPT-4's responses showed significantly better accuracy and coherence than those of ChatGPT-3.5, with 30.5% of responses fully accurate and 47.2% generally accurate. ChatGPT4-RAG demonstrated a greater ability to integrate current information, achieving 75% full accuracy. Notably, for diagnostic and therapeutic esophagogastroduodenoscopy, 51.8% of responses were fully accurate; for endoscopic retrograde cholangiopancreatography with and without stent placement, 42.8% were fully accurate; and for diagnostic and therapeutic colonoscopy, 50% were fully accurate. Conclusions ChatGPT4-RAG significantly advances anticoagulation management in endoscopic procedures, offering reliable and precise medical guidance. However, medicolegal considerations mean that a 75% full-accuracy rate remains inadequate for independent clinical decision-making. AI may be more appropriately used to support and confirm clinicians' decisions, rather than replace them. Further evaluation is essential to maintain patient confidentiality and the integrity of the physician-patient relationship.
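For readers unfamiliar with RAG, its core retrieval step can be sketched in a few lines: select the guideline passage most relevant to a query and prepend it to the prompt, so the model answers from guideline text rather than from its training data alone. The passages and the similarity metric below are illustrative stand-ins, not the study's actual pipeline:

```python
# Minimal sketch of a RAG retrieval step (illustrative only).
# Real systems typically use embedding-based similarity; simple word
# overlap (Jaccard) is used here to keep the example self-contained.
import re

def tokenize(text):
    """Lowercase bag-of-words tokenizer (illustrative only)."""
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(question, passages):
    """Return the passage with the highest Jaccard overlap with the question."""
    q = tokenize(question)
    return max(passages, key=lambda p: len(q & tokenize(p)) / len(q | tokenize(p)))

# Invented guideline snippets, not quotes from any real guideline.
guideline_passages = [
    "Hold warfarin five days before high-risk endoscopic procedures.",
    "Continue aspirin for diagnostic colonoscopy in most patients.",
    "Bridge with heparin only for patients at high thrombotic risk.",
]

question = "Should aspirin be continued before a diagnostic colonoscopy?"
context = retrieve(question, guideline_passages)

# The retrieved passage is prepended so the model grounds its answer in it.
prompt = f"Guideline context: {context}\n\nQuestion: {question}"
```

Grounding responses this way is what lets a RAG-supported model cite current guideline text, which plausibly explains the accuracy gap the study observed between ChatGPT4-RAG and the base model.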
Affiliation(s)
- Sheza Malik
- Internal Medicine, Rochester General Hospital, NY, USA (Sheza Malik, Himal Kharel)
- Himal Kharel
- Internal Medicine, Rochester General Hospital, NY, USA (Sheza Malik, Himal Kharel)
- Dushyant S Dahiya
- Gastroenterology, Hepatology, University of Kansas School of Medicine, Kansas, USA (Dushyant S. Dahiya)
- Hassam Ali
- Gastroenterology, Hepatology, East Carolina University, NC, USA (Hassam Ali)
- Hanna Blaney
- Gastroenterology, Hepatology, New York University Grossman School of Medicine, NY, USA (Hanna Blaney)
- Achintya Singh
- Gastroenterology, Hepatology, Metro Health, OH, USA (Achintya Singh)
- Jahnvi Dhar
- Gastroenterology, Hepatology, Postgraduate Institute of Medical Education and Research, Chandigarh, India (Jahnvi Dhar)
- Antonio Facciorusso
- Gastroenterology, Hepatology, University of Foggia, Italy (Antonio Facciorusso)
- Saurabh Chandan
- Gastroenterology, Hepatology, Creighton University Medical Center, USA (Saurabh Chandan)
- Babu P Mohan
- Gastroenterology, Hepatology, Orlando Gastroenterology, FL, USA (Babu P. Mohan)
64
Pool J, Indulska M, Sadiq S. Large language models and generative AI in telehealth: a responsible use lens. J Am Med Inform Assoc 2024; 31:2125-2136. [PMID: 38441296] [PMCID: PMC11339524] [DOI: 10.1093/jamia/ocae035]
Abstract
OBJECTIVE This scoping review aims to assess the current research landscape of the application and use of large language models (LLMs) and generative Artificial Intelligence (AI), through tools such as ChatGPT in telehealth. Additionally, the review seeks to identify key areas for future research, with a particular focus on AI ethics considerations for responsible use and ensuring trustworthy AI. MATERIALS AND METHODS Following the scoping review methodological framework, a search strategy was conducted across 6 databases. To structure our review, we employed AI ethics guidelines and principles, constructing a concept matrix for investigating the responsible use of AI in telehealth. Using the concept matrix in our review enabled the identification of gaps in the literature and informed future research directions. RESULTS Twenty studies were included in the review. Among the included studies, 5 were empirical, and 15 were reviews and perspectives focusing on different telehealth applications and healthcare contexts. Benefit and reliability concepts were frequently discussed in these studies. Privacy, security, and accountability were peripheral themes, with transparency, explainability, human agency, and contestability lacking conceptual or empirical exploration. CONCLUSION The findings emphasized the potential of LLMs, especially ChatGPT, in telehealth. They provide insights into understanding the use of LLMs, enhancing telehealth services, and taking ethical considerations into account. By proposing three future research directions with a focus on responsible use, this review further contributes to the advancement of this emerging phenomenon of healthcare AI.
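The concept matrix described in the methods above can be pictured as a study-by-principle table in which empty columns expose literature gaps. A small sketch, with invented study names and entries purely for illustration:

```python
# Toy concept matrix: rows are reviewed studies, columns are AI-ethics
# principles. Principles no study touches surface as research gaps.
# All entries below are hypothetical, not the review's actual data.

PRINCIPLES = ["benefit", "reliability", "privacy", "transparency", "contestability"]

concept_matrix = {
    "Study A": {"benefit", "reliability", "privacy"},
    "Study B": {"benefit", "reliability"},
    "Study C": {"benefit"},
}

def coverage(matrix: dict[str, set[str]], principles: list[str]) -> dict[str, int]:
    """Count how many studies address each principle."""
    return {p: sum(p in covered for covered in matrix.values()) for p in principles}

# Principles with zero coverage point to future research directions.
gaps = [p for p, n in coverage(concept_matrix, PRINCIPLES).items() if n == 0]
```

In this toy data, "transparency" and "contestability" come out uncovered, mirroring how the review identifies peripheral and unexplored themes.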
Affiliation(s)
- Javad Pool
- ARC Industrial Transformation Training Centre for Information Resilience (CIRES), The University of Queensland, Brisbane 4072, Australia
- School of Electrical Engineering and Computer Science, The University of Queensland, Brisbane 4072, Australia
- Marta Indulska
- ARC Industrial Transformation Training Centre for Information Resilience (CIRES), The University of Queensland, Brisbane 4072, Australia
- Business School, The University of Queensland, Brisbane 4072, Australia
- Shazia Sadiq
- ARC Industrial Transformation Training Centre for Information Resilience (CIRES), The University of Queensland, Brisbane 4072, Australia
- School of Electrical Engineering and Computer Science, The University of Queensland, Brisbane 4072, Australia
65
Hindelang M, Sitaru S, Zink A. Transforming Health Care Through Chatbots for Medical History-Taking and Future Directions: Comprehensive Systematic Review. JMIR Med Inform 2024; 12:e56628. [PMID: 39207827 PMCID: PMC11393511 DOI: 10.2196/56628] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/22/2024] [Revised: 05/08/2024] [Accepted: 07/11/2024] [Indexed: 09/04/2024] Open
Abstract
BACKGROUND The integration of artificial intelligence and chatbot technology in health care has attracted significant attention due to its potential to improve patient care and streamline history-taking. As artificial intelligence-driven conversational agents, chatbots offer the opportunity to revolutionize history-taking, necessitating a comprehensive examination of their impact on medical practice. OBJECTIVE This systematic review aims to assess the role, effectiveness, usability, and patient acceptance of chatbots in medical history-taking. It also examines potential challenges and future opportunities for integration into clinical practice. METHODS A systematic search included PubMed, Embase, MEDLINE (via Ovid), CENTRAL, Scopus, and Open Science and covered studies through July 2024. The inclusion and exclusion criteria were based on the PICOS (participants, interventions, comparators, outcomes, and study design) framework. The population included individuals using health care chatbots for medical history-taking. Interventions focused on chatbots designed to facilitate medical history-taking. The outcomes of interest were the feasibility, acceptance, and usability of chatbot-based medical history-taking. Studies not reporting on these outcomes were excluded. All study designs except conference papers were eligible for inclusion. Only English-language studies were considered. There were no specific restrictions on study duration. Key search terms included "chatbot*," "conversational agent*," "virtual assistant," "artificial intelligence chatbot," "medical history," and "history-taking." The quality of observational studies was classified using the STROBE (Strengthening the Reporting of Observational Studies in Epidemiology) criteria (eg, sample size, design, data collection, and follow-up). The RoB 2 (Risk of Bias 2) tool was used to assess the risk of bias in randomized controlled trials (RCTs).
RESULTS The review included 15 observational studies and 3 RCTs and synthesized evidence from different medical fields and populations. Chatbots systematically collect information through targeted queries and data retrieval, improving patient engagement and satisfaction. The results show that chatbots have great potential for history-taking and that the efficiency and accessibility of the health care system can be improved by 24/7 automated data collection. Bias assessments revealed that of the 15 observational studies, 5 (33%) studies were of high quality, 5 (33%) studies were of moderate quality, and 5 (33%) studies were of low quality. Of the RCTs, 2 had a low risk of bias, while 1 had a high risk. CONCLUSIONS This systematic review provides critical insights into the potential benefits and challenges of using chatbots for medical history-taking. The included studies showed that chatbots can increase patient engagement, streamline data collection, and improve health care decision-making. For effective integration into clinical practice, it is crucial to design user-friendly interfaces, ensure robust data security, and maintain empathetic patient-physician interactions. Future research should focus on refining chatbot algorithms, improving their emotional intelligence, and extending their application to different health care settings to realize their full potential in modern medicine. TRIAL REGISTRATION PROSPERO CRD42023410312; www.crd.york.ac.uk/prospero.
Affiliation(s)
- Michael Hindelang
- Department of Dermatology and Allergy, TUM School of Medicine and Health, Technical University of Munich, Munich, Germany
- Pettenkofer School of Public Health, Munich, Germany
- Institute for Medical Information Processing, Biometry and Epidemiology (IBE), Faculty of Medicine, Ludwig-Maximilian University, LMU, Munich, Germany
- Sebastian Sitaru
- Department of Dermatology and Allergy, TUM School of Medicine and Health, Technical University of Munich, Munich, Germany
- Alexander Zink
- Department of Dermatology and Allergy, TUM School of Medicine and Health, Technical University of Munich, Munich, Germany
- Division of Dermatology and Venereology, Department of Medicine Solna, Karolinska Institute, Stockholm, Sweden
66
Hwai H, Ho YJ, Wang CH, Huang CH. Large language model application in emergency medicine and critical care. J Formos Med Assoc 2024:S0929-6646(24)00400-5. [PMID: 39198112 DOI: 10.1016/j.jfma.2024.08.032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/15/2024] [Revised: 08/13/2024] [Accepted: 08/23/2024] [Indexed: 09/01/2024] Open
Abstract
In the rapidly evolving healthcare landscape, artificial intelligence (AI), particularly large language models (LLMs) like OpenAI's Chat Generative Pretrained Transformer (ChatGPT), has shown transformative potential in emergency medicine and critical care. This review article highlights the advancement and applications of ChatGPT, from diagnostic assistance to clinical documentation and patient communication, demonstrating its ability to perform comparably to human professionals in medical examinations. ChatGPT could assist clinical decision-making and medication selection in critical care, showcasing its potential to optimize patient care management. However, integrating LLMs into healthcare raises legal, ethical, and privacy concerns, including data protection and the necessity for informed consent. Finally, we address the challenges related to the accuracy of LLMs, such as the risk of providing incorrect medical advice. These concerns underscore the importance of ongoing research and regulation to ensure their ethical and practical use in healthcare.
Affiliation(s)
- Haw Hwai
- Department of Emergency Medicine, National Taiwan University Hospital, National Taiwan University Medical College, Taipei, Taiwan.
- Yi-Ju Ho
- Department of Emergency Medicine, National Taiwan University Hospital, National Taiwan University Medical College, Taipei, Taiwan.
- Chih-Hung Wang
- Department of Emergency Medicine, National Taiwan University Hospital, National Taiwan University Medical College, Taipei, Taiwan.
- Chien-Hua Huang
- Department of Emergency Medicine, National Taiwan University Hospital, National Taiwan University Medical College, Taipei, Taiwan.
67
Xu T, Weng H, Liu F, Yang L, Luo Y, Ding Z, Wang Q. Current Status of ChatGPT Use in Medical Education: Potentials, Challenges, and Strategies. J Med Internet Res 2024; 26:e57896. [PMID: 39196640 PMCID: PMC11391159 DOI: 10.2196/57896] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/29/2024] [Revised: 06/05/2024] [Accepted: 06/29/2024] [Indexed: 08/29/2024] Open
Abstract
ChatGPT, a generative pretrained transformer, has garnered global attention and sparked discussions since its introduction on November 30, 2022. However, it has generated controversy within the realms of medical education and scientific research. This paper examines the potential applications, limitations, and strategies for using ChatGPT. ChatGPT offers personalized learning support to medical students through its robust natural language generation capabilities, enabling it to furnish answers. Moreover, it has demonstrated significant use in simulating clinical scenarios, facilitating teaching and learning processes, and revitalizing medical education. Nonetheless, numerous challenges accompany these advancements. In the context of education, it is of paramount importance to prevent excessive reliance on ChatGPT and combat academic plagiarism. Likewise, in the field of medicine, it is vital to guarantee the timeliness, accuracy, and reliability of content generated by ChatGPT. Concurrently, ethical challenges and concerns regarding information security arise. In light of these challenges, this paper proposes targeted strategies for addressing them. First, the risk of overreliance on ChatGPT and academic plagiarism must be mitigated through ideological education, fostering comprehensive competencies, and implementing diverse evaluation criteria. The integration of contemporary pedagogical methodologies in conjunction with the use of ChatGPT serves to enhance the overall quality of medical education. To enhance the professionalism and reliability of the generated content, it is recommended to implement measures to optimize ChatGPT's training data professionally and enhance the transparency of the generation process. This ensures that the generated content is aligned with the most recent standards of medical practice. 
Moreover, the enhancement of value alignment and the establishment of pertinent legislation or codes of practice address ethical concerns, including those pertaining to algorithmic discrimination, the allocation of medical responsibility, privacy, and security. In conclusion, while ChatGPT presents significant potential in medical education, it also encounters various challenges. Through comprehensive research and the implementation of suitable strategies, it is anticipated that ChatGPT's positive impact on medical education will be harnessed, laying the groundwork for advancing the discipline and fostering the development of high-caliber medical professionals.
Affiliation(s)
- Tianhui Xu
- Clinical Nursing Teaching and Research Section, The Second Xiangya Hospital of Central South University, Changsha, China
- Xiangya School of Nursing, Central South University, Changsha, China
- Huiting Weng
- Clinical Nursing Teaching and Research Section, The Second Xiangya Hospital of Central South University, Changsha, China
- Fang Liu
- Clinical Nursing Teaching and Research Section, The Second Xiangya Hospital of Central South University, Changsha, China
- Li Yang
- Clinical Nursing Teaching and Research Section, The Second Xiangya Hospital of Central South University, Changsha, China
- Yuanyuan Luo
- Xiangya School of Nursing, Central South University, Changsha, China
- Ziwei Ding
- Xiangya School of Nursing, Central South University, Changsha, China
- Qin Wang
- Clinical Nursing Teaching and Research Section, The Second Xiangya Hospital of Central South University, Changsha, China
- Xiangya School of Nursing, Central South University, Changsha, China
68
Alnaimat F, Al-Halaseh S, AlSamhori ARF. Evolution of Research Reporting Standards: Adapting to the Influence of Artificial Intelligence, Statistics Software, and Writing Tools. J Korean Med Sci 2024; 39:e231. [PMID: 39164055 PMCID: PMC11333804 DOI: 10.3346/jkms.2024.39.e231] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 04/20/2024] [Accepted: 07/01/2024] [Indexed: 08/22/2024] Open
Abstract
Reporting standards are essential to health research as they improve accuracy and transparency. Over time, the requirements for reporting research have changed significantly to ensure comprehensive and transparent reporting across a range of study domains and to foster methodological rigor. The Declaration of Helsinki, the Consolidated Standards of Reporting Trials (CONSORT), the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement, and the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) are just a few of the historic initiatives that have increased research transparency. Through enhanced discoverability, facilitation of statistical analysis, article quality enhancement, and language barrier reduction, artificial intelligence (AI)-in particular, large language models like ChatGPT-has transformed academic writing. However, concerns remain about potential errors and the need for transparency when utilizing AI tools. Modifying reporting rules to include AI-driven writing tools such as ChatGPT is ethically and practically challenging. In academic writing, precautions regarding truthfulness, privacy, and responsibility are necessary due to concerns about biases, openness, data limits, and potential legal ramifications. The CONSORT-AI and Standard Protocol Items: Recommendations for Interventional Trials (SPIRIT)-AI Steering Group extends the CONSORT guidelines to AI clinical trials, and new checklists like METRICS and CLEAR help to promote transparency in AI studies. Responsible use of technology in research and the adoption of writing software require interdisciplinary collaboration and ethical assessment. This study explores the impact of AI technologies, specifically ChatGPT, on past reporting standards and the need for revised guidelines for open, reproducible, and robust scientific publications.
Affiliation(s)
- Fatima Alnaimat
- Division of Rheumatology, Department of Internal Medicine, School of Medicine, University of Jordan, Amman, Jordan.
- Salameh Al-Halaseh
- Department of Internal Medicine, School of Medicine, University of Jordan, Amman, Jordan
69
Zhui L, Fenghe L, Xuehu W, Qining F, Wei R. Ethical Considerations and Fundamental Principles of Large Language Models in Medical Education: Viewpoint. J Med Internet Res 2024; 26:e60083. [PMID: 38971715 PMCID: PMC11327620 DOI: 10.2196/60083] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/30/2024] [Accepted: 07/06/2024] [Indexed: 07/08/2024] Open
Abstract
This viewpoint article first explores the ethical challenges associated with the future application of large language models (LLMs) in the context of medical education. These challenges include not only ethical concerns related to the development of LLMs, such as artificial intelligence (AI) hallucinations, information bias, privacy and data risks, and deficiencies in terms of transparency and interpretability but also issues concerning the application of LLMs, including deficiencies in emotional intelligence, educational inequities, problems with academic integrity, and questions of responsibility and copyright ownership. This paper then analyzes existing AI-related legal and ethical frameworks and highlights their limitations with regard to the application of LLMs in the context of medical education. To ensure that LLMs are integrated in a responsible and safe manner, the authors recommend the development of a unified ethical framework that is specifically tailored for LLMs in this field. This framework should be based on 8 fundamental principles: quality control and supervision mechanisms; privacy and data protection; transparency and interpretability; fairness and equal treatment; academic integrity and moral norms; accountability and traceability; protection and respect for intellectual property; and the promotion of educational research and innovation. The authors further discuss specific measures that can be taken to implement these principles, thereby laying a solid foundation for the development of a comprehensive and actionable ethical framework. Such a unified ethical framework based on these 8 fundamental principles can provide clear guidance and support for the application of LLMs in the context of medical education. 
This approach can help establish a balance between technological advancement and ethical safeguards, thereby ensuring that medical education can progress without compromising the principles of fairness, justice, or patient safety and establishing a more equitable, safer, and more efficient environment for medical education.
Affiliation(s)
- Li Zhui
- Department of Vascular Surgery, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Li Fenghe
- Department of Vascular Surgery, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Wang Xuehu
- Department of Vascular Surgery, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Fu Qining
- Department of Vascular Surgery, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Ren Wei
- Department of Vascular Surgery, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
70
Langston E, Charness N, Boot W. Are Virtual Assistants Trustworthy for Medicare Information: An Examination of Accuracy and Reliability. THE GERONTOLOGIST 2024; 64:gnae062. [PMID: 38832398 PMCID: PMC11258897 DOI: 10.1093/geront/gnae062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/13/2023] [Indexed: 06/05/2024] Open
Abstract
BACKGROUND AND OBJECTIVES Advances in artificial intelligence (AI)-based virtual assistants provide a potential opportunity for older adults to use this technology in the context of health information-seeking. Meta-analysis on trust in AI shows that users are influenced by the accuracy and reliability of the AI trustee. We evaluated these dimensions for responses to Medicare queries. RESEARCH DESIGN AND METHODS During the summer of 2023, we assessed the accuracy and reliability of Alexa, Google Assistant, Bard, and ChatGPT-4 on Medicare terminology and general content from a large, standardized question set. We compared the accuracy of these AI systems to that of a large representative sample of Medicare beneficiaries who were queried twenty years prior. RESULTS Alexa and Google Assistant were found to be highly inaccurate when compared to beneficiaries' mean accuracy of 68.4% on terminology queries and 53.0% on general Medicare content. Bard and ChatGPT-4 answered Medicare terminology queries perfectly and performed much better on general Medicare content queries (Bard = 96.3%, ChatGPT-4 = 92.6%) than the average Medicare beneficiary. About one month to a month-and-a-half later, we found that Bard and Alexa's accuracy stayed the same, whereas ChatGPT-4's performance nominally decreased, and Google Assistant's performance nominally increased. DISCUSSION AND IMPLICATIONS LLM-based assistants generate trustworthy information in response to carefully phrased queries about Medicare, in contrast to Alexa and Google Assistant. Further studies will be needed to determine what factors beyond accuracy and reliability influence the adoption and use of such technology for Medicare decision-making.
Affiliation(s)
- Emily Langston
- Department of Psychology, Florida State University, Tallahassee, Florida, USA
- Neil Charness
- Department of Psychology, Florida State University, Tallahassee, Florida, USA
- Walter Boot
- Department of Psychology, Florida State University, Tallahassee, Florida, USA
71
Su Z, Tang G, Huang R, Qiao Y, Zhang Z, Dai X. Based on Medicine, The Now and Future of Large Language Models. Cell Mol Bioeng 2024; 17:263-277. [PMID: 39372551 PMCID: PMC11450117 DOI: 10.1007/s12195-024-00820-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/11/2024] [Accepted: 09/08/2024] [Indexed: 10/08/2024] Open
Abstract
Objectives This review explores the potential applications of large language models (LLMs) such as ChatGPT, GPT-3.5, and GPT-4 in the medical field, aiming to encourage their prudent use, provide professional support, and develop accessible medical AI tools that adhere to healthcare standards. Methods This paper examines the impact of technologies such as OpenAI's Generative Pre-trained Transformers (GPT) series, including GPT-3.5 and GPT-4, and other large language models (LLMs) in medical education, scientific research, clinical practice, and nursing. Specifically, it includes supporting curriculum design, acting as personalized learning assistants, creating standardized simulated patient scenarios in education; assisting with writing papers, data analysis, and optimizing experimental designs in scientific research; aiding in medical imaging analysis, decision-making, patient education, and communication in clinical practice; and reducing repetitive tasks, promoting personalized care and self-care, providing psychological support, and enhancing management efficiency in nursing. Results LLMs, including ChatGPT, have demonstrated significant potential and effectiveness in the aforementioned areas, yet their deployment in healthcare settings is fraught with ethical complexities, potential lack of empathy, and risks of biased responses. Conclusion Despite these challenges, significant medical advancements can be expected through the proper use of LLMs and appropriate policy guidance. Future research should focus on overcoming these barriers to ensure the effective and ethical application of LLMs in the medical field.
Affiliation(s)
- Ziqing Su
- Department of Neurosurgery, The First Affiliated Hospital of Anhui Medical University, 218 Jixi Road, Hefei, 230022 P.R. China
- Department of Clinical Medicine, The First Clinical College of Anhui Medical University, Hefei, 230022 P.R. China
- Guozhang Tang
- Department of Neurosurgery, The First Affiliated Hospital of Anhui Medical University, 218 Jixi Road, Hefei, 230022 P.R. China
- Department of Clinical Medicine, The Second Clinical College of Anhui Medical University, Hefei, 230032 Anhui P.R. China
- Rui Huang
- Department of Neurosurgery, The First Affiliated Hospital of Anhui Medical University, 218 Jixi Road, Hefei, 230022 P.R. China
- Department of Clinical Medicine, The First Clinical College of Anhui Medical University, Hefei, 230022 P.R. China
- Yang Qiao
- Department of Neurosurgery, The First Affiliated Hospital of Anhui Medical University, 218 Jixi Road, Hefei, 230022 P.R. China
- Zheng Zhang
- Department of Neurosurgery, The First Affiliated Hospital of Anhui Medical University, 218 Jixi Road, Hefei, 230022 P.R. China
- Department of Clinical Medicine, The First Clinical College of Anhui Medical University, Hefei, 230022 P.R. China
- Xingliang Dai
- Department of Neurosurgery, The First Affiliated Hospital of Anhui Medical University, 218 Jixi Road, Hefei, 230022 P.R. China
- Department of Research & Development, East China Institute of Digital Medical Engineering, Shangrao, 334000 P.R. China
72
Ward M, Unadkat P, Toscano D, Kashanian A, Lynch DG, Horn AC, D'Amico RS, Mittler M, Baum GR. A Quantitative Assessment of ChatGPT as a Neurosurgical Triaging Tool. Neurosurgery 2024; 95:487-495. [PMID: 38353523 DOI: 10.1227/neu.0000000000002867] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/10/2023] [Accepted: 12/26/2023] [Indexed: 07/16/2024] Open
Abstract
BACKGROUND AND OBJECTIVES ChatGPT is a natural language processing chatbot with increasing applicability to the medical workflow. Although ChatGPT has been shown to be capable of passing the American Board of Neurological Surgery board examination, there has never been an evaluation of the chatbot in triaging and diagnosing novel neurosurgical scenarios without defined answer choices. In this study, we assess ChatGPT's capability to determine the emergent nature of neurosurgical scenarios and make diagnoses based on information one would find in a neurosurgical consult. METHODS Thirty clinical scenarios were given to 3 attendings, 4 residents, 2 physician assistants, and 2 subinterns. Participants were asked to determine if the scenario constituted an urgent neurosurgical consultation and what the most likely diagnosis was. Attending responses provided a consensus to use as the answer key. Generative pretrained transformer (GPT) 3.5 and GPT 4 were given the same questions, and their responses were compared with those of the other participants. RESULTS GPT 4 was 100% accurate in both diagnosis and triage of the scenarios. GPT 3.5 had an accuracy of 92.59%, slightly below that of a PGY1 (96.3%), with an 88.24% sensitivity, 100% specificity, 100% positive predictive value, and 83.3% negative predictive value in triaging each situation. When making a diagnosis, GPT 3.5 had an accuracy of 92.59%, which was higher than the subinterns and similar to resident responders. CONCLUSION GPT 4 is able to diagnose and triage neurosurgical scenarios at the level of a senior neurosurgical resident. There has been a clear improvement between GPT 3.5 and 4. It is likely that the recent updates in internet access and directing the functionality of ChatGPT will further improve its utility in neurosurgical triage.
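The triage metrics reported above all derive from a single binary confusion matrix. A sketch of the calculation follows; the counts tp=15, fp=0, tn=10, fn=2 are not stated in the study but are one confusion matrix consistent with the reported GPT 3.5 triage figures, and are used here purely for illustration.

```python
# Standard binary-classification metrics from a confusion matrix:
# tp/fp/tn/fn = true/false positives and negatives.

def triage_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Compute accuracy, sensitivity, specificity, PPV, and NPV."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # true-positive rate
        "specificity": tn / (tn + fp),   # true-negative rate
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }

# Hypothetical counts: 15 urgent consults correctly flagged, 2 urgent
# consults missed, 10 non-urgent consults correctly not escalated,
# none escalated unnecessarily.
m = triage_metrics(tp=15, fp=0, tn=10, fn=2)
```

With these counts, accuracy is 25/27 = 92.59%, sensitivity 15/17 = 88.24%, specificity and PPV 100%, and NPV 10/12 = 83.3%, matching the percentages quoted for GPT 3.5.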
Affiliation(s)
- Max Ward
- Department of Neurological Surgery, Zucker School of Medicine at Hofstra/Northwell, Hempstead, New York, USA
- Prashin Unadkat
- Department of Neurological Surgery, Zucker School of Medicine at Hofstra/Northwell, Hempstead, New York, USA
- Elmezzi Graduate School of Molecular Medicine, Feinstein Institutes of Medical Research, Northwell Health, Manhasset, New York, USA
- Daniel Toscano
- Department of Neurological Surgery, Zucker School of Medicine at Hofstra/Northwell, Hempstead, New York, USA
- Alon Kashanian
- Department of Neurological Surgery, Zucker School of Medicine at Hofstra/Northwell, Hempstead, New York, USA
- Daniel G Lynch
- Department of Neurological Surgery, Zucker School of Medicine at Hofstra/Northwell, Hempstead, New York, USA
- Alexander C Horn
- Department of Neurological Surgery, Wake Forest School of Medicine, Winston-Salem, North Carolina, USA
- Randy S D'Amico
- Department of Neurological Surgery, Zucker School of Medicine at Hofstra/Northwell, Hempstead, New York, USA
- Department of Neurological Surgery, Lenox Hill Hospital, New York, New York, USA
- Mark Mittler
- Department of Neurological Surgery, Zucker School of Medicine at Hofstra/Northwell, Hempstead, New York, USA
- Department of Pediatric Neurosurgery, Cohen Children's Medical Center, Queens, New York, USA
- Griffin R Baum
- Department of Neurological Surgery, Zucker School of Medicine at Hofstra/Northwell, Hempstead, New York, USA
- Department of Neurological Surgery, Lenox Hill Hospital, New York, New York, USA
73
Mandal S, Chakraborty S, Tariq MA, Ali K, Elavia Z, Khan MK, Garcia DB, Ali S, Al Hooti J, Kumar DV. Artificial Intelligence and Deep Learning in Revolutionizing Brain Tumor Diagnosis and Treatment: A Narrative Review. Cureus 2024; 16:e66157. [PMID: 39233936 PMCID: PMC11372433 DOI: 10.7759/cureus.66157] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 08/05/2024] [Indexed: 09/06/2024] Open
Abstract
The emergence of artificial intelligence (AI) in the medical field holds promise for improving medical management, particularly personalized strategies for the diagnosis and treatment of brain tumors. However, integrating AI into clinical practice has proven to be a challenge. Deep learning (DL) is well suited to extracting relevant information from the ever-growing volume of medical history and imaging records, shortening diagnosis times that would otherwise overwhelm manual methods. In addition, DL aids automated tumor segmentation, classification, and diagnosis. DL models such as the Brain Tumor Classification Model and Inception-ResNet V2, as well as hybrid techniques that combine DL networks with support vector machines and k-nearest neighbors, identify tumor phenotypes and brain metastases, allowing real-time decision-making and enhancing preoperative planning. AI algorithms and DL development facilitate radiological diagnostics such as computed tomography, positron emission tomography scans, and magnetic resonance imaging (MRI) by integrating two-dimensional and three-dimensional MRI using DenseNet and 3D convolutional neural network architectures, which enable precise tumor delineation. DL offers benefits in neuro-interventional procedures, and the shift toward computer-assisted interventions acknowledges the need for more accurate and efficient image analysis methods. Further research is needed to realize the potential impact of DL in improving these outcomes.
Affiliation(s)
- Shobha Mandal
- Internal Medicine, Guthrie Robert Packer Hospital, Sayre, USA
- Subhadeep Chakraborty
- Electronics and Communication, Maulana Abul Kalam Azad University of Technology, West Bengal, IND
| | | | - Kamran Ali
- Internal Medicine, United Medical and Dental College, Karachi, PAK
| | - Zenia Elavia
- Medical School, Dr. D. Y. Patil Medical College, Hospital & Research Centre, Pune, IND
| | - Misbah Kamal Khan
- Internal Medicine, Peoples University of Medical and Health Sciences, Nawabshah, PAK
| | | | - Sofia Ali
- Medical School, Peninsula Medical School, Plymouth, GBR
| | | | - Divyanshi Vijay Kumar
- Internal Medicine, Smt. Nathiba Hargovandas Lakhmichand Municipal Medical College, Ahmedabad, IND
| |
74
Bektaş M, Pereira JK, Daams F, van der Peet DL. ChatGPT in surgery: a revolutionary innovation? Surg Today 2024; 54:964-971. [PMID: 38421439 PMCID: PMC11266448 DOI: 10.1007/s00595-024-02800-6] [Received: 09/21/2023] [Accepted: 12/13/2023] [Indexed: 03/02/2024]
Abstract
ChatGPT has ushered in a new era of digital health, having become prominent and developed rapidly since its release. ChatGPT may also facilitate improvements in surgery; however, its influence on the field is largely unknown at present. The present study therefore reports on the current applications of ChatGPT in surgery, evaluating its workflow, practical implementations, limitations, and future perspectives. A literature search was performed using the PubMed and Embase databases, covering the period from inception until July 2023. This study revealed that ChatGPT has promising capabilities in surgical research, education, training, and practice. In daily practice, surgeons and surgical residents can be aided in logistics and administrative tasks, and patients can be more efficiently informed about the details of their condition. However, priority should be given to establishing proper policies and protocols to ensure the safe and reliable use of this model.
Affiliation(s)
- Mustafa Bektaş: Amsterdam UMC Location Vrije Universiteit Amsterdam, Surgery, De Boelelaan 1117, Amsterdam, The Netherlands
- Jaime Ken Pereira: Department of Computer Science, Vrije Universiteit Amsterdam, De Boelelaan 1105, Amsterdam, The Netherlands
- Freek Daams: Amsterdam UMC Location Vrije Universiteit Amsterdam, Surgery, De Boelelaan 1117, Amsterdam, The Netherlands
- Donald L van der Peet: Amsterdam UMC Location Vrije Universiteit Amsterdam, Surgery, De Boelelaan 1117, Amsterdam, The Netherlands
75
Luo X, Tahabi FM, Marc T, Haunert LA, Storey S. Zero-shot learning to extract assessment criteria and medical services from the preventive healthcare guidelines using large language models. J Am Med Inform Assoc 2024; 31:1743-1753. [PMID: 38900185 PMCID: PMC11258407 DOI: 10.1093/jamia/ocae145] [Received: 03/20/2024] [Revised: 04/30/2024] [Accepted: 06/03/2024] [Indexed: 06/21/2024]
Abstract
OBJECTIVES The integration of preventive care guidelines with Electronic Health Record (EHR) systems, coupled with the generation of personalized preventive care recommendations, holds significant potential for improving healthcare outcomes. Our study investigates the feasibility of using Large Language Models (LLMs) to automate the extraction of assessment criteria and risk factors from the guidelines for future analysis against medical records in the EHR. MATERIALS AND METHODS We annotated the criteria, risk factors, and preventive medical services described in the adult guidelines published by the United States Preventive Services Task Force and evaluated 3 state-of-the-art LLMs on automatically extracting information in these categories from the guidelines. RESULTS We included 24 guidelines in this study. The LLMs can automate the extraction of all criteria, risk factors, and medical services from 9 guidelines. All 3 LLMs perform well on extracting information regarding demographic criteria or risk factors. Some LLMs perform better than others on extracting social determinants of health, family history, and preventive counseling services. DISCUSSION While LLMs demonstrate the capability to handle lengthy preventive care guidelines, several challenges persist, including constraints on the maximum length of input tokens and a tendency to generate content rather than adhere strictly to the original input. Moreover, the use of LLMs in real-world clinical settings necessitates careful ethical consideration; healthcare professionals must meticulously validate the extracted information to mitigate biases, ensure completeness, and maintain accuracy. CONCLUSION We developed a data structure to store the annotated preventive guidelines and made it publicly available. Employing state-of-the-art LLMs to extract preventive care criteria, risk factors, and preventive care services paves the way for the future integration of these guidelines into the EHR.
Affiliation(s)
- Xiao Luo: Department of Management Science and Information Systems, Spears School of Business, Oklahoma State University, Stillwater, OK 74078, United States; Department of Biostatistics and Health Data Science, School of Medicine, Indiana University, Indianapolis, IN 46202, United States
- Fattah Muhammad Tahabi: Department of Management Science and Information Systems, Spears School of Business, Oklahoma State University, Stillwater, OK 74078, United States
- Tressica Marc: Department of Computer Information Technology, Purdue School of Engineering and Technology, Indiana University-Purdue University Indianapolis, Indianapolis, IN 46202, United States
- Laura Ann Haunert: School of Nursing, Indiana University, Indianapolis, IN 46202, United States
- Susan Storey: School of Nursing, Indiana University, Indianapolis, IN 46202, United States
76
Zhui L, Yhap N, Liping L, Zhengjie W, Zhonghao X, Xiaoshu Y, Hong C, Xuexiu L, Wei R. Impact of Large Language Models on Medical Education and Teaching Adaptations. JMIR Med Inform 2024; 12:e55933. [PMID: 39087590 PMCID: PMC11294775 DOI: 10.2196/55933] [Received: 12/29/2023] [Revised: 04/25/2024] [Accepted: 06/08/2024] [Indexed: 08/02/2024]
Abstract
This viewpoint article explores the transformative role of large language models (LLMs) in the field of medical education, highlighting their potential to enhance teaching quality, promote personalized learning paths, strengthen clinical skills training, optimize teaching assessment processes, boost the efficiency of medical research, and support continuing medical education. However, the use of LLMs entails certain challenges, such as questions regarding the accuracy of information, the risk of overreliance on technology, a lack of emotional recognition capabilities, and concerns related to ethics, privacy, and data security. This article emphasizes that to maximize the potential of LLMs and overcome these challenges, educators must exhibit leadership in medical education, adjust their teaching strategies flexibly, cultivate students' critical thinking, and emphasize the importance of practical experience, thus ensuring that students can use LLMs correctly and effectively. By adopting such a comprehensive and balanced approach, educators can train health care professionals who are proficient in the use of advanced technologies and who exhibit solid professional ethics and practical skills, thus laying a strong foundation for these professionals to overcome future challenges in the health care sector.
Affiliation(s)
- Li Zhui: Department of Vascular Surgery, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Nina Yhap: Department of General Surgery, Queen Elizabeth Hospital, St Michael, Barbados
- Liu Liping: Department of Ultrasound, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Wang Zhengjie: Department of Nuclear Medicine, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Xiong Zhonghao: Department of Acupuncture and Moxibustion, Chongqing Traditional Chinese Medicine Hospital, Chongqing, China
- Yuan Xiaoshu: Department of Anesthesia, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Cui Hong: Department of Anesthesia, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Liu Xuexiu: Department of Neonatology, Children’s Hospital of Chongqing Medical University, Chongqing, China
- Ren Wei: Department of Vascular Surgery, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
77
Bijker R, Merkouris SS, Dowling NA, Rodda SN. ChatGPT for Automated Qualitative Research: Content Analysis. J Med Internet Res 2024; 26:e59050. [PMID: 39052327 PMCID: PMC11310599 DOI: 10.2196/59050] [Received: 03/31/2024] [Revised: 05/08/2024] [Accepted: 06/18/2024] [Indexed: 07/27/2024]
Abstract
BACKGROUND Data analysis approaches such as qualitative content analysis are notoriously time- and labor-intensive because of the effort required to detect, assess, and code large amounts of data. Tools such as ChatGPT may have tremendous potential for automating at least some of the analysis. OBJECTIVE The aim of this study was to explore the utility of ChatGPT in conducting qualitative content analysis through the analysis of forum posts from people sharing their experiences of reducing their sugar consumption. METHODS Inductive and deductive content analysis were performed on 537 forum posts to detect mechanisms of behavior change. Thorough prompt engineering provided appropriate instructions for ChatGPT to execute data analysis tasks. Data identification involved extracting change mechanisms from a subset of forum posts. The precision of the extracted data was assessed through comparison with human coding. On the basis of the identified change mechanisms, coding schemes were developed with ChatGPT using data-driven (inductive) and theory-driven (deductive) content analysis approaches. The deductive approach was informed by the Theoretical Domains Framework using both an unconstrained coding scheme and a structured coding matrix. In total, 10 coding schemes were created from a subset of data and then applied to the full data set in 10 new conversations, resulting in 100 conversations each for the inductive and unconstrained deductive analyses. A further 10 conversations coded the full data set into the structured coding matrix. Intercoder agreement was evaluated across and within coding schemes. ChatGPT output was also evaluated by the researchers to assess whether it reflected the prompt instructions. RESULTS The precision of detecting change mechanisms in the data subset ranged from 66% to 88%. Overall κ scores for intercoder agreement ranged from 0.72 to 0.82 across inductive coding schemes and from 0.58 to 0.73 across unconstrained coding schemes and the structured coding matrix. Coding into the best-performing coding scheme resulted in category-specific κ scores ranging from 0.67 to 0.95 for the inductive approach and from 0.13 to 0.87 for the deductive approaches. ChatGPT largely followed prompt instructions in producing a description of each coding scheme, although the wording for the inductively developed coding schemes was lengthier than specified. CONCLUSIONS ChatGPT appears fairly reliable in assisting with qualitative analysis. It performed better at developing an inductive coding scheme that emerged from the data than at adapting an existing framework into an unconstrained coding scheme or coding directly into a structured matrix. The potential for ChatGPT to act as a second coder also appears promising, with almost perfect agreement in at least 1 coding scheme. The findings suggest that ChatGPT could prove useful as a tool to assist in each phase of qualitative content analysis, but multiple iterations are required to determine the reliability of each stage of analysis.
Affiliation(s)
- Rimke Bijker: Department of Psychology and Neuroscience, Auckland University of Technology, Auckland, New Zealand
- Simone N Rodda: Department of Psychology and Neuroscience, Auckland University of Technology, Auckland, New Zealand; School of Psychology, Deakin University, Burwood, Australia
78
Chen X, Wang L, You M, Liu W, Fu Y, Xu J, Zhang S, Chen G, Li K, Li J. Evaluating and Enhancing Large Language Models' Performance in Domain-Specific Medicine: Development and Usability Study With DocOA. J Med Internet Res 2024; 26:e58158. [PMID: 38833165 PMCID: PMC11301122 DOI: 10.2196/58158] [Received: 03/07/2024] [Revised: 04/28/2024] [Accepted: 06/03/2024] [Indexed: 06/06/2024]
Abstract
BACKGROUND The efficacy of large language models (LLMs) in domain-specific medicine, particularly for managing complex diseases such as osteoarthritis (OA), remains largely unexplored. OBJECTIVE This study focused on evaluating and enhancing the clinical capabilities and explainability of LLMs in specific domains, using OA management as a case study. METHODS A domain-specific benchmark framework was developed to evaluate LLMs across a spectrum from domain-specific knowledge to clinical applications in real-world clinical scenarios. DocOA, a specialized LLM for OA management integrating retrieval-augmented generation and instructional prompts, was developed; through retrieval-augmented generation, it can identify the clinical evidence upon which its answers are based, thereby demonstrating their explainability. The study compared the performance of GPT-3.5, GPT-4, and DocOA using objective and human evaluations. RESULTS General LLMs such as GPT-3.5 and GPT-4 were less effective in the specialized domain of OA management, particularly in providing personalized treatment recommendations, whereas DocOA showed significant improvements. CONCLUSIONS This study introduces a novel benchmark framework that assesses the domain-specific abilities of LLMs in multiple aspects, highlights the limitations of generalized LLMs in clinical contexts, and demonstrates the potential of tailored approaches for developing domain-specific medical LLMs.
Affiliation(s)
- Xi Chen: Sports Medicine Center, West China Hospital, Sichuan University, Chengdu, China; Department of Orthopedics and Orthopedic Research Institute, West China Hospital, Sichuan University, Chengdu, China
- Li Wang: Sports Medicine Center, West China Hospital, Sichuan University, Chengdu, China; Department of Orthopedics and Orthopedic Research Institute, West China Hospital, Sichuan University, Chengdu, China
- MingKe You: Sports Medicine Center, West China Hospital, Sichuan University, Chengdu, China; Department of Orthopedics and Orthopedic Research Institute, West China Hospital, Sichuan University, Chengdu, China
- WeiZhi Liu: Sports Medicine Center, West China Hospital, Sichuan University, Chengdu, China; Department of Orthopedics and Orthopedic Research Institute, West China Hospital, Sichuan University, Chengdu, China
- Yu Fu: West China Hospital, West China School of Medicine, Sichuan University, Chengdu, China
- Jie Xu: Shanghai Artificial Intelligence Laboratory, OpenMedLab, Shanghai, China
- Shaoting Zhang: Shanghai Artificial Intelligence Laboratory, OpenMedLab, Shanghai, China
- Gang Chen: Sports Medicine Center, West China Hospital, Sichuan University, Chengdu, China; Department of Orthopedics and Orthopedic Research Institute, West China Hospital, Sichuan University, Chengdu, China
- Kang Li: Shanghai Artificial Intelligence Laboratory, OpenMedLab, Shanghai, China; West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, China; Med-X Center for Informatics, Sichuan University, Chengdu, China
- Jian Li: Sports Medicine Center, West China Hospital, Sichuan University, Chengdu, China; Department of Orthopedics and Orthopedic Research Institute, West China Hospital, Sichuan University, Chengdu, China
79
Choi J, Oh AR, Park J, Kang RA, Yoo SY, Lee DJ, Yang K. Evaluation of the quality and quantity of artificial intelligence-generated responses about anesthesia and surgery: using ChatGPT 3.5 and 4.0. Front Med (Lausanne) 2024; 11:1400153. [PMID: 39055693 PMCID: PMC11269144 DOI: 10.3389/fmed.2024.1400153] [Received: 03/13/2024] [Accepted: 07/01/2024] [Indexed: 07/27/2024]
Abstract
Introduction The large-scale artificial intelligence (AI) language model chatbot, Chat Generative Pre-Trained Transformer (ChatGPT), is renowned for its ability to provide information quickly and efficiently. This study aimed to assess the medical responses of ChatGPT regarding anesthetic procedures. Methods Two anesthesiologist authors selected 30 questions representing inquiries patients might have about surgery and anesthesia. These questions were input into two versions of ChatGPT in English. A total of 31 anesthesiologists then evaluated each response for quality, quantity, and overall assessment, using 5-point Likert scales. Descriptive statistics summarized the scores, and a paired sample t-test compared ChatGPT 3.5 and 4.0. Results Regarding quality, "appropriate" was the most common rating for both ChatGPT 3.5 and 4.0 (40% and 48%, respectively). For quantity, responses were deemed "insufficient" in 59% of cases for 3.5 and "adequate" in 69% for 4.0. In the overall assessment, 3 points were most common for 3.5 (36%), while 4 points were predominant for 4.0 (42%). Mean quality scores were 3.40 and 3.73, and mean quantity scores were -0.31 (between insufficient and adequate) and 0.03 (between adequate and excessive), respectively. The mean overall score was 3.21 for 3.5 and 3.67 for 4.0. Responses from 4.0 showed statistically significant improvement in three areas. Conclusion ChatGPT generated responses mostly ranging from appropriate to slightly insufficient, providing an overall average amount of information. Version 4.0 outperformed 3.5, and further research is warranted to investigate the potential utility of AI chatbots in assisting patients with medical information.
Affiliation(s)
- Jisun Choi: Department of Anesthesiology and Pain Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
- Ah Ran Oh: Department of Anesthesiology and Pain Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
- Jungchan Park: Department of Anesthesiology and Pain Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
- Ryung A. Kang: Department of Anesthesiology and Pain Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
- Seung Yeon Yoo: Department of Anesthesiology and Pain Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
- Dong Jae Lee: Department of Anesthesiology and Pain Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
- Kwangmo Yang: Center for Health Promotion, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
80
Haltaufderheide J, Ranisch R. The ethics of ChatGPT in medicine and healthcare: a systematic review on Large Language Models (LLMs). NPJ Digit Med 2024; 7:183. [PMID: 38977771 PMCID: PMC11231310 DOI: 10.1038/s41746-024-01157-x] [Received: 03/21/2024] [Accepted: 05/29/2024] [Indexed: 07/10/2024]
Abstract
With the introduction of ChatGPT, Large Language Models (LLMs) have received enormous attention in healthcare. Despite potential benefits, researchers have underscored various ethical implications. While individual instances have garnered attention, a systematic and comprehensive overview of practical applications currently researched and ethical issues connected to them is lacking. Against this background, this work maps the ethical landscape surrounding the current deployment of LLMs in medicine and healthcare through a systematic review. Electronic databases and preprint servers were queried using a comprehensive search strategy which generated 796 records. Studies were screened and extracted following a modified rapid review approach. Methodological quality was assessed using a hybrid approach. For 53 records, a meta-aggregative synthesis was performed. Four general fields of applications emerged showcasing a dynamic exploration phase. Advantages of using LLMs are attributed to their capacity in data analysis, information provisioning, support in decision-making or mitigating information loss and enhancing information accessibility. However, our study also identifies recurrent ethical concerns connected to fairness, bias, non-maleficence, transparency, and privacy. A distinctive concern is the tendency to produce harmful or convincing but inaccurate content. Calls for ethical guidance and human oversight are recurrent. We suggest that the ethical guidance debate should be reframed to focus on defining what constitutes acceptable human oversight across the spectrum of applications. This involves considering the diversity of settings, varying potentials for harm, and different acceptable thresholds for performance and certainty in healthcare. Additionally, critical inquiry is needed to evaluate the necessity and justification of LLMs' current experimental use.
Affiliation(s)
- Joschka Haltaufderheide: Faculty of Health Sciences Brandenburg, University of Potsdam, Am Mühlenberg 9, Potsdam, 14476, Germany
- Robert Ranisch: Faculty of Health Sciences Brandenburg, University of Potsdam, Am Mühlenberg 9, Potsdam, 14476, Germany
81
Chaudhry BM, Debi HR. User perceptions and experiences of an AI-driven conversational agent for mental health support. Mhealth 2024; 10:22. [PMID: 39114462 PMCID: PMC11304096 DOI: 10.21037/mhealth-23-55] [Received: 09/06/2023] [Accepted: 06/05/2024] [Indexed: 08/10/2024]
Abstract
Background The increasing prevalence of artificial intelligence (AI)-driven mental health conversational agents necessitates a comprehensive understanding of user engagement and user perceptions of this technology. This study aims to fill the existing knowledge gap by focusing on Wysa, a commercially available mobile conversational agent designed to provide personalized mental health support. Methods A total of 159 user reviews posted between January 2020 and March 2024 on the Wysa app's Google Play page were collected. Thematic analysis was then used to perform open and inductive coding of the collected data. Results Seven major themes emerged from the user reviews: "a trusting environment promotes wellbeing", "ubiquitous access offers real-time support", "AI limitations detract from the user experience", "perceived effectiveness of Wysa", "desire for cohesive and predictable interactions", "humanness in AI is welcomed", and "the need for improvements in the user interface". These themes highlight both the benefits and limitations of AI-driven mental health conversational agents. Conclusions Users find Wysa effective in fostering a strong connection, encouraging them to engage with the app and take positive steps toward emotional resilience and self-improvement. However, its AI needs several improvements to enhance the user experience. The findings contribute to the design and implementation of more effective, ethical, and user-aligned AI-driven mental health support systems.
Affiliation(s)
- Beenish Moalla Chaudhry: School of Computing and Informatics, Ray P. Authement College of Sciences, University of Louisiana at Lafayette, Lafayette, LA, USA
- Happy Rani Debi: School of Computing and Informatics, Ray P. Authement College of Sciences, University of Louisiana at Lafayette, Lafayette, LA, USA
82
Edalati S, Vasan V, Cheng CP, Patel Z, Govindaraj S, Iloreta AM. Can GPT-4 revolutionize otolaryngology? Navigating opportunities and ethical considerations. Am J Otolaryngol 2024; 45:104303. [PMID: 38678799 DOI: 10.1016/j.amjoto.2024.104303] [Received: 04/10/2024] [Accepted: 04/14/2024] [Indexed: 05/01/2024]
Abstract
Otolaryngologists can enhance workflow efficiency, provide better patient care, and advance medical research and education by integrating artificial intelligence (AI) into their practices. GPT-4 is a revolutionary, contemporary example of AI that may apply to otolaryngology. When GPT-4 is used to make critical medical decisions and provide individualized patient care, it should supplement, not replace, the knowledge of otolaryngologists. In this examination, we explore the potential uses of GPT-4 in the field of otolaryngology, covering aspects such as potential outcomes and technical boundaries, and we delve into the intricate dilemmas and ethical considerations inherent in incorporating GPT-4 into the specialty. Our stance is that GPT-4 has the potential to be very helpful. Its capabilities, which include aid in clinical decision-making, patient care, and administrative task automation, present exciting possibilities for enhancing patient outcomes, boosting the efficiency of healthcare delivery, and improving patient experiences. Although certain obstacles and limitations remain, the progress made so far shows that GPT-4 can be a valuable tool for modern medicine. As the technology develops, GPT-4 may play a more significant role in clinical practice, helping medical professionals deliver high-quality care tailored to each patient's unique needs.
Affiliation(s)
- Shaun Edalati: Department of Otolaryngology-Head and Neck Surgery, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Vikram Vasan: Department of Otolaryngology-Head and Neck Surgery, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Christopher P Cheng: Department of Otolaryngology-Head and Neck Surgery, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Zara Patel: Department of Otolaryngology-Head & Neck Surgery, Stanford University School of Medicine, Stanford, CA, USA
- Satish Govindaraj: Department of Otolaryngology-Head and Neck Surgery, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Alfred Marc Iloreta: Department of Otolaryngology-Head and Neck Surgery, Icahn School of Medicine at Mount Sinai, New York, NY, USA
83
Amato I, Simona G. Re: Letter to the Editor: what are the legal and ethical considerations of submitting radiology reports to ChatGPT? Clin Radiol 2024; 79:e982-e983. [PMID: 38719687 DOI: 10.1016/j.crad.2024.04.001] [Received: 04/02/2024] [Accepted: 04/03/2024] [Indexed: 06/02/2024]
Affiliation(s)
- I Amato: ARC Advanced Radiology Center (ARC), Department of Oncological Radiotherapy, and Hematology, Fondazione Policlinico Universitario Agostino Gemelli IRCCS, Rome, Italy
- G Simona: ARC Advanced Radiology Center (ARC), Department of Oncological Radiotherapy, and Hematology, Fondazione Policlinico Universitario Agostino Gemelli IRCCS, Rome, Italy; Università Cattolica del Sacro Cuore, Facoltà di Medicina e Chirurgia, Rome, Italy
84
Tessler I, Wolfovitz A, Alon EE, Gecel NA, Livneh N, Zimlichman E, Klang E. ChatGPT's adherence to otolaryngology clinical practice guidelines. Eur Arch Otorhinolaryngol 2024; 281:3829-3834. [PMID: 38647684 DOI: 10.1007/s00405-024-08634-9] [Received: 01/01/2024] [Accepted: 03/22/2024] [Indexed: 04/25/2024]
Abstract
OBJECTIVES Large language models, including ChatGPT, have the potential to transform the way we approach medical knowledge, yet accuracy in clinical topics is critical. Here we assessed ChatGPT's performance in adhering to the American Academy of Otolaryngology-Head and Neck Surgery guidelines. METHODS We presented ChatGPT with 24 clinical otolaryngology questions based on the guidelines of the American Academy of Otolaryngology. This was done three times (N = 72) to test the model's consistency. Two otolaryngologists evaluated the responses for accuracy and relevance to the guidelines. Cohen's kappa was used to measure evaluator agreement, and Cronbach's alpha assessed the consistency of ChatGPT's responses. RESULTS The study revealed mixed results; 59.7% (43/72) of ChatGPT's responses were highly accurate, while only 2.8% (2/72) directly contradicted the guidelines. The model showed 100% accuracy in Head and Neck, but lower accuracy in Rhinology and Otology/Neurotology (66%), Laryngology (50%), and Pediatrics (8%). The model's responses were consistent in 17/24 cases (70.8%), with a Cronbach's alpha value of 0.87, indicating reasonable consistency across tests. CONCLUSIONS Using a guideline-based set of structured questions, ChatGPT demonstrates consistency but variable accuracy in otolaryngology. Its lower performance in some areas, especially Pediatrics, suggests that further rigorous evaluation is needed before considering real-world clinical use.
Affiliation(s)
- Idit Tessler: Department of Otolaryngology and Head and Neck Surgery, Sheba Medical Center, Ramat Gan, Israel; School of Medicine, Tel Aviv University, Tel Aviv, Israel; ARC Innovation Center, Sheba Medical Center, Ramat Gan, Israel
- Amit Wolfovitz: Department of Otolaryngology and Head and Neck Surgery, Sheba Medical Center, Ramat Gan, Israel; School of Medicine, Tel Aviv University, Tel Aviv, Israel
- Eran E Alon: Department of Otolaryngology and Head and Neck Surgery, Sheba Medical Center, Ramat Gan, Israel; School of Medicine, Tel Aviv University, Tel Aviv, Israel
- Nir A Gecel: School of Medicine, Tel Aviv University, Tel Aviv, Israel
- Nir Livneh: Department of Otolaryngology and Head and Neck Surgery, Sheba Medical Center, Ramat Gan, Israel; School of Medicine, Tel Aviv University, Tel Aviv, Israel
- Eyal Zimlichman: School of Medicine, Tel Aviv University, Tel Aviv, Israel; ARC Innovation Center, Sheba Medical Center, Ramat Gan, Israel; The Sheba Talpiot Medical Leadership Program, Ramat Gan, Israel; Hospital Management, Sheba Medical Center, Ramat Gan, Israel
- Eyal Klang: The Division of Data-Driven and Digital Medicine (D3M), Icahn School of Medicine at Mount Sinai, New York, USA
Collapse
|
85
|
Xu R, Wang Z. Generative artificial intelligence in healthcare from the perspective of digital media: Applications, opportunities and challenges. Heliyon 2024; 10:e32364. [PMID: 38975200 PMCID: PMC11225727 DOI: 10.1016/j.heliyon.2024.e32364] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/30/2024] [Accepted: 06/03/2024] [Indexed: 07/09/2024] Open
Abstract
Introduction The emergence and application of generative artificial intelligence/large language models (hereafter GenAI LLMs) have the potential for significant impact on the healthcare industry. However, there is currently a lack of systematic research on GenAI LLMs in healthcare based on reliable data. This article aims to conduct an exploratory study of the application of GenAI LLMs (i.e., ChatGPT) in healthcare from the perspective of digital media (i.e., online news), including the application scenarios, potential opportunities, and challenges. Methods This research used thematic qualitative text analysis in five steps: firstly, developing main topical categories based on relevant articles; secondly, encoding the search keywords using these categories; thirdly, conducting searches for news articles via Google; fourthly, encoding the sub-categories using the elaborate category system; and finally, conducting category-based analysis and presenting the results. Natural language processing techniques, including the TermRaider and AntConc tools, were applied in the aforementioned steps to assist in qualitative text analysis. Additionally, this study built a framework for analyzing the above three topics from the perspective of five different stakeholders, including healthcare demanders and providers. Results This study summarizes 26 applications (e.g., provide medical advice, provide diagnosis and triage recommendations, provide mental health support, etc.), 21 opportunities (e.g., make healthcare more accessible, reduce healthcare costs, improve patient care, etc.), and 17 challenges (e.g., generate inaccurate/misleading/wrong answers, raise privacy concerns, lack of transparency, etc.), and analyzes the reasons for the formation of these key items and the links between the three research topics.
Conclusions The application of GenAI LLMs in healthcare is primarily focused on transforming the way healthcare demanders access medical services (i.e., making it more intelligent, refined, and humane) and optimizing the processes through which healthcare providers offer medical services (i.e., simplifying them, ensuring timeliness, and reducing errors). As the application becomes more widespread and deepens, GenAI LLMs are expected to have a revolutionary impact on traditional healthcare service models, but they also inevitably raise ethical and security concerns. Furthermore, the application of GenAI LLMs in healthcare is still at an initial stage, and it can be accelerated from a specific healthcare field (e.g., mental health) or a specific mechanism (e.g., a mechanism for allocating the economic benefits of GenAI LLMs applied to healthcare) with empirical or clinical research.
Collapse
Affiliation(s)
- Rui Xu
- School of Economics, Guangdong University of Technology, Guangzhou, China
| | - Zhong Wang
- School of Economics, Guangdong University of Technology, Guangzhou, China
- Key Laboratory of Digital Economy and Data Governance, Guangdong University of Technology, Guangzhou, China
| |
Collapse
|
86
|
Wu Y, Wu M, Wang C, Lin J, Liu J, Liu S. Evaluating the Prevalence of Burnout Among Health Care Professionals Related to Electronic Health Record Use: Systematic Review and Meta-Analysis. JMIR Med Inform 2024; 12:e54811. [PMID: 38865188 PMCID: PMC11208837 DOI: 10.2196/54811] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/27/2023] [Revised: 02/23/2024] [Accepted: 04/17/2024] [Indexed: 06/13/2024] Open
Abstract
BACKGROUND Burnout among health care professionals is a significant concern, with detrimental effects on health care service quality and patient outcomes. The use of the electronic health record (EHR) system has been identified as a significant contributor to burnout among health care professionals. OBJECTIVE This systematic review and meta-analysis aims to assess the prevalence of burnout among health care professionals associated with the use of the EHR system, thereby providing evidence to improve health information systems and develop strategies to measure and mitigate burnout. METHODS We conducted a comprehensive search of the PubMed, Embase, and Web of Science databases for English-language peer-reviewed articles published between January 1, 2009, and December 31, 2022. Two independent reviewers applied inclusion and exclusion criteria, and study quality was assessed using the Joanna Briggs Institute checklist and the Newcastle-Ottawa Scale. Meta-analyses were performed using R (version 4.1.3; R Foundation for Statistical Computing), with EndNote X7 (Clarivate) for reference management. RESULTS The review included 32 cross-sectional studies and 5 case-control studies with a total of 66,556 participants, mainly physicians and registered nurses. The pooled prevalence of burnout among health care professionals in cross-sectional studies was 40.4% (95% CI 37.5%-43.2%). Case-control studies indicated a higher likelihood of burnout among health care professionals who spent more time on EHR-related tasks outside work (odds ratio 2.43, 95% CI 2.31-2.57). CONCLUSIONS The findings highlight the association between increased use of the EHR system and burnout among health care professionals. Potential solutions include optimizing EHR systems, implementing automated dictation or note-taking, employing scribes to reduce documentation burden, and leveraging artificial intelligence to enhance EHR system efficiency and reduce the risk of burnout.
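A pooled prevalence with a 95% CI, like the 40.4% (37.5%-43.2%) reported above, is typically obtained by inverse-variance weighting of per-study proportions. A minimal fixed-effect sketch with made-up study counts (the review itself used R's meta-analysis tooling, and a full analysis would normally transform proportions and account for between-study heterogeneity):

```python
import math

def pooled_prevalence(events, totals, z=1.96):
    """Fixed-effect inverse-variance pooling of raw proportions.

    Returns (pooled, ci_low, ci_high). Real meta-analyses usually
    apply a logit or arcsine transform and add a between-study
    variance term (random effects); this sketch skips both.
    """
    weights, weighted = [], []
    for e, n in zip(events, totals):
        p = e / n
        var = p * (1 - p) / n          # binomial variance of a proportion
        w = 1.0 / var
        weights.append(w)
        weighted.append(w * p)
    pooled = sum(weighted) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))  # standard error of the pooled estimate
    return pooled, pooled - z * se, pooled + z * se

# Hypothetical burnout counts from three cross-sectional studies.
pooled, lo, hi = pooled_prevalence([120, 300, 80], [300, 700, 210])
```

With a single study, the pooled estimate reduces to that study's raw proportion, which is a quick sanity check on the weighting.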
TRIAL REGISTRATION PROSPERO International Prospective Register of Systematic Reviews CRD42021281173; https://www.crd.york.ac.uk/prospero/display_record.php?ID=CRD42021281173.
Collapse
Affiliation(s)
- Yuxuan Wu
- Department of Medical Informatics, West China Hospital, Sichuan University, Chengdu, China
| | - Mingyue Wu
- Information Center, West China Hospital, Sichuan University, Chengdu, China
| | - Changyu Wang
- West China College of Stomatology, Sichuan University, Chengdu, China
| | - Jie Lin
- Department of Oral Implantology, West China Hospital of Stomatology, Sichuan University, Chengdu, China
| | - Jialin Liu
- Department of Medical Informatics, West China Hospital, Sichuan University, Chengdu, China
- Information Center, West China Hospital, Sichuan University, Chengdu, China
| | - Siru Liu
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States
| |
Collapse
|
87
|
Maggio MG, Tartarisco G, Cardile D, Bonanno M, Bruschetta R, Pignolo L, Pioggia G, Calabrò RS, Cerasa A. Exploring ChatGPT's potential in the clinical stream of neurorehabilitation. Front Artif Intell 2024; 7:1407905. [PMID: 38903157 PMCID: PMC11187276 DOI: 10.3389/frai.2024.1407905] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/27/2024] [Accepted: 05/13/2024] [Indexed: 06/22/2024] Open
Abstract
In several medical fields, generative AI tools such as ChatGPT have achieved optimal performance in identifying correct diagnoses only by evaluating narrative clinical descriptions of cases. The most active fields of application include oncology and COVID-19-related symptoms, with preliminary relevant results also in psychiatric and neurological domains. This scoping review aims to introduce the arrival of ChatGPT applications in neurorehabilitation practice, where such AI-driven solutions have the potential to revolutionize patient care and assistance. First, a comprehensive overview of ChatGPT, including its design and potential applications in medicine, is provided. Second, the remarkable natural language processing skills and limitations of these models are examined with a focus on their use in neurorehabilitation. In this context, we present two case scenarios to evaluate ChatGPT's ability to resolve higher-order clinical reasoning. Overall, we provide support for the first evidence that generative AI can be meaningfully integrated into neurorehabilitation practice as a facilitator, aiding physicians in defining increasingly efficacious diagnostic and personalized prognostic plans.
Collapse
Affiliation(s)
| | - Gennaro Tartarisco
- Institute for Biomedical Research and Innovation (IRIB), National Research Council of Italy (CNR), Messina, Italy
| | | | | | - Roberta Bruschetta
- Institute for Biomedical Research and Innovation (IRIB), National Research Council of Italy (CNR), Messina, Italy
| | | | - Giovanni Pioggia
- Institute for Biomedical Research and Innovation (IRIB), National Research Council of Italy (CNR), Messina, Italy
| | | | - Antonio Cerasa
- Institute for Biomedical Research and Innovation (IRIB), National Research Council of Italy (CNR), Messina, Italy
- S’Anna Institute, Crotone, Italy
- Pharmacotechnology Documentation and Transfer Unit, Preclinical and Translational Pharmacology, Department of Pharmacy, Health and Nutritional Sciences, University of Calabria, Rende, Italy
| |
Collapse
|
88
|
Treviño-Juarez AS. Assessing Risk of Bias Using ChatGPT-4 and Cochrane ROB2 Tool. MEDICAL SCIENCE EDUCATOR 2024; 34:691-694. [PMID: 38887420 PMCID: PMC11180068 DOI: 10.1007/s40670-024-02034-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Accepted: 03/29/2024] [Indexed: 06/20/2024]
Abstract
In the world of evidence-based medicine, systematic reviews have long been the gold standard. But they have had a problem-they take forever. That is where ChatGPT-4 and automation come in. They are like a breath of fresh air, speeding things up and making the process more reliable. ChatGPT-4 is like having a super-smart assistant who can quickly assess bias risk in research studies. It is a game-changer, especially in a field where getting the latest research quickly can mean life or death for patients. Sure, it is not perfect, and we still need humans to keep an eye on things and ensure everything's ethical. But the future looks bright. With ChatGPT-4 and automation, evidence-based medicine is on the fast track to success.
Collapse
|
89
|
Li J, Tang T, Wu E, Zhao J, Zong H, Wu R, Feng W, Zhang K, Wang D, Qin Y, Shen Z, Qin Y, Ren S, Zhan C, Yang L, Wei Q, Shen B. RARPKB: a knowledge-guide decision support platform for personalized robot-assisted surgery in prostate cancer. Int J Surg 2024; 110:3412-3424. [PMID: 38498357 PMCID: PMC11175739 DOI: 10.1097/js9.0000000000001290] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/10/2023] [Accepted: 02/22/2024] [Indexed: 03/20/2024]
Abstract
BACKGROUND Robot-assisted radical prostatectomy (RARP) has emerged as a pivotal surgical intervention for the treatment of prostate cancer (PCa). However, the complexity of clinical cases, heterogeneity of PCa, and limitations in physician expertise pose challenges to rational decision-making in RARP. To address these challenges, the authors aimed to organize the knowledge of previously complex cohorts and establish an online platform named the RARP knowledge base (RARPKB) to provide reference evidence for personalized treatment plans. MATERIALS AND METHODS PubMed searches over the past two decades were conducted to identify publications describing RARP. The authors collected, classified, and structured surgical details, patient information, surgical data, and various statistical results from the literature. A knowledge-guided decision-support tool was established using MySQL, DataTable, ECharts, and JavaScript. ChatGPT-4 and two assessment scales were used to validate and compare the platform. RESULTS The platform comprised 583 studies, 1589 cohorts, 1 911 968 patients, and 11 986 records, resulting in 54 834 data entries. The knowledge-guided decision-support tool provides personalized surgical plan recommendations and flags potential complications on the basis of patients' baseline and surgical information. Compared with ChatGPT-4, RARPKB outperformed in authenticity (100% vs. 73%), matching (100% vs. 53%), personalized recommendations (100% vs. 20%), matching of patients (100% vs. 0%), and personalized recommendations for complications (100% vs. 20%). After use, the average System Usability Scale score was 88.88±15.03, and the Net Promoter Score of RARPKB was 85. The knowledge base is available at: http://rarpkb.bioinf.org.cn . CONCLUSIONS The authors introduced the pioneering RARPKB, the first knowledge base for robot-assisted surgery, with an emphasis on PCa. RARPKB can assist in personalized and complex surgical planning for PCa to improve its efficacy.
RARPKB provides a reference for the future applications of artificial intelligence in clinical practice.
Collapse
Affiliation(s)
- Jiakun Li
- Department of Urology, West China Hospital, Sichuan University
- Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University
| | - Tong Tang
- Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University
- Department of Computer Science and Information Technologies, Elviña Campus, University of A Coruña, A Coruña, Spain
| | - Erman Wu
- Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University
| | - Jing Zhao
- Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University
| | - Hui Zong
- Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University
| | - Rongrong Wu
- Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University
| | - Weizhe Feng
- Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University
| | - Ke Zhang
- Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University
- Chengdu Aixam Medical Technology Co. Ltd, Chengdu
| | - Dongyue Wang
- Department of Ophthalmology, West China Hospital, Sichuan University
| | - Yawen Qin
- Clinical Medical College, Southwest Medical University, Luzhou, Sichuan Province
| | | | - Yi Qin
- Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University
| | - Shumin Ren
- Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University
- Department of Computer Science and Information Technologies, Elviña Campus, University of A Coruña, A Coruña, Spain
| | - Chaoying Zhan
- Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University
| | - Lu Yang
- Department of Urology, West China Hospital, Sichuan University
| | - Qiang Wei
- Department of Urology, West China Hospital, Sichuan University
| | - Bairong Shen
- Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University
| |
Collapse
|
90
|
Liu S, McCoy AB, Wright AP, Carew B, Genkins JZ, Huang SS, Peterson JF, Steitz B, Wright A. Leveraging large language models for generating responses to patient messages-a subjective analysis. J Am Med Inform Assoc 2024; 31:1367-1379. [PMID: 38497958 PMCID: PMC11105129 DOI: 10.1093/jamia/ocae052] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/17/2023] [Revised: 01/17/2024] [Accepted: 02/28/2024] [Indexed: 03/19/2024] Open
Abstract
OBJECTIVE This study aimed to develop and assess the performance of fine-tuned large language models for generating responses to patient messages sent via an electronic health record patient portal. MATERIALS AND METHODS Utilizing a dataset of messages and responses extracted from the patient portal at a large academic medical center, we developed a model (CLAIR-Short) based on a pre-trained large language model (LLaMA-65B). In addition, we used the OpenAI API to update physician responses from an open-source dataset into a format with informative paragraphs that offered patient education while emphasizing empathy and professionalism. By combining with this dataset, we further fine-tuned our model (CLAIR-Long). To evaluate the fine-tuned models, we used 10 representative patient portal questions in primary care to generate responses. We asked primary care physicians to review generated responses from our models and ChatGPT and rate them for empathy, responsiveness, accuracy, and usefulness. RESULTS The dataset consisted of 499 794 pairs of patient messages and corresponding responses from the patient portal, with 5000 patient messages and ChatGPT-updated responses from an online platform. Four primary care physicians participated in the survey. CLAIR-Short exhibited the ability to generate concise responses similar to providers' responses. CLAIR-Long responses provided increased patient educational content compared to CLAIR-Short and were rated similarly to ChatGPT's responses, receiving positive evaluations for responsiveness, empathy, and accuracy, while receiving a neutral rating for usefulness. CONCLUSION This subjective analysis suggests that leveraging large language models to generate responses to patient messages demonstrates significant potential in facilitating communication between patients and healthcare providers.
Collapse
Affiliation(s)
- Siru Liu
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37212, United States
| | - Allison B McCoy
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37212, United States
| | - Aileen P Wright
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37212, United States
- Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37212, United States
| | - Babatunde Carew
- Department of General Internal Medicine and Public Health, Vanderbilt University Medical Center, Nashville, TN 37212, United States
| | - Julian Z Genkins
- Department of Medicine, Stanford University, Stanford, CA 94304, United States
| | - Sean S Huang
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37212, United States
- Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37212, United States
| | - Josh F Peterson
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37212, United States
- Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37212, United States
| | - Bryan Steitz
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37212, United States
| | - Adam Wright
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37212, United States
| |
Collapse
|
91
|
Mohammad-Rahimi H, Khoury ZH, Alamdari MI, Rokhshad R, Motie P, Parsa A, Tavares T, Sciubba JJ, Price JB, Sultan AS. Performance of AI chatbots on controversial topics in oral medicine, pathology, and radiology. Oral Surg Oral Med Oral Pathol Oral Radiol 2024; 137:508-514. [PMID: 38553304 DOI: 10.1016/j.oooo.2024.01.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/06/2023] [Revised: 12/20/2023] [Accepted: 01/25/2024] [Indexed: 06/19/2024]
Abstract
OBJECTIVES In this study, we assessed 6 different artificial intelligence (AI) chatbots (Bing, GPT-3.5, GPT-4, Google Bard, Claude, Sage) responses to controversial and difficult questions in oral pathology, oral medicine, and oral radiology. STUDY DESIGN The chatbots' answers were evaluated by board-certified specialists using a modified version of the global quality score on a 5-point Likert scale. The quality and validity of chatbot citations were evaluated. RESULTS Claude had the highest mean score of 4.341 ± 0.582 for oral pathology and medicine. Bing had the lowest score of 3.447 ± 0.566. In oral radiology, GPT-4 had the highest mean score of 3.621 ± 1.009 and Bing the lowest score of 2.379 ± 0.978. GPT-4 achieved the highest mean score of 4.066 ± 0.825 for performance across all disciplines. Overall, 82 of 349 (23.5%) citations generated by the chatbots were fabricated. CONCLUSIONS The best-performing chatbot in providing high-quality information for controversial topics in various dental disciplines was GPT-4. Although the majority of chatbots performed well, it is suggested that developers of AI medical chatbots incorporate scientific citation authenticators to validate the outputted citations given the relatively high number of fabricated citations.
Collapse
Affiliation(s)
- Hossein Mohammad-Rahimi
- Division of Artificial Intelligence Research, University of Maryland School of Dentistry, Baltimore, MD, USA; Topic Group Dental Diagnostics and Digital Dentistry, ITU/WHO Focus Group AI on Health, Berlin, Germany
| | - Zaid H Khoury
- Department of Oral Diagnostic Sciences and Research, Meharry Medical College School of Dentistry, Nashville, TN, USA
| | - Mina Iranparvar Alamdari
- Department of Oral and Maxillofacial Radiology, School of Dentistry, Shahid Beheshti University of Medical Sciences, Tehran, Iran
| | - Rata Rokhshad
- Topic Group Dental Diagnostics and Digital Dentistry, ITU/WHO Focus Group AI on Health, Berlin, Germany
| | - Parisa Motie
- Medical Image and Signal Processing Research Center, Medical University of Isfahan, Isfahan, Iran
| | - Azin Parsa
- Department of Oncology and Diagnostic Sciences, University of Maryland School of Dentistry, Baltimore, MD, USA
| | - Tiffany Tavares
- Department of Comprehensive Dentistry, UT Health San Antonio School of Dentistry, San Antonio, TX, USA
| | - James J Sciubba
- Department of Otolaryngology, Head & Neck Surgery, The Johns Hopkins University, Baltimore, MD, USA
| | - Jeffery B Price
- Division of Artificial Intelligence Research, University of Maryland School of Dentistry, Baltimore, MD, USA; Department of Oncology and Diagnostic Sciences, University of Maryland School of Dentistry, Baltimore, MD, USA
| | - Ahmed S Sultan
- Division of Artificial Intelligence Research, University of Maryland School of Dentistry, Baltimore, MD, USA; Department of Oncology and Diagnostic Sciences, University of Maryland School of Dentistry, Baltimore, MD, USA; University of Maryland Marlene and Stewart Greenebaum Comprehensive Cancer Center, Baltimore, MD, USA.
| |
Collapse
|
92
|
Jokar M, Abdous A, Rahmanian V. AI chatbots in pet health care: Opportunities and challenges for owners. Vet Med Sci 2024; 10:e1464. [PMID: 38678576 PMCID: PMC11056198 DOI: 10.1002/vms3.1464] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/07/2024] [Accepted: 04/04/2024] [Indexed: 05/01/2024] Open
Abstract
The integration of artificial intelligence (AI) into health care has seen remarkable advancements, with applications extending to animal health. This article explores the potential benefits and challenges associated with employing AI chatbots as tools for pet health care. Focusing on ChatGPT, a prominent language model, the authors elucidate its capabilities and its potential impact on pet owners' decision-making processes. AI chatbots offer pet owners access to extensive information on animal health, research studies and diagnostic options, providing a cost-effective and convenient alternative to traditional veterinary consultations. The outcome of a case involving a Border Collie named Sassy illustrates the potential benefits of AI in veterinary medicine. In this instance, ChatGPT played a pivotal role in suggesting a diagnosis that led to successful treatment, showcasing the potential of AI chatbots as valuable tools in complex cases. However, concerns arise regarding pet owners relying solely on AI chatbots for medical advice, potentially resulting in misdiagnosis, inappropriate treatment and delayed professional intervention. We emphasize the need for a balanced approach, positioning AI chatbots as supplementary tools rather than substitutes for licensed veterinarians. To mitigate risks, the article proposes strategies such as educating pet owners on AI chatbots' limitations, implementing regulations to guide AI chatbot companies and fostering collaboration between AI chatbots and veterinarians. The intricate web of responsibilities in this dynamic landscape underscores the importance of government regulations, the educational role of AI chatbots and the symbiotic relationship between AI technology and veterinary expertise. In conclusion, while AI chatbots hold immense promise in transforming pet health care, cautious and informed usage is crucial.
By promoting awareness, establishing regulations and fostering collaboration, the article advocates for a responsible integration of AI chatbots to ensure optimal care for pets.
Collapse
Affiliation(s)
- Mohammad Jokar
- Faculty of Veterinary MedicineKaraj BranchIslamic Azad UniversityKarajIran
| | - Arman Abdous
- Faculty of Veterinary MedicineKaraj BranchIslamic Azad UniversityKarajIran
| | - Vahid Rahmanian
- Department of Public HealthTorbat Jam Faculty of Medical SciencesTorbat JamIran
| |
Collapse
|
93
|
Huo B, Calabrese E, Sylla P, Kumar S, Ignacio RC, Oviedo R, Hassan I, Slater BJ, Kaiser A, Walsh DS, Vosburg W. The performance of artificial intelligence large language model-linked chatbots in surgical decision-making for gastroesophageal reflux disease. Surg Endosc 2024; 38:2320-2330. [PMID: 38630178 DOI: 10.1007/s00464-024-10807-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/12/2024] [Accepted: 03/21/2024] [Indexed: 08/16/2024]
Abstract
BACKGROUND Large language model (LLM)-linked chatbots may be an efficient source of clinical recommendations for healthcare providers and patients. This study evaluated the performance of LLM-linked chatbots in providing recommendations for the surgical management of gastroesophageal reflux disease (GERD). METHODS Nine patient cases were created based on key questions (KQs) addressed by the Society of American Gastrointestinal and Endoscopic Surgeons (SAGES) guidelines for the surgical treatment of GERD. ChatGPT-3.5, ChatGPT-4, Copilot, Google Bard, and Perplexity AI were queried on November 16th, 2023, for recommendations regarding the surgical management of GERD. Accurate chatbot performance was defined as the number of responses aligning with SAGES guideline recommendations. Outcomes were reported with counts and percentages. RESULTS Surgeons were given accurate recommendations for the surgical management of GERD in an adult patient for 5/7 (71.4%) KQs by ChatGPT-4, 3/7 (42.9%) KQs by Copilot, 6/7 (85.7%) KQs by Google Bard, and 3/7 (42.9%) KQs by Perplexity according to the SAGES guidelines. Patients were given accurate recommendations for 3/5 (60.0%) KQs by ChatGPT-4, 2/5 (40.0%) KQs by Copilot, 4/5 (80.0%) KQs by Google Bard, and 1/5 (20.0%) KQs by Perplexity, respectively. In a pediatric patient, surgeons were given accurate recommendations for 2/3 (66.7%) KQs by ChatGPT-4, 3/3 (100.0%) KQs by Copilot, 3/3 (100.0%) KQs by Google Bard, and 2/3 (66.7%) KQs by Perplexity. Patients were given appropriate guidance for 2/2 (100.0%) KQs by ChatGPT-4, 2/2 (100.0%) KQs by Copilot, 1/2 (50.0%) KQs by Google Bard, and 1/2 (50.0%) KQs by Perplexity. CONCLUSIONS Gastrointestinal surgeons, gastroenterologists, and patients should recognize both the promise and pitfalls of LLMs when utilized for advice on surgical management of GERD. Additional training of LLMs using evidence-based health information is needed.
Collapse
Affiliation(s)
- Bright Huo
- Division of General Surgery, Department of Surgery, McMaster University, Hamilton, ON, Canada
| | - Elisa Calabrese
- University of California South California, East Bay, Oakland, CA, USA
| | - Patricia Sylla
- Division of Colon and Rectal Surgery, Department of Surgery, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Sunjay Kumar
- Department of General Surgery, Thomas Jefferson University Hospital, Philadelphia, PA, USA
| | - Romeo C Ignacio
- Division of Pediatric Surgery/Department of Surgery, San Diego School of Medicine, University of California, California, CA, USA
| | - Rodolfo Oviedo
- Nacogdoches Center for Metabolic and Weight Loss Surgery, Nacogdoches, TX, USA
- University of Houston Tilman J. Fertitta Family College of Medicine, Houston, TX, USA
- Sam Houston State University College of Osteopathic Medicine, Conroe, TX, USA
| | | | | | - Andreas Kaiser
- Division of Colorectal Surgery, Department of Surgery, City of Hope National Medical Center, Duarte, CA, USA
| | - Danielle S Walsh
- Department of Surgery, University of Kentucky, Lexington, KY, USA
| | - Wesley Vosburg
- Department of Surgery, Harvard Medical School, Mount Auburn Hospital, Cambridge, MA, USA.
| |
Collapse
|
95
|
Levin G, Pareja R, Viveros-Carreño D, Sanchez Diaz E, Yates EM, Zand B, Ramirez PT. Association of reviewer experience with discriminating human-written versus ChatGPT-written abstracts. Int J Gynecol Cancer 2024; 34:669-674. [PMID: 40228982 DOI: 10.1136/ijgc-2023-005162] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/20/2023] [Accepted: 03/22/2024] [Indexed: 04/24/2024] Open
Abstract
OBJECTIVE To determine if reviewer experience impacts the ability to discriminate between human-written and ChatGPT-written abstracts. METHODS Thirty reviewers (10 seniors, 10 juniors, and 10 residents) were asked to differentiate between 10 ChatGPT-written and 10 human-written (fabricated) abstracts. For the study, 10 gynecologic oncology abstracts were fabricated by the authors. For each human-written abstract we generated a ChatGPT matching abstract by using the same title and the fabricated results of each of the human generated abstracts. A web-based questionnaire was used to gather demographic data and to record the reviewers' evaluation of the 20 abstracts. Comparative statistics and multivariable regression were used to identify factors associated with a higher correct identification rate. RESULTS The 30 reviewers discriminated 20 abstracts, giving a total of 600 abstract evaluations. The reviewers were able to correctly identify 300/600 (50%) of the abstracts: 139/300 (46.3%) of the ChatGPT-generated abstracts and 161/300 (53.7%) of the human-written abstracts (p=0.07). Human-written abstracts had a higher rate of correct identification (median (IQR) 56.7% (49.2-64.1%) vs 45.0% (43.2-48.3%), p=0.023). Senior reviewers had a higher correct identification rate (60%) than junior reviewers and residents (45% each; p=0.043 and p=0.002, respectively). In a linear regression model including the experience level of the reviewers, familiarity with artificial intelligence (AI) and the country in which the majority of medical training was achieved (English speaking vs non-English speaking), the experience of the reviewer (β=10.2 (95% CI 1.8 to 18.7)) and familiarity with AI (β=7.78 (95% CI 0.6 to 15.0)) were independently associated with the correct identification rate (p=0.019 and p=0.035, respectively). In a correlation analysis the number of publications by the reviewer was positively correlated with the correct identification rate (r(28)=0.61, p<0.001).
CONCLUSION A total of 46.3% of abstracts written by ChatGPT were detected by reviewers. The correct identification rate increased with reviewer and publication experience.
Collapse
Affiliation(s)
- Gabriel Levin
- Division of Gynecologic Oncology, Jewish General Hospital, McGill University, Montreal, Quebec, Canada.
| | - Rene Pareja
- Gynecologic Oncology, Clinica ASTORGA, Medellin, and Instituto Nacional de Cancerología, Bogotá, Colombia
| | - David Viveros-Carreño
- Unidad Ginecología Oncológica, Grupo de Investigación GIGA, Centro de Tratamiento e Investigación sobre Cáncer Luis Carlos Sarmiento Angulo - CTIC, Bogotá, Colombia; Department of Gynecologic Oncology, Clínica Universitaria Colombia, Bogotá, Colombia
| | - Emmanuel Sanchez Diaz
- Universidad Pontificia Bolivariana Clinica Universitaria Bolivariana, Medellin, Colombia
| | - Elise Mann Yates
- Obstetrics and Gynecology, Houston Methodist Hospital, Houston, Texas, USA
| | - Behrouz Zand
- Gynecologic Oncology, Houston Methodist, Shenandoah, Texas, USA
| | - Pedro T Ramirez
- Department of Obstetrics and Gynecology, Houston Methodist Hospital, Houston, Texas, USA
| |
Collapse
|
96
|
Wang L, Wan Z, Ni C, Song Q, Li Y, Clayton EW, Malin BA, Yin Z. A Systematic Review of ChatGPT and Other Conversational Large Language Models in Healthcare. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.04.26.24306390. [PMID: 38712148 PMCID: PMC11071576 DOI: 10.1101/2024.04.26.24306390] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/08/2024]
Abstract
Background The launch of the Chat Generative Pre-trained Transformer (ChatGPT) in November 2022 has attracted public attention and academic interest to large language models (LLMs), facilitating the emergence of many other innovative LLMs. These LLMs have been applied in various fields, including healthcare. Numerous studies have since been conducted regarding how to employ state-of-the-art LLMs in health-related scenarios to assist patients, doctors, and public health administrators. Objective This review aims to summarize the applications and concerns of applying conversational LLMs in healthcare and provide an agenda for future research on LLMs in healthcare. Methods We utilized PubMed, ACM, and IEEE digital libraries as primary sources for this review. We followed the guidance of Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) to screen and select peer-reviewed research articles that (1) were related to both healthcare applications and conversational LLMs and (2) were published before September 1st, 2023, the date when we started paper collection and screening. We investigated these papers and classified them according to their applications and concerns. Results Our search initially identified 820 papers according to targeted keywords, out of which 65 papers met our criteria and were included in the review. The most popular conversational LLM was ChatGPT from OpenAI (60), followed by Bard from Google (1), Large Language Model Meta AI (LLaMA) from Meta (1), and other LLMs (5). These papers were classified into four categories in terms of their applications: 1) summarization, 2) medical knowledge inquiry, 3) prediction, and 4) administration, and four categories of concerns: 1) reliability, 2) bias, 3) privacy, and 4) public acceptability. There are 49 (75%) research papers using LLMs for summarization and/or medical knowledge inquiry, and 58 (89%) research papers expressing concerns about reliability and/or bias.
We found that conversational LLMs exhibit promising results in summarization and in providing medical knowledge to patients with relatively high accuracy. However, conversational LLMs like ChatGPT are not able to provide reliable answers to complex health-related tasks that require specialized domain expertise. Additionally, no experiments in our reviewed papers have been conducted to thoughtfully examine how conversational LLMs lead to bias or privacy issues in healthcare research. Conclusions Future studies should focus on improving the reliability of LLM applications in complex health-related tasks, as well as investigating the mechanisms by which LLM applications introduce bias and privacy issues. Considering the vast accessibility of LLMs, legal, social, and technical efforts are all needed to address concerns about LLMs to promote, improve, and regulate the application of LLMs in healthcare.
Collapse
Affiliation(s)
- Leyao Wang
- Department of Computer Science, Vanderbilt University, Nashville, TN, USA, 37212
| | - Zhiyu Wan
- Department of Biomedical Informatics, Vanderbilt University Medical Center, TN, USA, 37203
| | - Congning Ni
- Department of Computer Science, Vanderbilt University, Nashville, TN, USA, 37212
| | - Qingyuan Song
- Department of Computer Science, Vanderbilt University, Nashville, TN, USA, 37212
| | - Yang Li
- Department of Computer Science, Vanderbilt University, Nashville, TN, USA, 37212
| | - Ellen Wright Clayton
- Department of Pediatrics, Vanderbilt University Medical Center, Nashville, Tennessee, USA, 37203
- Center for Biomedical Ethics and Society, Vanderbilt University Medical Center, Nashville, Tennessee, USA, 37203
| | - Bradley A. Malin
- Department of Computer Science, Vanderbilt University, Nashville, TN, USA, 37212
- Department of Biomedical Informatics, Vanderbilt University Medical Center, TN, USA, 37203
- Department of Biostatistics, Vanderbilt University Medical Center, TN, USA, 37203
| | - Zhijun Yin
- Department of Computer Science, Vanderbilt University, Nashville, TN, USA, 37212
- Department of Biomedical Informatics, Vanderbilt University Medical Center, TN, USA, 37203
| |
Collapse
|
97
|
Wu J, Ma Y, Wang J, Xiao M. The Application of ChatGPT in Medicine: A Scoping Review and Bibliometric Analysis. J Multidiscip Healthc 2024; 17:1681-1692. [PMID: 38650670 PMCID: PMC11034560 DOI: 10.2147/jmdh.s463128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/14/2024] [Accepted: 03/25/2024] [Indexed: 04/25/2024] Open
Abstract
Purpose ChatGPT has a wide range of applications in the medical field. Therefore, this review aims to define the key issues and provide a comprehensive view of the literature based on the application of ChatGPT in medicine. Methods This scoping review follows Arksey and O'Malley's five-stage framework. A comprehensive literature search of publications (30 November 2022 to 16 August 2023) was conducted. Six databases were searched and relevant references were systematically catalogued. Attention was focused on the general characteristics of the articles, their fields of application, and the advantages and disadvantages of using ChatGPT. Descriptive statistics and narrative synthesis methods were used for data analysis. Results Of the 3426 studies, 247 met the criteria for inclusion in this review. The majority of articles (31.17%) were from the United States. Editorials (43.32%) ranked first, followed by experimental studies (11.74%). The potential applications of ChatGPT in medicine are varied, with the largest number of studies (45.75%) exploring clinical practice, including assisting with clinical decision support and providing disease information and medical advice. This was followed by medical education (27.13%) and scientific research (16.19%). Particularly noteworthy in the discipline statistics were radiology, surgery and dentistry at the top of the list. However, ChatGPT in medicine also faces issues of data privacy, inaccuracy and plagiarism. Conclusion The application of ChatGPT in medicine focuses on different disciplines and general application scenarios. ChatGPT has a paradoxical nature: it offers significant advantages, but at the same time raises great concerns about its application in healthcare settings. Therefore, it is imperative to develop theoretical frameworks that not only address its widespread use in healthcare but also facilitate a comprehensive assessment.
In addition, these frameworks should contribute to the development of strict and effective guidelines and regulatory measures.
Collapse
Affiliation(s)
- Jie Wu
- Department of Nursing, the First Affiliated Hospital of Chongqing Medical University, Chongqing, People’s Republic of China
| | - Yingzhuo Ma
- Department of Nursing, the First Affiliated Hospital of Chongqing Medical University, Chongqing, People’s Republic of China
| | - Jun Wang
- Department of Nursing, the First Affiliated Hospital of Chongqing Medical University, Chongqing, People’s Republic of China
| | - Mingzhao Xiao
- Department of Urology, the First Affiliated Hospital of Chongqing Medical University, Chongqing, People’s Republic of China
| |
Collapse
|
98
|
Wimbarti S, Kairupan BHR, Tallei TE. Critical review of self-diagnosis of mental health conditions using artificial intelligence. Int J Ment Health Nurs 2024; 33:344-358. [PMID: 38345132 DOI: 10.1111/inm.13303] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 09/20/2023] [Revised: 01/26/2024] [Accepted: 01/30/2024] [Indexed: 03/10/2024]
Abstract
The advent of artificial intelligence (AI) has revolutionised various aspects of our lives, including mental health nursing. AI-driven tools and applications have provided a convenient and accessible means for individuals to assess their mental well-being within the confines of their homes. Nonetheless, the widespread trend of self-diagnosing mental health conditions through AI poses considerable risks. This review article examines the perils associated with relying on AI for self-diagnosis in mental health, highlighting the constraints and possible adverse outcomes that can arise from such practices. It delves into the ethical, psychological, and social implications, underscoring the vital role of mental health professionals, including psychologists, psychiatrists, and nursing specialists, in providing professional assistance and guidance. This article aims to highlight the importance of seeking professional assistance and guidance in addressing mental health concerns, especially in the era of AI-driven self-diagnosis.
Collapse
Affiliation(s)
- Supra Wimbarti
- Faculty of Psychology, Universitas Gadjah Mada, Yogyakarta, Indonesia
| | - B H Ralph Kairupan
- Department of Psychiatry, Faculty of Medicine, Sam Ratulangi University, Manado, North Sulawesi, Indonesia
| | - Trina Ekawati Tallei
- Department of Biology, Faculty of Mathematics and Natural Sciences, Sam Ratulangi University, Manado, North Sulawesi, Indonesia
- Department of Biology, Faculty of Medicine, Sam Ratulangi University, Manado, North Sulawesi, Indonesia
| |
Collapse
|
99
|
Ge J, Li M, Delk MB, Lai JC. A Comparison of a Large Language Model vs Manual Chart Review for the Extraction of Data Elements From the Electronic Health Record. Gastroenterology 2024; 166:707-709.e3. [PMID: 38151192 PMCID: PMC11792087 DOI: 10.1053/j.gastro.2023.12.019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 09/12/2023] [Revised: 12/10/2023] [Accepted: 12/18/2023] [Indexed: 12/29/2023]
Affiliation(s)
- Jin Ge
- Division of Gastroenterology and Hepatology, Department of Medicine, University of California, San Francisco, San Francisco, California.
| | - Michael Li
- Division of Gastroenterology and Hepatology, Department of Medicine, University of California, San Francisco, San Francisco, California
| | - Molly B Delk
- Section of Gastroenterology and Hepatology, Department of Medicine, Tulane University School of Medicine, New Orleans, Louisiana
| | - Jennifer C Lai
- Division of Gastroenterology and Hepatology, Department of Medicine, University of California, San Francisco, San Francisco, California
| |
Collapse
|
100
|
Ahimaz P, Bergner AL, Florido ME, Harkavy N, Bhattacharyya S. Genetic counselors' utilization of ChatGPT in professional practice: A cross-sectional study. Am J Med Genet A 2024; 194:e63493. [PMID: 38066714 DOI: 10.1002/ajmg.a.63493] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/15/2023] [Revised: 11/21/2023] [Accepted: 11/22/2023] [Indexed: 03/10/2024]
Abstract
PURPOSE The precision medicine era has seen increased utilization of artificial intelligence (AI) in the field of genetics. We sought to explore the ways that genetic counselors (GCs) currently use the publicly accessible AI tool Chat Generative Pre-trained Transformer (ChatGPT) in their work. METHODS GCs in North America were surveyed about how ChatGPT is used in different aspects of their work. Descriptive statistics were reported through frequencies and means. RESULTS Of 118 GCs who completed the survey, 33.8% (40) reported using ChatGPT in their work; 47.5% (19) use it in clinical practice, 35% (14) use it in education, and 32.5% (13) use it in research. Most GCs (62.7%; 74) felt that it saves time on administrative tasks but the majority (82.2%; 97) felt that a paramount challenge was the risk of obtaining incorrect information. The majority of GCs not using ChatGPT (58.9%; 46) felt it was not necessary for their work. CONCLUSION A considerable number of GCs in the field are using ChatGPT in different ways, but it is primarily helpful with tasks that involve writing. It has potential to streamline workflow issues encountered in clinical genetics, but practitioners need to be informed and uniformly trained about its limitations.
Collapse
Affiliation(s)
- Priyanka Ahimaz
- Genetic Counseling Graduate Program, Vagelos College of Physicians and Surgeons, Columbia University, New York, New York, USA
- Department of Pediatrics, Vagelos College of Physicians and Surgeons, Columbia University, New York, New York, USA
| | - Amanda L Bergner
- Genetic Counseling Graduate Program, Vagelos College of Physicians and Surgeons, Columbia University, New York, New York, USA
- Department of Genetics and Development, Vagelos College of Physicians and Surgeons, Columbia University, New York, New York, USA
- Department of Neurology, Vagelos College of Physicians and Surgeons, Columbia University, New York, New York, USA
| | - Michelle E Florido
- Genetic Counseling Graduate Program, Vagelos College of Physicians and Surgeons, Columbia University, New York, New York, USA
- Department of Genetics and Development, Vagelos College of Physicians and Surgeons, Columbia University, New York, New York, USA
| | - Nina Harkavy
- Genetic Counseling Graduate Program, Vagelos College of Physicians and Surgeons, Columbia University, New York, New York, USA
- Department of Obstetrics and Gynecology, Vagelos College of Physicians and Surgeons, Columbia University, New York, New York, USA
| | - Sriya Bhattacharyya
- Genetic Counseling Graduate Program, Vagelos College of Physicians and Surgeons, Columbia University, New York, New York, USA
- Department of Psychiatry, Vagelos College of Physicians and Surgeons, Columbia University, New York, New York, USA
| |
Collapse
|