1
Finch L, Broach V, Feinberg J, Al-Niaimi A, Abu-Rustum NR, Zhou Q, Iasonos A, Chi DS. ChatGPT compared to national guidelines for management of ovarian cancer: Did ChatGPT get it right? - A Memorial Sloan Kettering Cancer Center Team Ovary study. Gynecol Oncol 2024; 189:75-79. [PMID: 39042956] [PMCID: PMC11402584] [DOI: 10.1016/j.ygyno.2024.07.007]
Abstract
OBJECTIVES We evaluated the performance of a chatbot compared to the National Comprehensive Cancer Network (NCCN) Guidelines for the management of ovarian cancer. METHODS Using NCCN Guidelines, we generated 10 questions and answers regarding management of ovarian cancer at a single point in time. Questions were thematically divided into risk factors, surgical management, medical management, and surveillance. We asked ChatGPT (GPT-4) to provide responses without prompting (unprompted GPT) and with prompt engineering (prompted GPT). Responses were blinded and evaluated for accuracy and completeness by 5 gynecologic oncologists. A score of 0 was defined as inaccurate, 1 as accurate and incomplete, and 2 as accurate and complete. Evaluations were compared among NCCN, unprompted GPT, and prompted GPT answers. RESULTS Overall, 48% of responses from NCCN, 64% from unprompted GPT, and 66% from prompted GPT were accurate and complete. The percentage of accurate but incomplete responses was higher for NCCN vs GPT-4. The percentage of accurate and complete scores for questions regarding risk factors, surgical management, and surveillance was higher for GPT-4 vs NCCN; however, for questions regarding medical management, the percentage was lower for GPT-4 vs NCCN. Overall, 14% of responses from unprompted GPT, 12% from prompted GPT, and 10% from NCCN were inaccurate. CONCLUSIONS GPT-4 provided accurate and complete responses at a single point in time to a limited set of questions regarding ovarian cancer, with best performance in areas of risk factors, surgical management, and surveillance. Occasional inaccuracies, however, should limit unsupervised use of chatbots at this time.
Affiliation(s)
- Lindsey Finch
- Gynecology Service, Department of Surgery, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Vance Broach
- Gynecology Service, Department of Surgery, Memorial Sloan Kettering Cancer Center, New York, NY, USA; Department of Obstetrics and Gynecology, Weill Cornell Medical College, New York, NY, USA
- Jacqueline Feinberg
- Gynecology Service, Department of Surgery, Memorial Sloan Kettering Cancer Center, New York, NY, USA; Department of Obstetrics and Gynecology, Weill Cornell Medical College, New York, NY, USA
- Ahmed Al-Niaimi
- Gynecology Service, Department of Surgery, Memorial Sloan Kettering Cancer Center, New York, NY, USA; Department of Obstetrics and Gynecology, Weill Cornell Medical College, New York, NY, USA
- Nadeem R Abu-Rustum
- Gynecology Service, Department of Surgery, Memorial Sloan Kettering Cancer Center, New York, NY, USA; Department of Obstetrics and Gynecology, Weill Cornell Medical College, New York, NY, USA
- Qin Zhou
- Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Alexia Iasonos
- Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Dennis S Chi
- Gynecology Service, Department of Surgery, Memorial Sloan Kettering Cancer Center, New York, NY, USA; Department of Obstetrics and Gynecology, Weill Cornell Medical College, New York, NY, USA.
2
Washington CJ, Abouyared M, Karanth S, Braithwaite D, Birkeland A, Silverman DA, Chen S. The Use of Chatbots in Head and Neck Mucosal Malignancy Treatment Recommendations. Otolaryngol Head Neck Surg 2024; 171:1062-1068. [PMID: 38769872] [DOI: 10.1002/ohn.818]
Abstract
OBJECTIVE As cancer patients increasingly use chatbots, it is crucial to recognize ChatGPT's potential in enhancing health literacy while ensuring validation to prevent misinformation. This study aims to assess ChatGPT-3.5's capability to provide appropriate staging and treatment recommendations for head and neck mucosal malignancies for vulnerable populations. STUDY DESIGN AND SETTING Forty distinct clinical vignettes were introduced into ChatGPT to inquire about staging and treatment recommendations for head and neck mucosal malignancies. METHODS Prompts were created based on head and neck cancer (HNC) disease descriptions (cancer location, tumor size, lymph node involvement, and symptoms). Staging and treatment recommendations according to the 2021 National Comprehensive Cancer Network (NCCN) guidelines were scored by three fellowship-trained HNC surgeons from two separate tertiary care institutions. HNC surgeons assessed the accuracy of staging and treatment recommendations, such as the completeness of surgery and the appropriateness of treatment modality. RESULTS Whereas ChatGPT's responses were 95% accurate at recommending the correct first-line treatment based on the 2021 NCCN guidelines, 55% of the responses contained inaccurate staging. Neck dissection was incorrectly omitted from treatment recommendations in 50% of the cases. Moreover, 40% of ChatGPT's treatment recommendations were deemed unnecessary. CONCLUSION This study emphasizes ChatGPT's potential in HNC patient education, aligning with NCCN guidelines for mucosal malignancies, but highlights the importance of ongoing refinement and scrutiny due to observed inaccuracies in tumor, nodal, metastasis staging, incomplete surgery options, and inappropriate treatment recommendations. Otolaryngologists can use this information to caution patients, families, and trainees regarding the use of ChatGPT for HNC education without expert guidance.
Affiliation(s)
- Caretia J Washington
- Department of Epidemiology, University of Florida College of Public Health and Health Professions and College of Medicine, Gainesville, Florida, USA
- Marianne Abouyared
- Department of Otolaryngology-Head and Neck Surgery, University of California Davis Medical Center, Sacramento, California, USA
- Shama Karanth
- Division of Cancer Control and Population Sciences, Gainesville, Florida, USA
- Department of Surgery, College of Medicine, University of Florida, Gainesville, Florida, USA
- Dejana Braithwaite
- Department of Epidemiology, University of Florida College of Public Health and Health Professions and College of Medicine, Gainesville, Florida, USA
- Division of Cancer Control and Population Sciences, Gainesville, Florida, USA
- Department of Surgery, College of Medicine, University of Florida, Gainesville, Florida, USA
- Andrew Birkeland
- Department of Otolaryngology-Head and Neck Surgery, University of California Davis Medical Center, Sacramento, California, USA
- Dustin A Silverman
- Department of Otolaryngology-Head and Neck Surgery, University of Cincinnati, Cincinnati, Ohio, USA
- Si Chen
- Department of Otolaryngology-Head and Neck Surgery, University of Florida College of Medicine, Gainesville, Florida, USA
3
Marques de Mattos de Araujo B, Jesus Freitas PF, Deliga Schroder AG, Küchler EC, Baratto-Filho F, Ditzel Westphalen VP, Carneiro E, Xavier da Silva-Neto U, Miranda de Araujo C. PAINe - An Artificial Intelligence Based Virtual Assistant to Aid in the Differentiation of Pain of Odontogenic versus Temporomandibular Origin. J Endod 2024:S0099-2399(24)00524-7. [PMID: 39342988] [DOI: 10.1016/j.joen.2024.09.008]
Abstract
INTRODUCTION Pain associated with temporomandibular dysfunction (TMD) is often confused with odontogenic pain, which poses a challenge in endodontic diagnosis. Validated screening questionnaires can aid the identification and differentiation of the source of pain. Therefore, this study aimed to develop a virtual assistant based on artificial intelligence, using natural language processing techniques to automate the initial screening of patients with tooth pain. METHODS The PAINe chatbot was developed in Python, using the PyCharm environment, the 'openai' library to integrate the ChatGPT-4 API, and the 'streamlit' library for interface construction. The validated TMD Pain Screener questionnaire and one question about current pain intensity were integrated into the chatbot to perform the differential diagnosis of TMD in patients with tooth pain. The accuracy of the responses was evaluated in 50 random scenarios to compare the chatbot with the validated questionnaire. The Kappa coefficient was calculated to assess the agreement level between the chatbot responses and the validated questionnaire. RESULTS The chatbot achieved an accuracy rate of 86% and a substantial level of agreement (Kappa = 0.70). Most responses were clear and provided adequate information about the diagnosis. CONCLUSIONS The implementation of a virtual assistant using natural language processing, based on large language models, for the initial differential diagnosis screening of patients with tooth pain demonstrated substantial agreement between validated questionnaires and the chatbot. This approach emerges as a practical and efficient option for screening these patients.
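The abstract reports agreement between the chatbot and the validated questionnaire as a Kappa coefficient (0.70, substantial). As a minimal sketch of how such a statistic is computed for two label sequences (this is generic Cohen's kappa, not the authors' code; the labels and function name are illustrative):

```python
# Illustrative Cohen's kappa: chance-corrected agreement between two raters.
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Kappa for two equal-length label sequences (e.g., chatbot vs. questionnaire)."""
    n = len(labels_a)
    # Observed agreement: fraction of identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement, from each rater's label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(counts_a[k] * counts_b[k] for k in counts_a.keys() | counts_b.keys()) / (n * n)
    return (p_o - p_e) / (1 - p_e)  # undefined if both raters use a single label

# Perfect agreement yields 1.0; chance-level agreement yields 0.0.
print(cohens_kappa(["TMD", "TMD", "odontogenic", "odontogenic"],
                   ["TMD", "TMD", "odontogenic", "odontogenic"]))  # 1.0
```

Values of 0.61-0.80 are conventionally read as "substantial" agreement, which matches the reported 0.70.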
Affiliation(s)
- Erika Calvano Küchler
- Department of Orthodontics, University Hospital Bonn, Medical Faculty, Welschnonnenstr. 17, 53111, Bonn, Germany
- Flares Baratto-Filho
- School of Dentistry, Department of Endodontics, Tuiuti University of Paraná, Curitiba, PR, Brazil
- Everdan Carneiro
- School of Dentistry, Department of Endodontics, Pontifícia Universidade Católica do Paraná, Curitiba, Paraná, Brazil
- Ulisses Xavier da Silva-Neto
- School of Dentistry, Department of Endodontics, Pontifícia Universidade Católica do Paraná, Curitiba, Paraná, Brazil
4
Tong L, Zhang C, Liu R, Yang J, Sun Z. Comparative performance analysis of large language models: ChatGPT-3.5, ChatGPT-4 and Google Gemini in glucocorticoid-induced osteoporosis. J Orthop Surg Res 2024; 19:574. [PMID: 39289734] [PMCID: PMC11409482] [DOI: 10.1186/s13018-024-04996-2]
Abstract
BACKGROUND The use of large language models (LLMs) in medicine can help physicians improve the quality and effectiveness of health care by increasing the efficiency of medical information management, patient care, medical research, and clinical decision-making. METHODS We collected 34 frequently asked questions about glucocorticoid-induced osteoporosis (GIOP), covering the disease's clinical manifestations, pathogenesis, diagnosis, treatment, prevention, and risk factors. We also generated 25 questions based on the 2022 American College of Rheumatology Guideline for the Prevention and Treatment of Glucocorticoid-Induced Osteoporosis (2022 ACR-GIOP Guideline). Each question was posed to the LLMs (ChatGPT-3.5, ChatGPT-4, and Google Gemini), and three senior orthopedic surgeons independently rated each response on a scale of 1 to 4 points. A total score (TS) > 9 indicated 'good' responses, 6 ≤ TS ≤ 9 indicated 'moderate' responses, and TS < 6 indicated 'poor' responses. RESULTS In response to the general questions related to GIOP and the 2022 ACR-GIOP Guideline, Google Gemini provided more concise answers than the other LLMs. In terms of pathogenesis, ChatGPT-4 had significantly higher total scores (TSs) than ChatGPT-3.5. The TSs for answering questions related to the 2022 ACR-GIOP Guideline were significantly higher for ChatGPT-4 than for Google Gemini. ChatGPT-3.5 and ChatGPT-4 had significantly higher self-corrected TSs than pre-corrected TSs, while Google Gemini's self-corrected responses did not differ significantly from its initial ones. CONCLUSIONS Our study showed that Google Gemini provides more concise and intuitive responses than ChatGPT-3.5 and ChatGPT-4. ChatGPT-4 performed significantly better than ChatGPT-3.5 and Google Gemini in answering general questions about GIOP and the 2022 ACR-GIOP Guideline. ChatGPT-3.5 and ChatGPT-4 self-corrected better than Google Gemini.
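The total-score thresholds in this abstract (three raters at 1-4 points each, so totals range from 3 to 12) map directly onto a small helper. This is an illustrative sketch of the scoring rule only; the function name is ours, not the authors':

```python
def grade_response(rater_scores):
    """Classify an LLM answer from three raters' 1-4 point scores.

    Total score (TS) > 9 -> 'good'; 6 <= TS <= 9 -> 'moderate'; TS < 6 -> 'poor'.
    """
    ts = sum(rater_scores)
    if ts > 9:
        return "good"
    if ts >= 6:
        return "moderate"
    return "poor"

print(grade_response([4, 3, 4]))  # 'good' (TS = 11)
```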
Affiliation(s)
- Linjian Tong
- Clinical College of Neurology, Neurosurgery and Neurorehabilitation, Tianjin Medical University, Tianjin, 300070, China
- Chaoyang Zhang
- Department of Orthopedics, Tianjin Medical University Baodi Hospital, Tianjin, 301800, China
- Rui Liu
- Clinical College of Neurology, Neurosurgery and Neurorehabilitation, Tianjin Medical University, Tianjin, 300070, China
- Jia Yang
- Clinical College of Neurology, Neurosurgery and Neurorehabilitation, Tianjin Medical University, Tianjin, 300070, China
- Zhiming Sun
- Clinical College of Neurology, Neurosurgery and Neurorehabilitation, Tianjin Medical University, Tianjin, 300070, China.
5
Ruiz Sarrias O, Martínez del Prado MP, Sala Gonzalez MÁ, Azcuna Sagarduy J, Casado Cuesta P, Figaredo Berjano C, Galve-Calvo E, López de San Vicente Hernández B, López-Santillán M, Nuño Escolástico M, Sánchez Togneri L, Sande Sardina L, Pérez Hoyos MT, Abad Villar MT, Zabalza Zudaire M, Sayar Beristain O. Leveraging Large Language Models for Precision Monitoring of Chemotherapy-Induced Toxicities: A Pilot Study with Expert Comparisons and Future Directions. Cancers (Basel) 2024; 16:2830. [PMID: 39199603] [PMCID: PMC11352281] [DOI: 10.3390/cancers16162830]
Abstract
INTRODUCTION Large Language Models (LLMs), such as the GPT model family from OpenAI, have demonstrated transformative potential across various fields, especially in medicine. These models can understand and generate contextual text, adapting to new tasks without specific training. This versatility can revolutionize clinical practices by enhancing documentation, patient interaction, and decision-making processes. In oncology, LLMs offer the potential to significantly improve patient care through the continuous monitoring of chemotherapy-induced toxicities, a task that is often unmanageable for human resources alone. However, existing research has not sufficiently explored the accuracy of LLMs in identifying and assessing subjective toxicities based on patient descriptions. This study aims to fill this gap by evaluating the ability of LLMs to accurately classify these toxicities, facilitating personalized and continuous patient care. METHODS This comparative pilot study assessed the ability of an LLM to classify subjective toxicities from chemotherapy. Thirteen oncologists evaluated 30 fictitious cases created using expert knowledge and OpenAI's GPT-4. These evaluations, based on the CTCAE v.5 criteria, were compared to those of a contextualized LLM model. Metrics such as the mode and mean of responses were used to gauge consensus. The accuracy of the LLM was analyzed in both general and specific toxicity categories, considering types of errors and false alarms. The study's results are intended to justify further research involving real patients. RESULTS The study revealed significant variability in the oncologists' evaluations due to the lack of interaction with fictitious patients. Using mean evaluations, the LLM achieved an accuracy of 85.7% in general categories and 64.6% in specific categories; of its errors, 96.4% were mild and 3.6% were severe. False alarms occurred in 3% of cases. When comparing the LLM's performance to that of the expert oncologists, individual accuracy ranged from 66.7% to 89.2% for general categories and from 57.0% to 76.0% for specific categories. The 95% confidence intervals for the median accuracy of the oncologists were 81.9% to 86.9% for general categories and 67.6% to 75.6% for specific categories. These benchmarks highlight the LLM's potential to achieve expert-level performance in classifying chemotherapy-induced toxicities. DISCUSSION The findings demonstrate that LLMs can classify subjective toxicities from chemotherapy with accuracy comparable to that of expert oncologists. While the model's general-category performance falls within expert ranges, its specific-category accuracy requires improvement. The study's limitations include the use of fictitious cases, the lack of patient interaction, and the reliance on audio transcriptions. Nevertheless, LLMs show significant potential for enhancing patient monitoring and reducing oncologists' workload. Future research should focus on specific training of LLMs for medical tasks, studies with real patients, interactive evaluations, larger sample sizes, and robustness and generalization in diverse clinical settings. CONCLUSIONS LLMs can classify subjective toxicities from chemotherapy with accuracy comparable to that of expert oncologists. The LLM's performance in general toxicity categories is within the expert range, but there is room for improvement in specific categories. LLMs have the potential to enhance patient monitoring, enable early interventions, and reduce severe complications, improving care quality and efficiency. Future research should involve specific training of LLMs, validation with real patients, and the incorporation of interactive capabilities for real-time patient interactions. Ethical considerations, including data accuracy, transparency, and privacy, are crucial for the safe integration of LLMs into clinical practice.
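The methods mention using the mode and mean of the raters' responses to gauge consensus. A minimal sketch of that aggregation for CTCAE-style integer grades (our naming and framing, not the study's code) could look like:

```python
# Illustrative consensus summary over several raters' CTCAE grades (0-5 integers).
from statistics import mean, mode

def consensus_grade(grades):
    """Return (most common grade, rounded mean grade) across raters."""
    return mode(grades), round(mean(grades))

print(consensus_grade([2, 2, 3, 2, 1]))  # (2, 2)
```

With many raters, comparing the mode and the rounded mean is a quick check on how concentrated the panel's judgments are: when they diverge, the ratings are spread out.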
Affiliation(s)
- Oskitz Ruiz Sarrias
- Department of Mathematics and Statistics, NNBi 2020 SL, 31110 Noain, Navarra, Spain
- María Purificación Martínez del Prado
- Medical Oncology Service, Basurto University Hospital, OSI Bilbao-Basurto, Osakidetza, 48013 Bilbao, Biscay, Spain
- María Ángeles Sala Gonzalez
- Medical Oncology Service, Basurto University Hospital, OSI Bilbao-Basurto, Osakidetza, 48013 Bilbao, Biscay, Spain
- Josune Azcuna Sagarduy
- Medical Oncology Service, Basurto University Hospital, OSI Bilbao-Basurto, Osakidetza, 48013 Bilbao, Biscay, Spain
- Pablo Casado Cuesta
- Medical Oncology Service, Basurto University Hospital, OSI Bilbao-Basurto, Osakidetza, 48013 Bilbao, Biscay, Spain
- Covadonga Figaredo Berjano
- Medical Oncology Service, Basurto University Hospital, OSI Bilbao-Basurto, Osakidetza, 48013 Bilbao, Biscay, Spain
- Elena Galve-Calvo
- Medical Oncology Service, Basurto University Hospital, OSI Bilbao-Basurto, Osakidetza, 48013 Bilbao, Biscay, Spain
- Borja López de San Vicente Hernández
- Medical Oncology Service, Basurto University Hospital, OSI Bilbao-Basurto, Osakidetza, 48013 Bilbao, Biscay, Spain
- María López-Santillán
- Medical Oncology Service, Basurto University Hospital, OSI Bilbao-Basurto, Osakidetza, 48013 Bilbao, Biscay, Spain
- Maitane Nuño Escolástico
- Medical Oncology Service, Basurto University Hospital, OSI Bilbao-Basurto, Osakidetza, 48013 Bilbao, Biscay, Spain
- Laura Sánchez Togneri
- Medical Oncology Service, Basurto University Hospital, OSI Bilbao-Basurto, Osakidetza, 48013 Bilbao, Biscay, Spain
- Laura Sande Sardina
- Medical Oncology Service, Basurto University Hospital, OSI Bilbao-Basurto, Osakidetza, 48013 Bilbao, Biscay, Spain
- María Teresa Pérez Hoyos
- Medical Oncology Service, Basurto University Hospital, OSI Bilbao-Basurto, Osakidetza, 48013 Bilbao, Biscay, Spain
- María Teresa Abad Villar
- Medical Oncology Service, Basurto University Hospital, OSI Bilbao-Basurto, Osakidetza, 48013 Bilbao, Biscay, Spain
6
Geantă M, Bădescu D, Chirca N, Nechita OC, Radu CG, Rascu S, Rădăvoi D, Sima C, Toma C, Jinga V. The Potential Impact of Large Language Models on Doctor-Patient Communication: A Case Study in Prostate Cancer. Healthcare (Basel) 2024; 12:1548. [PMID: 39120251] [PMCID: PMC11311818] [DOI: 10.3390/healthcare12151548]
Abstract
BACKGROUND In recent years, the integration of large language models (LLMs) into healthcare has emerged as a revolutionary approach to enhancing doctor-patient communication, particularly in the management of diseases such as prostate cancer. METHODS Our paper evaluated the effectiveness of three prominent LLMs-ChatGPT (3.5), Gemini (Pro), and Co-Pilot (the free version)-against the official Romanian Patient's Guide on prostate cancer. Employing a randomized and blinded method, our study engaged eight medical professionals to assess the responses of these models based on accuracy, timeliness, comprehensiveness, and user-friendliness. RESULTS The primary objective was to explore whether LLMs, when operating in Romanian, offer comparable or superior performance to the Patient's Guide, considering their potential to personalize communication and enhance the informational accessibility for patients. Results indicated that LLMs, particularly ChatGPT, generally provided more accurate and user-friendly information compared to the Guide. CONCLUSIONS The findings suggest a significant potential for LLMs to enhance healthcare communication by providing accurate and accessible information. However, variability in performance across different models underscores the need for tailored implementation strategies. We highlight the importance of integrating LLMs with a nuanced understanding of their capabilities and limitations to optimize their use in clinical settings.
Affiliation(s)
- Marius Geantă
- Department of Urology, “Carol Davila” University of Medicine and Pharmacy, 8 Eroii Sanitari Blvd., 050474 Bucharest, Romania
- Center for Innovation in Medicine, 42J Theodor Pallady Bvd., 032266 Bucharest, Romania
- United Nations University—Maastricht Economic and Social Research Institute on Innovation and Technology, Boschstraat 24, 6211 AX Maastricht, The Netherlands
- Daniel Bădescu
- Department of Urology, “Carol Davila” University of Medicine and Pharmacy, 8 Eroii Sanitari Blvd., 050474 Bucharest, Romania
- Department of Urology, “Prof. Dr. Th. Burghele” Clinical Hospital, 20 Panduri Str., 050659 Bucharest, Romania
- Narcis Chirca
- Department of Urology, “Carol Davila” University of Medicine and Pharmacy, 8 Eroii Sanitari Blvd., 050474 Bucharest, Romania
- Department of Urology, “Prof. Dr. Th. Burghele” Clinical Hospital, 20 Panduri Str., 050659 Bucharest, Romania
- Ovidiu Cătălin Nechita
- Department of Urology, “Carol Davila” University of Medicine and Pharmacy, 8 Eroii Sanitari Blvd., 050474 Bucharest, Romania
- Department of Urology, “Prof. Dr. Th. Burghele” Clinical Hospital, 20 Panduri Str., 050659 Bucharest, Romania
- Cosmin George Radu
- Department of Urology, “Prof. Dr. Th. Burghele” Clinical Hospital, 20 Panduri Str., 050659 Bucharest, Romania
- Stefan Rascu
- Department of Urology, “Carol Davila” University of Medicine and Pharmacy, 8 Eroii Sanitari Blvd., 050474 Bucharest, Romania
- Department of Urology, “Prof. Dr. Th. Burghele” Clinical Hospital, 20 Panduri Str., 050659 Bucharest, Romania
- Daniel Rădăvoi
- Department of Urology, “Carol Davila” University of Medicine and Pharmacy, 8 Eroii Sanitari Blvd., 050474 Bucharest, Romania
- Department of Urology, “Prof. Dr. Th. Burghele” Clinical Hospital, 20 Panduri Str., 050659 Bucharest, Romania
- Cristian Sima
- Department of Urology, “Carol Davila” University of Medicine and Pharmacy, 8 Eroii Sanitari Blvd., 050474 Bucharest, Romania
- Department of Urology, “Prof. Dr. Th. Burghele” Clinical Hospital, 20 Panduri Str., 050659 Bucharest, Romania
- Cristian Toma
- Department of Urology, “Carol Davila” University of Medicine and Pharmacy, 8 Eroii Sanitari Blvd., 050474 Bucharest, Romania
- Department of Urology, “Prof. Dr. Th. Burghele” Clinical Hospital, 20 Panduri Str., 050659 Bucharest, Romania
- Viorel Jinga
- Department of Urology, “Carol Davila” University of Medicine and Pharmacy, 8 Eroii Sanitari Blvd., 050474 Bucharest, Romania
- Department of Urology, “Prof. Dr. Th. Burghele” Clinical Hospital, 20 Panduri Str., 050659 Bucharest, Romania
- Academy of Romanian Scientists, 3 Ilfov, 050085 Bucharest, Romania
7
Pavlovic ZJ, Jiang VS, Hariton E. Current applications of artificial intelligence in assisted reproductive technologies through the perspective of a patient's journey. Curr Opin Obstet Gynecol 2024; 36:211-217. [PMID: 38597425] [DOI: 10.1097/gco.0000000000000951]
Abstract
PURPOSE OF REVIEW This review highlights the timely relevance of artificial intelligence in enhancing assisted reproductive technologies (ARTs), particularly in-vitro fertilization (IVF). It underscores artificial intelligence's potential in revolutionizing patient outcomes and operational efficiency by addressing challenges in fertility diagnoses and procedures. RECENT FINDINGS Recent advancements in artificial intelligence, including machine learning and predictive modeling, are making significant strides in optimizing IVF processes such as medication dosing, scheduling, and embryological assessments. Innovations include artificial intelligence augmented diagnostic testing, predictive modeling for treatment outcomes, scheduling optimization, dosing and protocol selection, follicular and hormone monitoring, trigger timing, and improved embryo selection. These developments promise to refine treatment approaches, enhance patient engagement, and increase the accuracy and scalability of fertility treatments. SUMMARY The integration of artificial intelligence into reproductive medicine offers profound implications for clinical practice and research. By facilitating personalized treatment plans, standardizing procedures, and improving the efficiency of fertility clinics, artificial intelligence technologies pave the way for value-based, accessible, and efficient fertility services. Despite the promise, the full potential of artificial intelligence in ART will require ongoing validation and ethical considerations to ensure equitable and effective implementation.
Affiliation(s)
- Zoran J Pavlovic
- Department of Obstetrics and Gynecology/Reproductive Endocrinology and Infertility, University of South Florida, Morsani College of Medicine, Tampa, Florida
- Victoria S Jiang
- Division of Reproductive Endocrinology & Infertility, Vincent Department of Obstetrics and Gynecology, Massachusetts General Hospital/Harvard Medical School, Boston, Massachusetts
- Eduardo Hariton
- Reproductive Science Center of the San Francisco Bay Area, San Ramon, California, USA
8
Kneifel F, Becker F, Knipping A, Katou S, Andreou A, Juratli M, Houben P, Morgul H, Pascher A, Strücker B. ChatGPT as a Source of Information on Pancreatic Cancer. Dtsch Arztebl Int 2024; 121:505-506. [PMID: 39356560] [DOI: 10.3238/arztebl.m2024.0081]
9
Ray PP. Letter to the editor regarding "Application of the convolution neural network in determining the depth of invasion of gastrointestinal cancer: a systematic review and meta-analysis". J Gastrointest Surg 2024; 28:1218-1219. [PMID: 38703989] [DOI: 10.1016/j.gassur.2024.04.029]
Affiliation(s)
- Partha Pratim Ray
- Department of Computer Applications, Sikkim University, Gangtok, India.
10
Geantă M, Bădescu D, Chirca N, Nechita OC, Radu CG, Rascu Ș, Rădăvoi D, Sima C, Toma C, Jinga V. The Emerging Role of Large Language Models in Improving Prostate Cancer Literacy. Bioengineering (Basel) 2024; 11:654. [PMID: 39061736] [PMCID: PMC11274300] [DOI: 10.3390/bioengineering11070654]
Abstract
This study assesses the effectiveness of chatbots powered by Large Language Models (LLMs), namely ChatGPT 3.5, CoPilot, and Gemini, in delivering prostate cancer information, compared to the official Patient's Guide. Using 25 expert-validated questions, we conducted a comparative analysis to evaluate accuracy, timeliness, completeness, and understandability through a Likert scale. Statistical analyses were used to quantify the performance of each model. Results indicate that ChatGPT 3.5 consistently outperformed the other models, establishing itself as a robust and reliable source of information. CoPilot also performed effectively, albeit slightly less so than ChatGPT 3.5. Despite the strengths of the Patient's Guide, the advanced capabilities of LLMs like ChatGPT significantly enhance educational tools in healthcare. The findings underscore the need for ongoing innovation and improvement in AI applications within health sectors, especially considering the ethical implications of the forthcoming EU AI Act. Future research should focus on investigating potential biases in AI-generated responses and their impact on patient outcomes.
Affiliation(s)
- Marius Geantă
- Department of Urology, “Carol Davila” University of Medicine and Pharmacy, 8 Eroii Sanitari Blvd., 050474 Bucharest, Romania (V.J.)
- Center for Innovation in Medicine, 42J Theodor Pallady Blvd., 032266 Bucharest, Romania
- Daniel Bădescu
- Department of Urology, “Carol Davila” University of Medicine and Pharmacy, 8 Eroii Sanitari Blvd., 050474 Bucharest, Romania (V.J.)
- Department of Urology, “Prof. Dr. Th. Burghele” Clinical Hospital, 20 Panduri Str., 050659 Bucharest, Romania
- Narcis Chirca
- Department of Urology, “Carol Davila” University of Medicine and Pharmacy, 8 Eroii Sanitari Blvd., 050474 Bucharest, Romania (V.J.)
- Department of Urology, “Prof. Dr. Th. Burghele” Clinical Hospital, 20 Panduri Str., 050659 Bucharest, Romania
- Ovidiu Cătălin Nechita
- Department of Urology, “Carol Davila” University of Medicine and Pharmacy, 8 Eroii Sanitari Blvd., 050474 Bucharest, Romania (V.J.)
- Department of Urology, “Prof. Dr. Th. Burghele” Clinical Hospital, 20 Panduri Str., 050659 Bucharest, Romania
- Cosmin George Radu
- Department of Urology, “Prof. Dr. Th. Burghele” Clinical Hospital, 20 Panduri Str., 050659 Bucharest, Romania
- Ștefan Rascu
- Department of Urology, “Carol Davila” University of Medicine and Pharmacy, 8 Eroii Sanitari Blvd., 050474 Bucharest, Romania (V.J.)
- Department of Urology, “Prof. Dr. Th. Burghele” Clinical Hospital, 20 Panduri Str., 050659 Bucharest, Romania
- Daniel Rădăvoi
- Department of Urology, “Carol Davila” University of Medicine and Pharmacy, 8 Eroii Sanitari Blvd., 050474 Bucharest, Romania (V.J.)
- Department of Urology, “Prof. Dr. Th. Burghele” Clinical Hospital, 20 Panduri Str., 050659 Bucharest, Romania
- Cristian Sima
- Department of Urology, “Carol Davila” University of Medicine and Pharmacy, 8 Eroii Sanitari Blvd., 050474 Bucharest, Romania (V.J.)
- Department of Urology, “Prof. Dr. Th. Burghele” Clinical Hospital, 20 Panduri Str., 050659 Bucharest, Romania
- Cristian Toma
- Department of Urology, “Carol Davila” University of Medicine and Pharmacy, 8 Eroii Sanitari Blvd., 050474 Bucharest, Romania (V.J.)
- Department of Urology, “Prof. Dr. Th. Burghele” Clinical Hospital, 20 Panduri Str., 050659 Bucharest, Romania
- Viorel Jinga
- Department of Urology, “Carol Davila” University of Medicine and Pharmacy, 8 Eroii Sanitari Blvd., 050474 Bucharest, Romania (V.J.)
- Academy of Romanian Scientists, 3 Ilfov, 050085 Bucharest, Romania
11
Borna S, Gomez-Cabello CA, Pressman SM, Haider SA, Forte AJ. Comparative Analysis of Large Language Models in Emergency Plastic Surgery Decision-Making: The Role of Physical Exam Data. J Pers Med 2024; 14:612. [PMID: 38929832 PMCID: PMC11204584 DOI: 10.3390/jpm14060612] [Received: 05/21/2024] [Revised: 06/04/2024] [Accepted: 06/06/2024] [Indexed: 06/28/2024]
Abstract
In the U.S., diagnostic errors are common across various healthcare settings due to factors like complex procedures and multiple healthcare providers, often exacerbated by inadequate initial evaluations. This study explores the role of Large Language Models (LLMs), specifically OpenAI's ChatGPT-4 and Google Gemini, in improving emergency decision-making in plastic and reconstructive surgery by evaluating their effectiveness both with and without physical examination data. Thirty medical vignettes covering emergency conditions such as fractures and nerve injuries were used to assess the diagnostic and management responses of the models. These responses were evaluated by medical professionals against established clinical guidelines, using statistical analyses including the Wilcoxon rank-sum test. Results showed that ChatGPT-4 consistently outperformed Gemini in both diagnosis and management, irrespective of the presence of physical examination data, though no significant differences were noted within each model's performance across different data scenarios. Conclusively, while ChatGPT-4 demonstrates superior accuracy and management capabilities, the addition of physical examination data, though enhancing response detail, did not significantly surpass traditional medical resources. This underscores the utility of AI in supporting clinical decision-making, particularly in scenarios with limited data, suggesting its role as a complement to, rather than a replacement for, comprehensive clinical evaluation and expertise.
Affiliation(s)
- Sahar Borna
- Division of Plastic Surgery, Mayo Clinic, Jacksonville, FL 32224, USA
- Syed Ali Haider
- Division of Plastic Surgery, Mayo Clinic, Jacksonville, FL 32224, USA
- Antonio Jorge Forte
- Division of Plastic Surgery, Mayo Clinic, Jacksonville, FL 32224, USA
- Center for Digital Health, Mayo Clinic, Rochester, MN 55905, USA
12
Longwell JB, Hirsch I, Binder F, Gonzalez Conchas GA, Mau D, Jang R, Krishnan RG, Grant RC. Performance of Large Language Models on Medical Oncology Examination Questions. JAMA Netw Open 2024; 7:e2417641. [PMID: 38888919 PMCID: PMC11185976 DOI: 10.1001/jamanetworkopen.2024.17641] [Received: 02/01/2024] [Accepted: 04/18/2024] [Indexed: 06/20/2024]
Abstract
Importance Large language models (LLMs) recently developed an unprecedented ability to answer questions. Studies of LLMs from other fields may not generalize to medical oncology, a high-stakes clinical setting requiring rapid integration of new information. Objective To evaluate the accuracy and safety of LLM answers on medical oncology examination questions. Design, Setting, and Participants This cross-sectional study was conducted between May 28 and October 11, 2023. The American Society of Clinical Oncology (ASCO) Oncology Self-Assessment Series on ASCO Connection, the European Society of Medical Oncology (ESMO) Examination Trial questions, and an original set of board-style medical oncology multiple-choice questions were presented to 8 LLMs. Main Outcomes and Measures The primary outcome was the percentage of correct answers. Medical oncologists evaluated the explanations provided by the best LLM for accuracy, classified the types of errors, and estimated the likelihood and extent of potential clinical harm. Results Proprietary LLM 2 correctly answered 125 of 147 questions (85.0%; 95% CI, 78.2%-90.4%; P < .001 vs random answering). Proprietary LLM 2 outperformed an earlier version, proprietary LLM 1, which correctly answered 89 of 147 questions (60.5%; 95% CI, 52.2%-68.5%; P < .001), and the best open-source LLM, Mixtral-8x7B-v0.1, which correctly answered 87 of 147 questions (59.2%; 95% CI, 50.0%-66.4%; P < .001). The explanations provided by proprietary LLM 2 contained no or minor errors for 138 of 147 questions (93.9%; 95% CI, 88.7%-97.2%). Incorrect responses were most commonly associated with errors in information retrieval, particularly with recent publications, followed by erroneous reasoning and reading comprehension. If acted upon in clinical practice, 18 of 22 incorrect answers (81.8%; 95% CI, 59.7%-94.8%) would have a medium or high likelihood of moderate to severe harm. 
Conclusions and Relevance In this cross-sectional study of the performance of LLMs on medical oncology examination questions, the best LLM answered questions with remarkable performance, although errors raised safety concerns. These results demonstrated an opportunity to develop and evaluate LLMs to improve health care clinician experiences and patient care, considering the potential impact on capabilities and safety.
Affiliation(s)
- Jack B. Longwell
- Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada
- Ian Hirsch
- Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada
- Department of Medicine, University of Toronto, Toronto, Ontario, Canada
- Fernando Binder
- Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada
- Department of Medicine, University of Toronto, Toronto, Ontario, Canada
- Daniel Mau
- Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada
- Institute of Medical Science, University of Toronto, Toronto, Ontario, Canada
- Raymond Jang
- Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada
- Department of Medicine, University of Toronto, Toronto, Ontario, Canada
- Rahul G. Krishnan
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, Ontario, Canada
- Vector Institute, Toronto, Ontario, Canada
- Robert C. Grant
- Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada
- Department of Medicine, University of Toronto, Toronto, Ontario, Canada
- Institute of Medical Science, University of Toronto, Toronto, Ontario, Canada
- ICES, Toronto, Ontario, Canada
13
Bajčetić M, Mirčić A, Rakočević J, Đoković D, Milutinović K, Zaletel I. Comparing the performance of artificial intelligence learning models to medical students in solving histology and embryology multiple choice questions. Ann Anat 2024; 254:152261. [PMID: 38521363 DOI: 10.1016/j.aanat.2024.152261] [Received: 12/22/2023] [Revised: 02/06/2024] [Accepted: 03/19/2024] [Indexed: 03/25/2024]
Abstract
INTRODUCTION The appearance of artificial intelligence language models (AI LMs) in the form of chatbots has gained a lot of popularity worldwide, potentially interfering with different aspects of education, including medical education. The present study aims to assess the accuracy and consistency of different AI LMs regarding the histology and embryology knowledge obtained during the 1st year of medical studies. METHODS Five different chatbots (ChatGPT, Bing AI, Bard AI, Perplexity AI, and ChatSonic) were given two sets of multiple-choice questions (MCQs). The AI LMs' test results were compared to the same test results obtained from 1st year medical students. Chatbots were instructed to use revised Bloom's taxonomy when classifying questions depending on hierarchical cognitive domains. Simultaneously, two histology teachers independently rated the questions applying the same criteria, followed by the comparison between chatbots' and teachers' question classification. The consistency of chatbots' answers was explored by giving the chatbots the same tests two months apart. RESULTS AI LMs successfully and correctly solved MCQs regarding histology and embryology material. All five chatbots showed better results than the 1st year medical students on both histology and embryology tests. Chatbots showed poor results when asked to classify the questions according to revised Bloom's cognitive taxonomy compared to teachers. There was an inverse correlation between the difficulty of questions and their correct classification by the chatbots. Retesting the chatbots after two months showed a lack of consistency concerning both MCQ answers and question classification according to revised Bloom's taxonomy learning stage. CONCLUSION Despite the ability of certain chatbots to provide correct answers to the majority of diverse and heterogeneous questions, a lack of consistency in answers over time warrants their careful use as a medical education tool.
Affiliation(s)
- Miloš Bajčetić
- Institute of Histology and Embryology "Aleksandar Đ. Kostić", Faculty of Medicine, University of Belgrade, Belgrade, Serbia
- Aleksandar Mirčić
- Institute of Histology and Embryology "Aleksandar Đ. Kostić", Faculty of Medicine, University of Belgrade, Belgrade, Serbia
- Jelena Rakočević
- Institute of Histology and Embryology "Aleksandar Đ. Kostić", Faculty of Medicine, University of Belgrade, Belgrade, Serbia
- Danilo Đoković
- Institute of Histology and Embryology "Aleksandar Đ. Kostić", Faculty of Medicine, University of Belgrade, Belgrade, Serbia
- Katarina Milutinović
- Institute of Histology and Embryology "Aleksandar Đ. Kostić", Faculty of Medicine, University of Belgrade, Belgrade, Serbia
- Ivan Zaletel
- Institute of Histology and Embryology "Aleksandar Đ. Kostić", Faculty of Medicine, University of Belgrade, Belgrade, Serbia.
14
Xue E, Bracken-Clarke D, Iannantuono GM, Choo-Wosoba H, Gulley JL, Floudas CS. Utility of Large Language Models for Health Care Professionals and Patients in Navigating Hematopoietic Stem Cell Transplantation: Comparison of the Performance of ChatGPT-3.5, ChatGPT-4, and Bard. J Med Internet Res 2024; 26:e54758. [PMID: 38758582 PMCID: PMC11143389 DOI: 10.2196/54758] [Received: 11/21/2023] [Revised: 03/22/2024] [Accepted: 03/22/2024] [Indexed: 05/18/2024]
Abstract
BACKGROUND Artificial intelligence is increasingly being applied to many workflows. Large language models (LLMs) are publicly accessible platforms trained to understand, interact with, and produce human-readable text; their ability to deliver relevant and reliable information is of particular interest to health care providers and patients. Hematopoietic stem cell transplantation (HSCT) is a complex medical field requiring extensive knowledge, background, and training to practice successfully and can be challenging for the nonspecialist audience to comprehend. OBJECTIVE We aimed to test the applicability of 3 prominent LLMs, namely ChatGPT-3.5 (OpenAI), ChatGPT-4 (OpenAI), and Bard (Google AI), in guiding nonspecialist health care professionals and advising patients seeking information regarding HSCT. METHODS We submitted 72 open-ended HSCT-related questions of variable difficulty to the LLMs and rated their responses based on consistency (defined as replicability of the response), response veracity, language comprehensibility, specificity to the topic, and the presence of hallucinations. We then rechallenged the 2 best-performing chatbots by resubmitting the most difficult questions, prompting them to respond as if communicating with either a health care professional or a patient and to provide verifiable sources of information. Responses were then rerated with the additional criterion of language appropriateness, defined as language adaptation for the intended audience. RESULTS ChatGPT-4 outperformed both ChatGPT-3.5 and Bard in terms of response consistency (66/72, 92%; 54/72, 75%; and 63/69, 91%, respectively; P=.007), response veracity (58/66, 88%; 40/54, 74%; and 16/63, 25%, respectively; P<.001), and specificity to the topic (60/66, 91%; 43/54, 80%; and 27/63, 43%, respectively; P<.001). Both ChatGPT-4 and ChatGPT-3.5 outperformed Bard in terms of language comprehensibility (64/66, 97%; 53/54, 98%; and 52/63, 83%, respectively; P=.002).
All displayed episodes of hallucinations. ChatGPT-3.5 and ChatGPT-4 were then rechallenged with a prompt to adapt their language to the audience and to provide sources of information, and their responses were rated. ChatGPT-3.5 showed a better ability to adapt its language to a nonmedical audience than ChatGPT-4 (17/21, 81% and 10/22, 46%, respectively; P=.03); however, both failed to consistently provide correct and up-to-date information resources, reporting either out-of-date materials, incorrect URLs, or unfocused references, making their output not verifiable by the reader. CONCLUSIONS Despite LLMs' potential capability in confronting challenging medical topics such as HSCT, the presence of mistakes and the lack of clear references make them not yet appropriate for routine, unsupervised clinical use or patient counseling. Implementation of LLMs' ability to access and reference current and updated websites and research papers, as well as development of LLMs trained on specialized domain knowledge data sets, may offer potential solutions for their future clinical application.
Affiliation(s)
- Elisabetta Xue
- Center for Immuno-Oncology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, United States
- Dara Bracken-Clarke
- Center for Immuno-Oncology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, United States
- Giovanni Maria Iannantuono
- Genitourinary Malignancies Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, United States
- Hyoyoung Choo-Wosoba
- Biostatistics and Data Management Section, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, United States
- James L Gulley
- Center for Immuno-Oncology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, United States
- Charalampos S Floudas
- Center for Immuno-Oncology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, United States
15
Borna S, Gomez-Cabello CA, Pressman SM, Haider SA, Sehgal A, Leibovich BC, Cole D, Forte AJ. Comparative Analysis of Artificial Intelligence Virtual Assistant and Large Language Models in Post-Operative Care. Eur J Investig Health Psychol Educ 2024; 14:1413-1424. [PMID: 38785591 PMCID: PMC11119735 DOI: 10.3390/ejihpe14050093] [Received: 04/12/2024] [Revised: 05/11/2024] [Accepted: 05/14/2024] [Indexed: 05/25/2024]
Abstract
In postoperative care, patient education and follow-up are pivotal for enhancing the quality of care and satisfaction. Artificial intelligence virtual assistants (AIVA) and large language models (LLMs) like Google BARD and ChatGPT-4 offer avenues for addressing patient queries using natural language processing (NLP) techniques. However, the accuracy and appropriateness of the information vary across these platforms, necessitating a comparative study to evaluate their efficacy in this domain. We conducted a study comparing AIVA (using Google Dialogflow) with ChatGPT-4 and Google BARD, assessing the accuracy, knowledge gap, and response appropriateness. AIVA demonstrated superior performance, with significantly higher accuracy (mean: 0.9) and lower knowledge gap (mean: 0.1) compared to BARD and ChatGPT-4. Additionally, AIVA's responses received higher Likert scores for appropriateness. Our findings suggest that specialized AI tools like AIVA are more effective in delivering precise and contextually relevant information for postoperative care compared to general-purpose LLMs. While ChatGPT-4 shows promise, its performance varies, particularly in verbal interactions. This underscores the importance of tailored AI solutions in healthcare, where accuracy and clarity are paramount. Our study highlights the necessity for further research and the development of customized AI solutions to address specific medical contexts and improve patient outcomes.
Affiliation(s)
- Sahar Borna
- Division of Plastic Surgery, Mayo Clinic, Jacksonville, FL 32224, USA
- Syed Ali Haider
- Division of Plastic Surgery, Mayo Clinic, Jacksonville, FL 32224, USA
- Ajai Sehgal
- Center for Digital Health, Mayo Clinic, Rochester, MN 55905, USA
- Bradley C. Leibovich
- Center for Digital Health, Mayo Clinic, Rochester, MN 55905, USA
- Department of Urology, Mayo Clinic, Rochester, MN 55905, USA
- Dave Cole
- Center for Digital Health, Mayo Clinic, Rochester, MN 55905, USA
- Antonio Jorge Forte
- Division of Plastic Surgery, Mayo Clinic, Jacksonville, FL 32224, USA
- Center for Digital Health, Mayo Clinic, Rochester, MN 55905, USA
16
Saner FH, Saner YM, Abufarhaneh E, Broering DC, Raptis DA. Comparative Analysis of Artificial Intelligence (AI) Languages in Predicting Sequential Organ Failure Assessment (SOFA) Scores. Cureus 2024; 16:e59662. [PMID: 38836141 PMCID: PMC11148682 DOI: 10.7759/cureus.59662] [Accepted: 05/04/2024] [Indexed: 06/06/2024]
Abstract
PURPOSE The Sequential Organ Failure Assessment (SOFA) score plays a crucial role in intensive care units (ICUs) by providing a reliable measure of a patient's organ function or extent of failure. However, the precise assessment is time-consuming, and daily assessment in clinical practice in the ICU can be challenging. METHODS Realistic scenarios in an ICU setting were created, and the data mining precision of ChatGPT 4.0 Plus, Bard, and Perplexity AI was assessed using Spearman's correlation coefficient as well as the intraclass correlation coefficient (ICC) regarding accuracy in determining the SOFA score. RESULTS The strongest correlation was observed between the actual SOFA score and the score calculated by ChatGPT 4.0 Plus (r=0.92, p<0.001). In contrast, the correlation between the actual SOFA score and that calculated by Bard was moderate (r=0.59, p=0.070), while the correlation with Perplexity AI was substantial (r=0.89, p<0.001). The intraclass correlation coefficient of the actual SOFA scores with those of ChatGPT 4.0 Plus, Bard, and Perplexity AI was ICC=0.94. CONCLUSION Artificial intelligence (AI) tools, particularly ChatGPT 4.0 Plus, show significant promise in assisting with automated SOFA score calculations via AI data mining in ICU settings. They offer a pathway to reduce the manual workload and increase the efficiency of continuous patient monitoring and assessment. However, further development and validation are necessary to ensure accuracy and reliability in a critical care environment.
Affiliation(s)
- Fuat H Saner
- Organ Transplant Center of Excellence, King Faisal Specialist Hospital and Research Centre, Riyadh, SAU
- Yasemin M Saner
- Department of Urology, Medical Center University Duisburg-Essen, Essen, DEU
- Ehab Abufarhaneh
- Organ Transplant Center of Excellence, King Faisal Specialist Hospital and Research Centre, Riyadh, SAU
- Dieter C Broering
- Organ Transplant Center of Excellence, King Faisal Specialist Hospital and Research Centre, Riyadh, SAU
- Dimitri A Raptis
- Organ Transplant Center of Excellence, King Faisal Specialist Hospital and Research Centre, Riyadh, SAU
17
He W, Zhang W, Jin Y, Zhou Q, Zhang H, Xia Q. Physician Versus Large Language Model Chatbot Responses to Web-Based Questions From Autistic Patients in Chinese: Cross-Sectional Comparative Analysis. J Med Internet Res 2024; 26:e54706. [PMID: 38687566 PMCID: PMC11094593 DOI: 10.2196/54706] [Received: 11/20/2023] [Revised: 03/20/2024] [Accepted: 04/02/2024] [Indexed: 05/02/2024]
Abstract
BACKGROUND There is a dearth of feasibility assessments regarding the use of large language models (LLMs) for responding to inquiries from autistic patients within a Chinese-language context. Despite Chinese being one of the most widely spoken languages globally, the predominant research focus on applying these models in the medical field has been on English-speaking populations. OBJECTIVE This study aims to assess the effectiveness of LLM chatbots, specifically ChatGPT-4 (OpenAI) and ERNIE Bot (version 2.2.3; Baidu, Inc), one of the most advanced LLMs in China, in addressing inquiries from autistic individuals in a Chinese setting. METHODS For this study, we gathered data from DXY, a widely acknowledged web-based medical consultation platform in China with a user base of over 100 million individuals. A total of 100 patient consultation samples were rigorously selected from January 2018 to August 2023, amounting to 239 questions extracted from publicly available autism-related documents on the platform. To maintain objectivity, both the original questions and responses were anonymized and randomized. An evaluation team of 3 chief physicians assessed the responses across 4 dimensions: relevance, accuracy, usefulness, and empathy. The team completed 717 evaluations. The team initially identified the best response and then used a Likert scale with 5 response categories, each representing a distinct level of quality, to gauge the responses. Finally, we compared the responses collected from different sources. RESULTS Among the 717 evaluations conducted, 46.86% (95% CI 43.21%-50.51%) of assessors displayed varying preferences for responses from physicians, with 34.87% (95% CI 31.38%-38.36%) of assessors favoring ChatGPT and 18.27% (95% CI 15.44%-21.10%) of assessors favoring ERNIE Bot. The average relevance scores for physicians, ChatGPT, and ERNIE Bot were 3.75 (95% CI 3.69-3.82), 3.69 (95% CI 3.63-3.74), and 3.41 (95% CI 3.35-3.46), respectively.
Physicians (3.66, 95% CI 3.60-3.73) and ChatGPT (3.73, 95% CI 3.69-3.77) demonstrated higher accuracy ratings compared to ERNIE Bot (3.52, 95% CI 3.47-3.57). In terms of usefulness scores, physicians (3.54, 95% CI 3.47-3.62) received higher ratings than ChatGPT (3.40, 95% CI 3.34-3.47) and ERNIE Bot (3.05, 95% CI 2.99-3.12). Finally, concerning the empathy dimension, ChatGPT (3.64, 95% CI 3.57-3.71) outperformed physicians (3.13, 95% CI 3.04-3.21) and ERNIE Bot (3.11, 95% CI 3.04-3.18). CONCLUSIONS In this cross-sectional study, physicians' responses exhibited superiority in the present Chinese-language context. Nonetheless, LLMs can provide valuable medical guidance to autistic patients and may even surpass physicians in demonstrating empathy. However, it is crucial to acknowledge that further optimization and research are imperative prerequisites before the effective integration of LLMs in clinical settings across diverse linguistic environments can be realized. TRIAL REGISTRATION Chinese Clinical Trial Registry ChiCTR2300074655; https://www.chictr.org.cn/bin/project/edit?pid=199432.
Affiliation(s)
- Wenjie He
- Tianjin University of Traditional Chinese Medicine, Tianjin, China
- Dongguan Rehabilitation Experimental School, Dongguan, China
- Wenyan Zhang
- Lanzhou University Second Hospital, Lanzhou University, Lanzhou, China
- Ya Jin
- Dongguan Songshan Lake Central Hospital, Guangdong Medical University, Dongguan, China
- Qiang Zhou
- Dongguan Rehabilitation Experimental School, Dongguan, China
- Huadan Zhang
- Dongguan Rehabilitation Experimental School, Dongguan, China
- Qing Xia
- Tianjin University of Traditional Chinese Medicine, Tianjin, China
18
Pressman SM, Borna S, Gomez-Cabello CA, Haider SA, Haider C, Forte AJ. AI and Ethics: A Systematic Review of the Ethical Considerations of Large Language Model Use in Surgery Research. Healthcare (Basel) 2024; 12:825. [PMID: 38667587 PMCID: PMC11050155 DOI: 10.3390/healthcare12080825] [Received: 03/01/2024] [Revised: 04/02/2024] [Accepted: 04/09/2024] [Indexed: 04/28/2024]
Abstract
INTRODUCTION As large language models receive greater attention in medical research, the investigation of ethical considerations is warranted. This review aims to explore surgery literature to identify ethical concerns surrounding these artificial intelligence models and evaluate how autonomy, beneficence, nonmaleficence, and justice are represented within these ethical discussions to provide insights in order to guide further research and practice. METHODS A systematic review was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines. Five electronic databases were searched in October 2023. Eligible studies included surgery-related articles that focused on large language models and contained adequate ethical discussion. Study details, including specialty and ethical concerns, were collected. RESULTS The literature search yielded 1179 articles, with 53 meeting the inclusion criteria. Plastic surgery, orthopedic surgery, and neurosurgery were the most represented surgical specialties. Autonomy was the most explicitly cited ethical principle. The most frequently discussed ethical concern was accuracy (n = 45, 84.9%), followed by bias, patient confidentiality, and responsibility. CONCLUSION The ethical implications of using large language models in surgery are complex and evolving. The integration of these models into surgery necessitates continuous ethical discourse to ensure responsible and ethical use, balancing technological advancement with human dignity and safety.
Affiliation(s)
- Sahar Borna
- Division of Plastic Surgery, Mayo Clinic, Jacksonville, FL 32224, USA
- Syed A. Haider
- Division of Plastic Surgery, Mayo Clinic, Jacksonville, FL 32224, USA
- Clifton Haider
- Department of Physiology and Biomedical Engineering, Mayo Clinic, Rochester, MN 55905, USA
- Antonio J. Forte
- Division of Plastic Surgery, Mayo Clinic, Jacksonville, FL 32224, USA
- Center for Digital Health, Mayo Clinic, Rochester, MN 55905, USA
19
Alanezi F. Examining the role of ChatGPT in promoting health behaviors and lifestyle changes among cancer patients. Nutr Health 2024:2601060241244563. [PMID: 38567408 DOI: 10.1177/02601060241244563] [Indexed: 04/04/2024]
Abstract
Purpose: This study aims to investigate the role of ChatGPT in promoting health behavioral changes among cancer patients. Methods: A quasi-experimental design with a qualitative approach was adopted in this study, as the ChatGPT technology is novel and many people are unaware of it. The participants were outpatients at a public hospital. An experiment was carried out in which the participants used ChatGPT to seek cancer-related information for two weeks, followed by focus group (FG) discussions. A total of 72 outpatients participated in ten focus groups. Results: Three main themes with 14 sub-themes were identified, reflecting the role of ChatGPT in promoting health behavior changes. Its prominent role was observed in developing health literacy and promoting self-management of conditions through emotional, informational, and motivational support. Three challenges were identified: privacy, lack of personalization, and reliability issues. Conclusion: Although ChatGPT has huge potential in promoting health behavior changes among cancer patients, its ability is limited by several factors such as regulatory, reliability, and privacy issues. There is a need for further evidence to generalize the results across regions.
Affiliation(s)
- Fahad Alanezi
- College of Business Administration, Department of Management Information Systems, Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia
20
Freire Y, Santamaría Laorden A, Orejas Pérez J, Gómez Sánchez M, Díaz-Flores García V, Suárez A. ChatGPT performance in prosthodontics: Assessment of accuracy and repeatability in answer generation. J Prosthet Dent 2024; 131:659.e1-659.e6. [PMID: 38310063 DOI: 10.1016/j.prosdent.2024.01.018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/31/2023] [Revised: 01/17/2024] [Accepted: 01/18/2024] [Indexed: 02/05/2024]
Abstract
STATEMENT OF PROBLEM The artificial intelligence (AI) software program ChatGPT is based on large language models (LLMs) and is widely accessible. However, in prosthodontics, little is known about its performance in generating answers. PURPOSE The purpose of this study was to determine the performance of ChatGPT in generating answers about removable dental prostheses (RDPs) and tooth-supported fixed dental prostheses (FDPs). MATERIAL AND METHODS Thirty short questions were designed about RDPs and tooth-supported FDPs, and 30 answers were generated for each question using ChatGPT-4 in October 2023. The 900 generated answers were independently graded by experts using a 3-point Likert scale. The relative frequency and absolute percentage of answers were described. Accuracy was assessed using the Wald binomial method, while repeatability was evaluated using percentage agreement, the Brennan and Prediger coefficient, Conger's generalized Cohen kappa, Fleiss kappa, Gwet's AC, and Krippendorff's alpha. Confidence intervals were set at 95%. Statistical analysis was performed using the STATA software program. RESULTS The performance of ChatGPT in generating answers related to RDPs and tooth-supported FDPs was limited. The answers showed an accuracy of 25.6%, with a confidence interval between 22.9% and 28.6%. Repeatability ranged from substantial to moderate. CONCLUSIONS The results show that ChatGPT currently has limited ability to generate answers related to RDPs and tooth-supported FDPs. Therefore, ChatGPT cannot replace a dentist, and professionals who use it should be aware of its limitations.
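The Wald binomial method named in the abstract is a simple normal-approximation interval for a proportion. A minimal sketch follows; the counts used (230 correct of 900 graded answers) are hypothetical, since the abstract reports only percentages, so the result approximates but need not exactly match the published figures.

```python
from math import sqrt

def wald_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wald (normal-approximation) confidence interval for a binomial proportion."""
    p = successes / n
    half = z * sqrt(p * (1 - p) / n)  # z = 1.96 for a 95% interval
    return max(0.0, p - half), min(1.0, p + half)

# Hypothetical counts: 230 of 900 answers rated correct (about 25.6%)
low, high = wald_ci(230, 900)
```

With these assumed counts the interval comes out near 22.7% to 28.4%, close to the reported range; the study's exact counts and software may differ slightly.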
Affiliation(s)
- Yolanda Freire
- Assistant Professor, Department of Pre-Clinic Dentistry, Faculty of Biomedical and Health Sciences, European University of Madrid (UEM), Madrid, Spain
- Andrea Santamaría Laorden
- Assistant Professor, Department of Pre-Clinic Dentistry, Faculty of Biomedical and Health Sciences, European University of Madrid (UEM), Madrid, Spain
- Jaime Orejas Pérez
- Assistant Professor, Department of Pre-Clinic Dentistry, Faculty of Biomedical and Health Sciences, European University of Madrid (UEM), Madrid, Spain
- Margarita Gómez Sánchez
- Assistant Professor, Vice Dean of Dentistry, Department of Pre-Clinic Dentistry and Clinical Dentistry, Faculty of Biomedical and Health Sciences, European University of Madrid (UEM), Madrid, Spain
- Víctor Díaz-Flores García
- Assistant Professor, Department of Pre-Clinic Dentistry, Faculty of Biomedical and Health Sciences, European University of Madrid (UEM), Madrid, Spain
- Ana Suárez
- Associate Professor, Department of Pre-Clinic Dentistry, Faculty of Biomedical and Health Sciences, European University of Madrid (UEM), Madrid, Spain
21
Caglayan A, Slusarczyk W, Rabbani RD, Ghose A, Papadopoulos V, Boussios S. Large Language Models in Oncology: Revolution or Cause for Concern? Curr Oncol 2024; 31:1817-1830. [PMID: 38668040 PMCID: PMC11049602 DOI: 10.3390/curroncol31040137] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/29/2024] [Revised: 03/13/2024] [Accepted: 03/29/2024] [Indexed: 04/28/2024] Open
Abstract
The technological capability of artificial intelligence (AI) continues to advance rapidly. Recently, the release of large language models has taken the world by storm, generating both excitement and concern. As a consequence of their impressive ability and versatility, they present a potential opportunity for implementation in oncology. Areas of possible application include supporting clinical decision making, education, and cancer research. Despite the promise these novel systems offer, several limitations and barriers challenge their implementation. It is imperative that concerns such as accountability, data inaccuracy, and data protection are addressed prior to their integration in oncology. As artificial intelligence systems continue to progress, new ethical and practical dilemmas will also arise; thus, the evaluation of these limitations and concerns will be dynamic in nature. This review offers a comprehensive overview of the potential applications of large language models in oncology, as well as concerns surrounding their implementation in cancer care.
Affiliation(s)
- Aydin Caglayan
- Department of Medical Oncology, Medway NHS Foundation Trust, Gillingham ME7 5NY, UK; (A.C.); (R.D.R.); (A.G.)
- Rukhshana Dina Rabbani
- Department of Medical Oncology, Medway NHS Foundation Trust, Gillingham ME7 5NY, UK; (A.C.); (R.D.R.); (A.G.)
- Aruni Ghose
- Department of Medical Oncology, Medway NHS Foundation Trust, Gillingham ME7 5NY, UK; (A.C.); (R.D.R.); (A.G.)
- Department of Medical Oncology, Barts Cancer Centre, St Bartholomew’s Hospital, Barts Health NHS Trust, London EC1A 7BE, UK
- Department of Medical Oncology, Mount Vernon Cancer Centre, East and North Hertfordshire Trust, London HA6 2RN, UK
- Health Systems and Treatment Optimisation Network, European Cancer Organisation, 1040 Brussels, Belgium
- Oncology Council, Royal Society of Medicine, London W1G 0AE, UK
- Stergios Boussios
- Department of Medical Oncology, Medway NHS Foundation Trust, Gillingham ME7 5NY, UK; (A.C.); (R.D.R.); (A.G.)
- Kent Medway Medical School, University of Kent, Canterbury CT2 7LX, UK
- Faculty of Life Sciences & Medicine, School of Cancer & Pharmaceutical Sciences, King’s College London, Strand Campus, London WC2R 2LS, UK
- Faculty of Medicine, Health, and Social Care, Canterbury Christ Church University, Canterbury CT2 7PB, UK
- AELIA Organization, 9th Km Thessaloniki—Thermi, 57001 Thessaloniki, Greece
22
Ocakoglu SR, Coskun B. The Emerging Role of AI in Patient Education: A Comparative Analysis of LLM Accuracy for Pelvic Organ Prolapse. Med Princ Pract 2024; 33:000538538. [PMID: 38527444 PMCID: PMC11324208 DOI: 10.1159/000538538] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 11/06/2023] [Accepted: 03/21/2024] [Indexed: 03/27/2024] Open
Abstract
OBJECTIVE This study aimed to evaluate the accuracy, completeness, precision, and readability of outputs generated by three Large Language Models (LLMs): ChatGPT by OpenAI, BARD by Google, and Bing by Microsoft, in comparison with patient education material on Pelvic Organ Prolapse (POP) provided by the Royal College of Obstetricians and Gynaecologists (RCOG). METHODS A total of 15 questions were retrieved from the RCOG website and input into the three LLMs. Two independent reviewers evaluated the outputs for accuracy, completeness, and precision. Readability was assessed using the Simplified Measure of Gobbledygook (SMOG) score and the Flesch-Kincaid Grade Level (FKGL) score. RESULTS Significant differences were observed in completeness and precision metrics. ChatGPT ranked highest in completeness (66.7%), while Bing led in precision (100%). No significant differences in accuracy were observed across the models. In terms of readability, ChatGPT's outputs were more difficult to read than those of BARD, Bing, and the original RCOG answers. CONCLUSION While all models displayed a variable degree of correctness, ChatGPT excelled in completeness, significantly surpassing BARD and Bing, although its answers were the hardest to read; Bing led in precision, providing the most relevant and concise answers. The findings highlight the potential of LLMs in health information dissemination and the need for careful interpretation of their outputs.
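The SMOG and FKGL measures used above are published readability formulas. A rough sketch follows, using a naive vowel-group syllable counter; real readability tools use pronunciation dictionaries, so scores will differ slightly from those any given software reports.

```python
import re

def syllables(word: str) -> int:
    # Naive heuristic: count groups of consecutive vowels (minimum 1)
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text: str) -> dict:
    n_sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = len(words)
    n_syllables = sum(syllables(w) for w in words)
    n_poly = sum(1 for w in words if syllables(w) >= 3)
    # Flesch-Kincaid Grade Level
    fkgl = 0.39 * n_words / n_sentences + 11.8 * n_syllables / n_words - 15.59
    # Simplified Measure of Gobbledygook (SMOG)
    smog = 1.0430 * (n_poly * 30 / n_sentences) ** 0.5 + 3.1291
    return {"fkgl": round(fkgl, 2), "smog": round(smog, 2)}
```

Higher scores on either scale mean the text requires more years of schooling to read, which is how the study ranked the models' outputs against the RCOG material.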
Affiliation(s)
- Burhan Coskun
- Department of Urology, Bursa Uludag University, Bursa, Turkey
23
Weidener L, Fischer M. Artificial Intelligence in Medicine: Cross-Sectional Study Among Medical Students on Application, Education, and Ethical Aspects. JMIR MEDICAL EDUCATION 2024; 10:e51247. [PMID: 38180787 PMCID: PMC10799276 DOI: 10.2196/51247] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/26/2023] [Revised: 10/26/2023] [Accepted: 12/02/2023] [Indexed: 01/06/2024]
Abstract
BACKGROUND The use of artificial intelligence (AI) in medicine not only directly impacts the medical profession but is also increasingly associated with various potential ethical aspects. In addition, the expanding use of AI and AI-based applications such as ChatGPT demands a corresponding shift in medical education to adequately prepare future practitioners for the effective use of these tools and address the associated ethical challenges they present. OBJECTIVE This study aims to explore how medical students from Germany, Austria, and Switzerland perceive the use of AI in medicine and the teaching of AI and AI ethics in medical education in accordance with their use of AI-based chat applications, such as ChatGPT. METHODS This cross-sectional study, conducted from June 15 to July 15, 2023, surveyed medical students across Germany, Austria, and Switzerland using a web-based survey. This study aimed to assess students' perceptions of AI in medicine and the integration of AI and AI ethics into medical education. The survey, which included 53 items across 6 sections, was developed and pretested. Data analysis used descriptive statistics (median, mode, IQR, total number, and percentages) and either the chi-square or Mann-Whitney U tests, as appropriate. RESULTS Surveying 487 medical students across Germany, Austria, and Switzerland revealed limited formal education on AI or AI ethics within medical curricula, although 38.8% (189/487) had prior experience with AI-based chat applications, such as ChatGPT. Despite varied prior exposures, 71.7% (349/487) anticipated a positive impact of AI on medicine. There was widespread consensus (385/487, 74.9%) on the need for AI and AI ethics instruction in medical education, although the current offerings were deemed inadequate. Regarding the AI ethics education content, all proposed topics were rated as highly relevant. CONCLUSIONS This study revealed a pronounced discrepancy between the use of AI-based (chat) applications, such as ChatGPT, among medical students in Germany, Austria, and Switzerland and the teaching of AI in medical education. To adequately prepare future medical professionals, there is an urgent need to integrate the teaching of AI and AI ethics into the medical curricula.
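The chi-square test named in the methods can be sketched for a 2x2 table. Only the 189 vs. 298 split between students with and without chatbot experience comes from the abstract; the column breakdown below (anticipating a positive impact or not) is hypothetical, for illustration only.

```python
from math import erfc, sqrt

def chi2_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic (no continuity correction) for a 2x2 table."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def p_value_1dof(stat: float) -> float:
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom."""
    return erfc(sqrt(stat / 2))

# Rows: prior chatbot experience yes (189 total) / no (298 total);
# columns: anticipate positive impact yes / no (hypothetical split)
stat = chi2_2x2(150, 39, 199, 99)
p = p_value_1dof(stat)
```

A p-value below the chosen significance level would indicate that the two subgroups differ; the study used this kind of test, or the Mann-Whitney U test for ordinal responses, as appropriate to each item.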
Affiliation(s)
- Lukas Weidener
- Research Unit for Quality and Ethics in Health Care, UMIT TIROL - Private University for Health Sciences and Health Technology, Hall in Tirol, Austria
- Michael Fischer
- Research Unit for Quality and Ethics in Health Care, UMIT TIROL - Private University for Health Sciences and Health Technology, Hall in Tirol, Austria
24
Piao Y, Chen H, Wu S, Li X, Li Z, Yang D. Assessing the performance of large language models (LLMs) in answering medical questions regarding breast cancer in the Chinese context. Digit Health 2024; 10:20552076241284771. [PMID: 39386109 PMCID: PMC11462564 DOI: 10.1177/20552076241284771] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/17/2024] [Accepted: 09/03/2024] [Indexed: 10/12/2024] Open
Abstract
Purpose Large language models (LLMs) are deep learning models designed to comprehend and generate meaningful responses, and they have gained public attention in recent years. The purpose of this study is to evaluate and compare the performance of LLMs in answering questions regarding breast cancer in the Chinese context. Material and Methods ChatGPT, ERNIE Bot, and ChatGLM were chosen to answer 60 questions related to breast cancer posed by two oncologists. Responses were scored as comprehensive, correct but inadequate, mixed with correct and incorrect data, completely incorrect, or unanswered. The accuracy, length, and readability of answers from the different models were evaluated using statistical software. Results ChatGPT answered 60 questions, with 40 (66.7%) comprehensive answers and six (10.0%) correct but inadequate answers. ERNIE Bot answered 60 questions, with 34 (56.7%) comprehensive answers and seven (11.7%) correct but inadequate answers. ChatGLM generated 60 answers, with 35 (58.3%) comprehensive answers and six (10.0%) correct but inadequate answers. The differences in the chosen accuracy metrics among the three LLMs did not reach statistical significance, but only ChatGPT demonstrated a sense of human compassion. The accuracy of the three models in answering questions regarding breast cancer treatment was the lowest, with an average of 44.4%. ERNIE Bot's responses were significantly shorter than those of ChatGPT and ChatGLM (p < .001 for both). The readability scores of the three models showed no statistically significant differences. Conclusions In the Chinese context, the capabilities of ChatGPT, ERNIE Bot, and ChatGLM in answering breast cancer-related questions are currently similar. These three LLMs may serve as adjunct informational tools for breast cancer patients in the Chinese context, offering guidance for general inquiries. However, for highly specialized issues, particularly in the realm of breast cancer treatment, LLMs cannot deliver reliable performance, and it is necessary to use them under the supervision of healthcare professionals.
Affiliation(s)
- Ying Piao
- Department of Radiation Oncology, Shenzhen People’s Hospital (The Second Clinical Medical College, Jinan University; The First Affiliated Hospital, Southern University of Science and Technology), Shenzhen, Guangdong, People’s Republic of China
- Hongtao Chen
- Department of Radiation Oncology, Shenzhen People’s Hospital (The Second Clinical Medical College, Jinan University; The First Affiliated Hospital, Southern University of Science and Technology), Shenzhen, Guangdong, People’s Republic of China
- Shihai Wu
- Department of Radiation Oncology, Shenzhen People’s Hospital (The Second Clinical Medical College, Jinan University; The First Affiliated Hospital, Southern University of Science and Technology), Shenzhen, Guangdong, People’s Republic of China
- Xianming Li
- Department of Radiation Oncology, Shenzhen People’s Hospital (The Second Clinical Medical College, Jinan University; The First Affiliated Hospital, Southern University of Science and Technology), Shenzhen, Guangdong, People’s Republic of China
- Zihuang Li
- Department of Radiation Oncology, Shenzhen People’s Hospital (The Second Clinical Medical College, Jinan University; The First Affiliated Hospital, Southern University of Science and Technology), Shenzhen, Guangdong, People’s Republic of China
- Dong Yang
- Department of Radiation Oncology, Shenzhen People’s Hospital (The Second Clinical Medical College, Jinan University; The First Affiliated Hospital, Southern University of Science and Technology), Shenzhen, Guangdong, People’s Republic of China
25
Iannantuono GM, Bracken-Clarke D, Karzai F, Choo-Wosoba H, Gulley JL, Floudas CS. Comparison of Large Language Models in Answering Immuno-Oncology Questions: A Cross-Sectional Study. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.10.31.23297825. [PMID: 38076813 PMCID: PMC10705618 DOI: 10.1101/2023.10.31.23297825] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 12/19/2023]
Abstract
Background The capability of large language models (LLMs) to understand and generate human-readable text has prompted investigation of their potential as educational and management tools for cancer patients and healthcare providers. Materials and Methods We conducted a cross-sectional study aimed at evaluating the ability of ChatGPT-4, ChatGPT-3.5, and Google Bard to answer questions related to four domains of immuno-oncology (Mechanisms, Indications, Toxicities, and Prognosis). We generated 60 open-ended questions (15 for each section). Questions were manually submitted to the LLMs, and responses were collected on June 30th, 2023. Two reviewers evaluated the answers independently. Results ChatGPT-4 and ChatGPT-3.5 answered all questions, whereas Google Bard answered only 53.3% (p < 0.0001). The number of questions with reproducible answers was higher for ChatGPT-4 (95%) and ChatGPT-3.5 (88.3%) than for Google Bard (50%) (p < 0.0001). In terms of accuracy, the proportion of answers deemed fully correct was 75.4%, 58.5%, and 43.8% for ChatGPT-4, ChatGPT-3.5, and Google Bard, respectively (p = 0.03). Furthermore, the proportion of responses deemed highly relevant was 71.9%, 77.4%, and 43.8% for ChatGPT-4, ChatGPT-3.5, and Google Bard, respectively (p = 0.04). Regarding readability, the proportion of answers deemed highly readable was higher for ChatGPT-4 (98.1%) and ChatGPT-3.5 (100%) than for Google Bard (87.5%) (p = 0.02). Conclusion ChatGPT-4 and ChatGPT-3.5 are potentially powerful tools in immuno-oncology, whereas Google Bard demonstrated relatively poorer performance. However, the risk of inaccuracy or incompleteness in the responses was evident in all three LLMs, highlighting the importance of expert-driven verification of the outputs returned by these technologies.
Affiliation(s)
- Giovanni Maria Iannantuono
- Genitourinary Malignancies Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, United States
- Dara Bracken-Clarke
- Center for Immuno-Oncology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, United States
- Fatima Karzai
- Genitourinary Malignancies Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, United States
- Hyoyoung Choo-Wosoba
- Biostatistics and Data Management Section, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, United States
- James L. Gulley
- Center for Immuno-Oncology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, United States
- Charalampos S. Floudas
- Center for Immuno-Oncology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, United States