1
Gumilar KE, Indraprasta BR, Hsu YC, Yu ZY, Chen H, Irawan B, Tambunan Z, Wibowo BM, Nugroho H, Tjokroprawiro BA, Dachlan EG, Mulawardhana P, Rahestyningtyas E, Pramuditya H, Putra VGE, Waluyo ST, Tan NR, Folarin R, Ibrahim IH, Lin CH, Hung TY, Lu TF, Chen YF, Shih YH, Wang SJ, Huang J, Yates CC, Lu CH, Liao LN, Tan M. Disparities in medical recommendations from AI-based chatbots across different countries/regions. Sci Rep 2024; 14:17052. PMID: 39048640; PMCID: PMC11269683; DOI: 10.1038/s41598-024-67689-0.
Abstract
This study explores disparities and opportunities in healthcare information provided by AI chatbots. We focused on recommendations for adjuvant therapy in endometrial cancer, analyzing responses across four regions (Indonesia, Nigeria, Taiwan, USA) and three platforms (Bard, Bing, ChatGPT-3.5). Utilizing previously published cases, we asked identical questions to chatbots from each location within a 24-h window. Responses were evaluated in a double-blinded manner on relevance, clarity, depth, focus, and coherence by ten experts in endometrial cancer. Our analysis revealed significant variations across different countries/regions (p < 0.001). Interestingly, Bing's responses in Nigeria consistently outperformed others (p < 0.05), excelling in all evaluation criteria (p < 0.001). Bard also performed better in Nigeria compared to other regions (p < 0.05), consistently surpassing them across all categories (p < 0.001, with relevance reaching p < 0.01). Notably, Bard's overall scores were significantly higher than those of ChatGPT-3.5 and Bing in all locations (p < 0.001). These findings highlight disparities and opportunities in the quality of AI-powered healthcare information based on user location and platform. This emphasizes the necessity for more research and development to guarantee equal access to trustworthy medical information through AI technologies.
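The design described above — identical prompts rated by multiple blinded experts, compared across regions and platforms — boils down to a multi-group comparison of ordinal ratings. A minimal sketch of that analysis step with hypothetical scores; the abstract reports p-values but does not name the test, so the Kruskal-Wallis choice here is an assumption:

```python
from scipy import stats

# Hypothetical mean expert scores (1-5 Likert) per chatbot response,
# grouped by the region the question was asked from.
scores_by_region = {
    "Indonesia": [3.2, 3.8, 3.5, 4.0, 3.1],
    "Nigeria":   [4.5, 4.2, 4.7, 4.4, 4.6],
    "Taiwan":    [3.6, 3.9, 3.3, 3.7, 3.5],
    "USA":       [3.8, 3.4, 3.9, 3.6, 4.0],
}

# Kruskal-Wallis H-test: do the rating distributions differ across regions?
h_stat, p_value = stats.kruskal(*scores_by_region.values())
print(f"H = {h_stat:.2f}, p = {p_value:.4f}")
```

The same pattern, with platforms as the grouping variable, would cover the Bard/Bing/ChatGPT comparison the abstract also reports.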
Affiliation(s)
- Khanisyah E Gumilar: Graduate Institute of Biomedical Science, China Medical University, Taichung, Taiwan; Department of Obstetrics and Gynecology, Hospital of Universitas Airlangga-Faculty of Medicine, Universitas Airlangga, Jl. Dharmahusada Permai, Mulyorejo, Kec. Mulyorejo, Surabaya, Jawa Timur, 60115, Indonesia
- Birama R Indraprasta: Department of Obstetrics and Gynecology, Dr. Soetomo General Hospital-Faculty of Medicine, Universitas Airlangga, Surabaya, Indonesia
- Yu-Cheng Hsu: Department of Public Health, China Medical University, No. 100, Sec. 1, Jingmao Rd, Beitun Dist, Taichung, 406040, Taiwan, ROC; School of Chinese Medicine, China Medical University, Taichung, Taiwan
- Zih-Ying Yu: Department of Public Health, China Medical University, No. 100, Sec. 1, Jingmao Rd, Beitun Dist, Taichung, 406040, Taiwan, ROC
- Hong Chen: Graduate Institute of Biomedical Science, China Medical University, Taichung, Taiwan
- Budi Irawan: Department of Obstetrics and Gynecology, Dr. Soetomo General Hospital-Faculty of Medicine, Universitas Airlangga, Surabaya, Indonesia
- Zulkarnain Tambunan: Department of Obstetrics and Gynecology, Dr. Soetomo General Hospital-Faculty of Medicine, Universitas Airlangga, Surabaya, Indonesia
- Bagus M Wibowo: Department of Obstetrics and Gynecology, Dr. Soetomo General Hospital-Faculty of Medicine, Universitas Airlangga, Surabaya, Indonesia
- Hari Nugroho: Department of Obstetrics and Gynecology, Dr. Soetomo General Hospital-Faculty of Medicine, Universitas Airlangga, Surabaya, Indonesia
- Brahmana A Tjokroprawiro: Department of Obstetrics and Gynecology, Dr. Soetomo General Hospital-Faculty of Medicine, Universitas Airlangga, Surabaya, Indonesia
- Erry G Dachlan: Department of Obstetrics and Gynecology, Dr. Soetomo General Hospital-Faculty of Medicine, Universitas Airlangga, Surabaya, Indonesia
- Pungky Mulawardhana: Department of Obstetrics and Gynecology, Hospital of Universitas Airlangga-Faculty of Medicine, Universitas Airlangga, Jl. Dharmahusada Permai, Mulyorejo, Kec. Mulyorejo, Surabaya, Jawa Timur, 60115, Indonesia
- Eccita Rahestyningtyas: Department of Obstetrics and Gynecology, Hospital of Universitas Airlangga-Faculty of Medicine, Universitas Airlangga, Jl. Dharmahusada Permai, Mulyorejo, Kec. Mulyorejo, Surabaya, Jawa Timur, 60115, Indonesia
- Herlangga Pramuditya: Department of Obstetrics and Gynecology, Dr. Ramelan Naval Hospital, Surabaya, Indonesia
- Very Great E Putra: Department of Obstetrics and Gynecology, Dr. Kariadi Central General Hospital, Semarang, Indonesia
- Setyo T Waluyo: Department of Obstetrics and Gynecology, Ulin General Hospital, Banjarmasin, Indonesia
- Nathan R Tan: Department of Modern and Classical Languages and Literature, University of South Alabama, Mobile, AL, USA
- Royhaan Folarin: Department of Anatomy, Faculty of Basic Medical Sciences, Olabisi Onabanjo University, Sagamu, Nigeria
- Ibrahim H Ibrahim: Graduate Institute of Biomedical Science, China Medical University, Taichung, Taiwan
- Cheng-Han Lin: Graduate Institute of Biomedical Science, China Medical University, Taichung, Taiwan
- Tai-Yu Hung: Graduate Institute of Biomedical Science, China Medical University, Taichung, Taiwan
- Ting-Fang Lu: Department of Obstetrics and Gynecology, Taichung Veterans General Hospital, 1650 Taiwan Boulevard Sect. 4, Taichung, 40705, Taiwan, ROC
- Yen-Fu Chen: Department of Obstetrics and Gynecology, Taichung Veterans General Hospital, 1650 Taiwan Boulevard Sect. 4, Taichung, 40705, Taiwan, ROC
- Yu-Hsiang Shih: Department of Obstetrics and Gynecology, Taichung Veterans General Hospital, 1650 Taiwan Boulevard Sect. 4, Taichung, 40705, Taiwan, ROC
- Shao-Jing Wang: Department of Obstetrics and Gynecology, Taichung Veterans General Hospital, 1650 Taiwan Boulevard Sect. 4, Taichung, 40705, Taiwan, ROC
- Jingshan Huang: School of Computing and College of Medicine, University of South Alabama, Mobile, AL, USA
- Clayton C Yates: Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, MD, 21287, USA
- Chien-Hsing Lu: Department of Obstetrics and Gynecology, Taichung Veterans General Hospital, 1650 Taiwan Boulevard Sect. 4, Taichung, 40705, Taiwan, ROC
- Li-Na Liao: Department of Public Health, China Medical University, No. 100, Sec. 1, Jingmao Rd, Beitun Dist, Taichung, 406040, Taiwan, ROC
- Ming Tan: Graduate Institute of Biomedical Science, China Medical University, Taichung, Taiwan; Institute of Biochemistry and Molecular Biology, Graduate Institute of Biomedical Sciences, China Medical University (Taiwan), No. 100, Sec. 1, Jingmao Rd, Beitun Dist, Taichung, 406040, Taiwan, ROC
2
Naz R, Akacı O, Erdoğan H, Açıkgöz A. Can large language models provide accurate and quality information to parents regarding chronic kidney diseases? J Eval Clin Pract 2024. PMID: 38959373; DOI: 10.1111/jep.14084.
Abstract
RATIONALE: Artificial intelligence (AI) large language models (LLMs) are tools capable of generating human-like text responses to user queries across topics. The use of these language models in various medical contexts is currently being studied, but their performance and content quality have not been evaluated in specific medical fields.
AIMS AND OBJECTIVES: This study aimed to compare the performance of the AI LLMs ChatGPT, Gemini and Copilot in providing information to parents about chronic kidney diseases (CKD), and to compare the accuracy and quality of that information with a reference source.
METHODS: Forty frequently asked questions about CKD were identified. The accuracy and quality of the answers were evaluated against the Kidney Disease: Improving Global Outcomes guidelines. The accuracy of the responses generated by the LLMs was assessed using F1, precision and recall scores; their quality was evaluated using a five-point global quality score (GQS).
RESULTS: ChatGPT and Gemini achieved high F1 scores of 0.89 and 1, respectively, in the diagnosis and lifestyle categories, and both generated accurate responses with high precision values in these categories. In terms of recall, all LLMs performed strongly in the diagnosis, treatment and lifestyle categories. Average GQS values were 3.46 ± 0.55 for Gemini, 1.93 ± 0.63 for ChatGPT-3.5 and 2.02 ± 0.69 for Copilot; Gemini outperformed ChatGPT and Copilot in all categories.
CONCLUSION: Although LLMs can provide parents with highly accurate information about CKD, their usefulness is limited compared with a reference source. Their performance limitations can lead to misinformation and potential misinterpretation, so patients and parents should exercise caution when using these models.
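For reference, the F1, precision, and recall metrics used in this study reduce to simple ratios over true positives, false positives, and false negatives. A minimal, self-contained sketch with made-up correctness labels (not the study's data):

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for binary labels (1 = accurate response)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Guideline-based reference labels vs. hypothetical LLM correctness labels.
reference  = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
llm_output = [1, 1, 1, 1, 0, 1, 0, 1, 1, 1]
print(precision_recall_f1(reference, llm_output))  # -> (0.875, 0.875, 0.875)
```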
Affiliation(s)
- Rüya Naz: Bursa Yüksek Ihtisas Research and Training Hospital, University of Health Sciences, Bursa, Turkey
- Okan Akacı: Clinic of Pediatric Nephrology, Bursa Yüksek Ihtisas Research and Training Hospital, University of Health Sciences, Bursa, Turkey
- Hakan Erdoğan: Clinic of Pediatric Nephrology, Bursa City Hospital, Bursa, Turkey
- Ayfer Açıkgöz: Department of Pediatric Nursing, Faculty of Health Sciences, Eskişehir Osmangazi University, Eskişehir, Turkey
3
Jo MH, Kim MJ, Oh HK, Choi MJ, Shin HR, Lee TG, Ahn HM, Kim DW, Kang SB. Communicative competence of generative artificial intelligence in responding to patient queries about colorectal cancer surgery. Int J Colorectal Dis 2024; 39:94. PMID: 38902500; PMCID: PMC11189990; DOI: 10.1007/s00384-024-04670-3.
Abstract
PURPOSE: To examine the ability of generative artificial intelligence (GAI) to answer patients' questions regarding colorectal cancer (CRC).
METHODS: Ten clinically relevant questions about CRC were selected from top-rated hospitals' websites and patient surveys and presented to three GAI tools (ChatGPT-4, Google Bard, and CLOVA X). Their responses were compared with answers from a CRC information book. Responses were evaluated by two groups, one of five healthcare professionals (HCPs) and one of five patients, and each question was scored on a 1-5 Likert scale against four evaluation criteria (maximum score, 20 points/question).
RESULTS: In the analysis including only HCPs, the information book scored 11.8 ± 1.2, GPT-4 13.5 ± 1.1, Google Bard 11.5 ± 0.7, and CLOVA X 12.2 ± 1.4 (P = 0.001); the score of GPT-4 was significantly higher than those of the information book (P = 0.020) and Google Bard (P = 0.001). In the analysis including only patients, the information book scored 14.1 ± 1.4, GPT-4 15.2 ± 1.8, Google Bard 15.5 ± 1.8, and CLOVA X 14.4 ± 1.8, with no significant differences (P = 0.234). With both groups of evaluators combined, the information book scored 13.0 ± 0.9, GPT-4 14.4 ± 1.2, Google Bard 13.5 ± 1.0, and CLOVA X 13.3 ± 1.5 (P = 0.070).
CONCLUSION: The three GAIs demonstrated communicative competence similar to or better than the information book for questions about CRC surgery in Korean. If high-quality medical information generated by GAI is properly supervised by HCPs and published as an information book, it could help patients obtain accurate information and make informed decisions.
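The scoring scheme above (four criteria on a 1-5 Likert scale, up to 20 points per question, compared across four sources) is easy to operationalize. A minimal sketch with hypothetical per-question totals; the abstract reports a single P value per comparison but does not name the test, so the one-way ANOVA here is an assumption:

```python
from scipy import stats

# Hypothetical total scores (4 criteria x 1-5 Likert, max 20) for the
# ten questions, one list per information source.
info_book = [12, 11, 13, 12, 10, 12, 13, 11, 12, 12]
gpt4      = [14, 13, 15, 13, 12, 14, 14, 13, 13, 14]
bard      = [11, 12, 12, 11, 11, 12, 12, 11, 12, 11]
clova_x   = [12, 13, 12, 13, 11, 12, 13, 12, 12, 12]

# One-way ANOVA across the four sources.
f_stat, p_value = stats.f_oneway(info_book, gpt4, bard, clova_x)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```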
Affiliation(s)
- Min Hyeong Jo: Department of Surgery, Seoul National University Bundang Hospital, 300 Gumi-dong Bundang-gu, Seongnam-si, Gyeonggi-do, 13620, South Korea
- Min-Jun Kim: Department of Surgery, Seoul National University College of Medicine, Seoul, South Korea
- Heung-Kwon Oh: Department of Surgery, Seoul National University Bundang Hospital, 300 Gumi-dong Bundang-gu, Seongnam-si, Gyeonggi-do, 13620, South Korea; Department of Surgery, Seoul National University College of Medicine, Seoul, South Korea
- Mi Jeong Choi: Department of Surgery, Seoul National University Bundang Hospital, 300 Gumi-dong Bundang-gu, Seongnam-si, Gyeonggi-do, 13620, South Korea
- Hye-Rim Shin: Department of Surgery, Seoul National University Bundang Hospital, 300 Gumi-dong Bundang-gu, Seongnam-si, Gyeonggi-do, 13620, South Korea
- Tae-Gyun Lee: Department of Surgery, Seoul National University Bundang Hospital, 300 Gumi-dong Bundang-gu, Seongnam-si, Gyeonggi-do, 13620, South Korea
- Hong-Min Ahn: Department of Surgery, Seoul National University Bundang Hospital, 300 Gumi-dong Bundang-gu, Seongnam-si, Gyeonggi-do, 13620, South Korea
- Duck-Woo Kim: Department of Surgery, Seoul National University Bundang Hospital, 300 Gumi-dong Bundang-gu, Seongnam-si, Gyeonggi-do, 13620, South Korea; Department of Surgery, Seoul National University College of Medicine, Seoul, South Korea
- Sung-Bum Kang: Department of Surgery, Seoul National University Bundang Hospital, 300 Gumi-dong Bundang-gu, Seongnam-si, Gyeonggi-do, 13620, South Korea; Department of Surgery, Seoul National University College of Medicine, Seoul, South Korea
4
Durmaz Engin C, Karatas E, Ozturk T. Exploring the Role of ChatGPT-4, BingAI, and Gemini as Virtual Consultants to Educate Families about Retinopathy of Prematurity. Children (Basel) 2024; 11:750. PMID: 38929329; PMCID: PMC11202218; DOI: 10.3390/children11060750.
Abstract
BACKGROUND: Large language models (LLMs) are becoming increasingly important as they are used more frequently to provide medical information. Our aim was to evaluate the effectiveness of the artificial intelligence (AI) LLMs ChatGPT-4, BingAI, and Gemini in responding to patient inquiries about retinopathy of prematurity (ROP).
METHODS: The answers of the LLMs to fifty real-life patient inquiries were assessed by three ophthalmologists using a 5-point Likert scale. The models' responses were also evaluated for reliability with the DISCERN instrument and the EQIP framework, and for readability using the Flesch Reading Ease (FRE), Flesch-Kincaid Grade Level (FKGL), and Coleman-Liau Index.
RESULTS: ChatGPT-4 outperformed BingAI and Gemini, scoring the highest with 5 points in 90% (45 of 50) of responses and achieving ratings of "agreed" or "strongly agreed" in 98% (49 of 50). It led in accuracy and reliability with DISCERN and EQIP scores of 63 and 72.2, respectively. BingAI followed with scores of 53 and 61.1, while Gemini was noted for the best readability (FRE score of 39.1) but lower reliability scores. Statistically significant performance differences were observed particularly in the screening, diagnosis, and treatment categories.
CONCLUSION: ChatGPT-4 excelled in providing detailed and reliable responses to ROP-related queries, although its texts were more complex. All models delivered generally accurate information as per the DISCERN and EQIP assessments.
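The three readability indices cited here are fixed formulas over word, sentence, letter, and syllable counts. A rough sketch with a naive syllable heuristic (published studies typically use validated calculators, so treat the exact scores as approximations):

```python
import re

def counts(text):
    """Naive word, sentence, letter, and syllable counts for English text."""
    words = re.findall(r"[A-Za-z]+", text)
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    letters = sum(len(w) for w in words)
    # Crude syllable heuristic: count groups of consecutive vowels per word.
    syllables = sum(max(1, len(re.findall(r"[aeiouy]+", w.lower()))) for w in words)
    return len(words), sentences, letters, syllables

def readability(text):
    w, s, l, syl = counts(text)
    fre = 206.835 - 1.015 * (w / s) - 84.6 * (syl / w)          # Flesch Reading Ease
    fkgl = 0.39 * (w / s) + 11.8 * (syl / w) - 15.59            # Flesch-Kincaid Grade Level
    cli = 0.0588 * (l / w * 100) - 0.296 * (s / w * 100) - 15.8 # Coleman-Liau Index
    return {"FRE": round(fre, 1), "FKGL": round(fkgl, 1), "CLI": round(cli, 1)}

print(readability("Retinopathy of prematurity is an eye disease. It affects premature infants."))
```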
Affiliation(s)
- Ceren Durmaz Engin: Department of Ophthalmology, Izmir Democracy University, Buca Seyfi Demirsoy Education and Research Hospital, Izmir 35390, Turkey; Department of Biomedical Technologies, Faculty of Engineering, Dokuz Eylul University, Izmir 35390, Turkey
- Ezgi Karatas: Department of Ophthalmology, Agri Ibrahim Cecen University, Agri 04200, Turkey
- Taylan Ozturk: Department of Ophthalmology, Izmir Tinaztepe University, Izmir 35400, Turkey
5
Kooraki S, Hosseiny M, Jalili MH, Rahsepar AA, Imanzadeh A, Kim GH, Hassani C, Abtin F, Moriarty JM, Bedayat A. Evaluation of ChatGPT-Generated Educational Patient Pamphlets for Common Interventional Radiology Procedures. Acad Radiol 2024:S1076-6332(24)00307-6. PMID: 38839458; DOI: 10.1016/j.acra.2024.05.024.
Abstract
RATIONALE AND OBJECTIVES: This study aimed to evaluate the accuracy and reliability of educational patient pamphlets created by ChatGPT, a large language model, for common interventional radiology (IR) procedures.
METHODS AND MATERIALS: Twenty frequently performed IR procedures were selected, and five users independently asked ChatGPT to generate educational patient pamphlets for each procedure using identical commands. Two independent radiologists then assessed the content, quality, and accuracy of the pamphlets, focusing on potential errors, inaccuracies, and the consistency of the pamphlets.
RESULTS: Shortcomings were identified in 30% (30/100) of the pamphlets, with 34 specific inaccuracies, including missing information about sedation for the procedure (10/34) and inaccuracies related to procedure-specific complications (8/34). A keyword co-occurrence network showed consistent themes within each group of pamphlets, while a line-by-line comparison across users and procedures showed statistically significant inconsistencies (P < 0.001).
CONCLUSION: ChatGPT-generated educational pamphlets demonstrated potential clinical relevance and fairly consistent terminology; however, they were not entirely accurate and exhibited some shortcomings and inter-user structural variability. To ensure patient safety, future improvements and refinements of large language models are warranted, along with human supervision and expert validation.
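The keyword co-occurrence network mentioned in the results can be approximated by counting how often term pairs appear in the same pamphlet. A minimal sketch with hypothetical pamphlet text (the authors' actual pipeline is not described in the abstract):

```python
from collections import Counter
from itertools import combinations

# Hypothetical pamphlet keyword lists for one procedure, one per user.
pamphlets = [
    "biopsy needle imaging sedation risks recovery",
    "biopsy needle imaging consent risks recovery",
    "biopsy imaging sedation consent bleeding recovery",
]

# Count how often each pair of keywords appears in the same pamphlet;
# frequent pairs form the edges of a co-occurrence network.
edges = Counter()
for text in pamphlets:
    terms = sorted(set(text.split()))
    edges.update(combinations(terms, 2))

for pair, n in edges.most_common(5):
    print(pair, n)
```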
Affiliation(s)
- Soheil Kooraki: Department of Radiological Sciences, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA
- Melina Hosseiny: Department of Radiology, University of California, San Diego (UCSD), San Diego, CA
- Mohammad H Jalili: Department of Radiology and Biomedical Imaging, Yale New Haven Health, Bridgeport Hospital, CT
- Amir Ali Rahsepar: Department of Radiology, Feinberg School of Medicine, Northwestern University, Chicago, IL
- Amir Imanzadeh: Department of Radiology, University of California, Irvine (UCI), Irvine, CA
- Grace Hyun Kim: Department of Radiological Sciences, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA
- Cameron Hassani: Department of Radiological Sciences, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA
- Fereidoun Abtin: Department of Radiological Sciences, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA
- John M Moriarty: Department of Radiological Sciences, Division of Interventional Radiology, David Geffen School of Medicine at UCLA, Los Angeles, CA
- Arash Bedayat: Department of Radiological Sciences, David Geffen School of Medicine, University of California, Los Angeles (UCLA), Los Angeles, CA
6
Daraqel B, Wafaie K, Mohammed H, Cao L, Mheissen S, Liu Y, Zheng L. The performance of artificial intelligence models in generating responses to general orthodontic questions: ChatGPT vs Google Bard. Am J Orthod Dentofacial Orthop 2024; 165:652-662. PMID: 38493370; DOI: 10.1016/j.ajodo.2024.01.012.
Abstract
INTRODUCTION: This study aimed to evaluate and compare the performance of 2 artificial intelligence (AI) models, Chat Generative Pretrained Transformer-3.5 (ChatGPT-3.5; OpenAI, San Francisco, Calif) and Google Bard (Bard Experiment, Google, Mountain View, Calif), in terms of response accuracy, completeness, generation time, and response length when answering general orthodontic questions.
METHODS: A team of orthodontic specialists developed a set of 100 questions in 10 orthodontic domains. One author submitted the questions to both ChatGPT and Google Bard, and the AI-generated responses were randomly assigned into 2 forms and sent to 5 blinded, independent assessors. The quality of the responses was evaluated using a newly developed tool for accuracy of information and completeness, and response generation time and length were recorded.
RESULTS: The accuracy and completeness of responses were high in both AI models. The median accuracy score was 9 (interquartile range [IQR], 8-9) for ChatGPT and 8 (IQR, 8-9) for Google Bard (median difference, 1; P < 0.001). The median completeness score was similar in both models: 8 (IQR, 8-9) for ChatGPT and 8 (IQR, 7-9) for Google Bard. The odds of accuracy and completeness were higher by 31% and 23%, respectively, in ChatGPT than in Google Bard. Google Bard's response generation time was significantly shorter than that of ChatGPT by 10.4 seconds/question, but the 2 models generated responses of similar length.
CONCLUSIONS: Both ChatGPT and Google Bard generated responses rated as highly accurate and complete for the posed general orthodontic questions; however, acquiring answers was generally faster with Google Bard.
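The medians and IQRs reported above are straightforward to reproduce descriptively. A minimal sketch with hypothetical ratings; the abstract does not name the two-sample test behind its P values, so the Mann-Whitney U test here is an assumption suited to ordinal scores:

```python
import numpy as np
from scipy import stats

# Hypothetical 0-10 accuracy ratings for the same questions from two models.
chatgpt = [9, 8, 9, 9, 8, 9, 10, 8, 9, 9]
bard    = [8, 8, 9, 7, 8, 8, 9, 8, 8, 7]

for name, x in [("ChatGPT", chatgpt), ("Bard", bard)]:
    q1, med, q3 = np.percentile(x, [25, 50, 75])
    print(f"{name}: median {med} (IQR {q1}-{q3})")

# Mann-Whitney U test for ordinal ratings between the two models.
u_stat, p_value = stats.mannwhitneyu(chatgpt, bard, alternative="two-sided")
print(f"U = {u_stat}, p = {p_value:.4f}")
```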
Affiliation(s)
- Baraa Daraqel: Department of Orthodontics, Stomatological Hospital of Chongqing Medical University, Chongqing Key Laboratory of Oral Disease and Biomedical Sciences, Chongqing Municipal Key Laboratory of Oral Biomedical Engineering of Higher Education, Chongqing, China; Oral Health Research and Promotion Unit, Al-Quds University, Jerusalem, Palestine
- Khaled Wafaie: Department of Orthodontics, Faculty of Dentistry, First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan, China
- Li Cao: Department of Orthodontics, Stomatological Hospital of Chongqing Medical University, Chongqing Key Laboratory of Oral Disease and Biomedical Sciences, Chongqing Municipal Key Laboratory of Oral Biomedical Engineering of Higher Education, Chongqing, China
- Yang Liu: Department of Orthodontics, Stomatological Hospital of Chongqing Medical University, Chongqing Key Laboratory of Oral Disease and Biomedical Sciences, Chongqing Municipal Key Laboratory of Oral Biomedical Engineering of Higher Education, Chongqing, China
- Leilei Zheng: Department of Orthodontics, Stomatological Hospital of Chongqing Medical University, Chongqing Key Laboratory of Oral Disease and Biomedical Sciences, Chongqing Municipal Key Laboratory of Oral Biomedical Engineering of Higher Education, Chongqing, China
7
Jedrzejczak WW, Kochanek K. Comparison of the Audiological Knowledge of Three Chatbots: ChatGPT, Bing Chat, and Bard. Audiol Neurootol 2024:1-7. PMID: 38710158; DOI: 10.1159/000538983.
Abstract
INTRODUCTION: The purpose of this study was to evaluate three chatbots - OpenAI ChatGPT, Microsoft Bing Chat (currently Copilot), and Google Bard (currently Gemini) - in terms of their responses to a defined set of audiological questions.
METHODS: Each chatbot was presented with the same 10 questions. The authors rated the responses on a Likert scale ranging from 1 to 5. Additional features, such as the number of inaccuracies or errors and the provision of references, were also examined.
RESULTS: Most responses given by all three chatbots were rated as satisfactory or better. However, all chatbots generated at least a few errors or inaccuracies. ChatGPT achieved the highest overall score, while Bard was the worst. Bard was also the only chatbot unable to provide a response to one of the questions. ChatGPT was the only chatbot that did not provide information about its sources.
CONCLUSIONS: Chatbots are an intriguing tool that can be used to access basic information in a specialized area like audiology. Nevertheless, one needs to be careful, as correct information is not infrequently mixed in with errors that are hard to pick up unless the user is well versed in the field.
Affiliation(s)
- W Wiktor Jedrzejczak: Institute of Physiology and Pathology of Hearing, Warsaw, Poland; World Hearing Center, Kajetany, Poland
- Krzysztof Kochanek: Institute of Physiology and Pathology of Hearing, Warsaw, Poland; World Hearing Center, Kajetany, Poland
8
Freire Y, Santamaría Laorden A, Orejas Pérez J, Gómez Sánchez M, Díaz-Flores García V, Suárez A. ChatGPT performance in prosthodontics: Assessment of accuracy and repeatability in answer generation. J Prosthet Dent 2024; 131:659.e1-659.e6. PMID: 38310063; DOI: 10.1016/j.prosdent.2024.01.018.
Abstract
STATEMENT OF PROBLEM: The artificial intelligence (AI) software program ChatGPT is based on large language models (LLMs) and is widely accessible. However, in prosthodontics, little is known about its performance in generating answers.
PURPOSE: The purpose of this study was to determine the performance of ChatGPT in generating answers about removable dental prostheses (RDPs) and tooth-supported fixed dental prostheses (FDPs).
MATERIAL AND METHODS: Thirty short questions were designed about RDPs and tooth-supported FDPs, and 30 answers were generated for each question using ChatGPT-4 in October 2023. The 900 generated answers were independently graded by experts on a 3-point Likert scale, and the relative frequency and absolute percentage of answers were described. Accuracy was assessed using the Wald binomial method, while repeatability was evaluated using percentage agreement, the Brennan-Prediger coefficient, Conger's generalized Cohen kappa, Fleiss kappa, Gwet's AC, and Krippendorff alpha. Confidence intervals were set at 95%. Statistical analysis was performed using the STATA software program.
RESULTS: The performance of ChatGPT in generating answers related to RDPs and tooth-supported FDPs was limited: the answers showed an accuracy of 25.6% (95% confidence interval, 22.9% to 28.6%), and repeatability ranged from moderate to substantial.
CONCLUSIONS: The results show that ChatGPT currently has limited ability to generate answers related to RDPs and tooth-supported FDPs. It therefore cannot replace a dentist, and professionals who use it should be aware of its limitations.
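Repeatability here was quantified with chance-corrected agreement coefficients. As a minimal illustration of the family these statistics belong to, a sketch of plain two-rater Cohen kappa on hypothetical grades (the study itself used several related multi-rater coefficients in STATA):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical labels."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance, from the two raters' marginal frequencies.
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / n**2
    return (observed - expected) / (1 - expected)

# Two hypothetical raters grading 10 answers on a 3-point scale
# (e.g., 1 = incorrect, 2 = partially correct, 3 = correct).
a = [3, 2, 3, 1, 2, 3, 3, 2, 1, 3]
b = [3, 2, 3, 2, 2, 3, 3, 1, 1, 3]
print(f"kappa = {cohens_kappa(a, b):.2f}")  # -> kappa = 0.68
```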
Affiliation(s)
- Yolanda Freire: Assistant Professor, Department of Pre-Clinic Dentistry, Faculty of Biomedical and Health Sciences, European University of Madrid (UEM), Madrid, Spain
- Andrea Santamaría Laorden: Assistant Professor, Department of Pre-Clinic Dentistry, Faculty of Biomedical and Health Sciences, European University of Madrid (UEM), Madrid, Spain
- Jaime Orejas Pérez: Assistant Professor, Department of Pre-Clinic Dentistry, Faculty of Biomedical and Health Sciences, European University of Madrid (UEM), Madrid, Spain
- Margarita Gómez Sánchez: Assistant Professor, Vice Dean of Dentistry, Department of Pre-Clinic Dentistry and Clinical Dentistry, Faculty of Biomedical and Health Sciences, European University of Madrid (UEM), Madrid, Spain
- Víctor Díaz-Flores García: Assistant Professor, Department of Pre-Clinic Dentistry, Faculty of Biomedical and Health Sciences, European University of Madrid (UEM), Madrid, Spain
- Ana Suárez: Associate Professor, Department of Pre-Clinic Dentistry, Faculty of Biomedical and Health Sciences, European University of Madrid (UEM), Madrid, Spain
9
Ray PP. Advancing AI in rheumatology: critical reflections and proposals for future research using large language models. Rheumatol Int 2024; 44:573-574. PMID: 37891327; DOI: 10.1007/s00296-023-05488-y.
Affiliation(s)
- Partha Pratim Ray: Department of Computer Applications, Sikkim University, 6th Mile, PO-Tadong, Gangtok, 737102, Sikkim, India
10
Venerito V, Gupta L. Large language models: rheumatologists' newest colleagues? Nat Rev Rheumatol 2024; 20:75-76. PMID: 38177451; DOI: 10.1038/s41584-023-01070-9.
Affiliation(s)
- Vincenzo Venerito: Rheumatology Unit, Department of Precision and Regenerative Medicine and Ionian Area (DiMePRe-J), University of Bari Aldo Moro, Bari, Italy
- Latika Gupta: Department of Rheumatology, Royal Wolverhampton Hospitals NHS Trust, Wolverhampton, UK; Division of Musculoskeletal and Dermatological Sciences, Centre for Musculoskeletal Research, School of Biological Sciences, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre, University of Manchester, Manchester, UK