1.
Fiedler AK, Zhang K, Lal TS, Jiang X, Fraser SM. Generative Pre-trained Transformer for Pediatric Stroke Research: A Pilot Study. Pediatr Neurol 2024;160:54-59. PMID: 39191085. DOI: 10.1016/j.pediatrneurol.2024.07.001.
Abstract
BACKGROUND: Pediatric stroke is an important cause of morbidity in children. Although research can be challenging, large amounts of data have been captured through collaborative efforts in the International Pediatric Stroke Study (IPSS). This study explores the use of an advanced artificial intelligence program, the Generative Pre-trained Transformer (GPT), to enter pediatric stroke data into the IPSS.

METHODS: The most recent 50 clinical notes of patients with ischemic stroke or cerebral venous sinus thrombosis at the UTHealth Pediatric Stroke Clinic were deidentified. Domain-specific prompts were engineered for an offline artificial intelligence program (GPT) to answer IPSS questions. GPT responses were compared with those of a human rater. Percent agreement was assessed across 50 patients for each of the 114 queries developed from the IPSS database outcome questionnaire.

RESULTS: GPT demonstrated strong performance on several questions but showed variability overall. In its early iterations it occasionally matched human judgment, with an accuracy score of 1.00 (n = 20, 17.5%), but it scored as low as 0.26 in some patients. Prompts were adjusted over four subsequent iterations to increase accuracy. In the fourth iteration, agreement was 93.6%, with a maximum of 100% and a minimum of 62%. Of 2400 individual items assessed, our model entered 2247 (93.6%) correctly and 153 (6.4%) incorrectly.

CONCLUSIONS: Although our tailored generative model with domain-specific prompt engineering and ontological guidance shows promise for research applications, further refinement is needed to enhance its accuracy. It cannot enter data entirely independently, but it can be employed in tandem with human oversight, contributing to a collaborative approach that reduces overall effort.
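The percent-agreement metric reported above is straightforward to reproduce. A minimal sketch (the items and answers below are invented for illustration, not IPSS data):

```python
# Percent agreement: the fraction of items on which two annotators
# (here, a GPT-generated entry and a human rater) give the same answer.

def percent_agreement(rater_a, rater_b):
    """Share of positions where two equal-length answer lists match."""
    if len(rater_a) != len(rater_b):
        raise ValueError("raters must answer the same items")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

gpt_answers = ["yes", "no", "unknown", "yes", "no"]
human_answers = ["yes", "no", "no", "yes", "no"]
print(percent_agreement(gpt_answers, human_answers))  # -> 0.8
```

At the study's scale, 2247 matching items out of 2400 gives 2247/2400 ≈ 93.6%, consistent with the reported agreement.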
Affiliation(s)
- Anna K Fiedler
- Division of Child Neurology, Department of Pediatrics, The University of Texas Health Science Center at Houston, Houston, Texas
- Kai Zhang
- Department of Health Data Science and Artificial Intelligence, McWilliams School of Biomedical Informatics at UTHealth Houston, Houston, Texas; UTHealth Houston Institute of Stroke and Cerebrovascular Diseases, Houston, Texas
- Tia S Lal
- UTHealth Houston Institute of Stroke and Cerebrovascular Diseases, Houston, Texas
- Xiaoqian Jiang
- Department of Health Data Science and Artificial Intelligence, McWilliams School of Biomedical Informatics at UTHealth Houston, Houston, Texas; UTHealth Houston Institute of Stroke and Cerebrovascular Diseases, Houston, Texas
- Stuart M Fraser
- Division of Child Neurology, Department of Pediatrics, The University of Texas Health Science Center at Houston, Houston, Texas; UTHealth Houston Institute of Stroke and Cerebrovascular Diseases, Houston, Texas.
2.
Shamil E, Ko TK, Fan KS, Schuster-Bruce J, Jaafar M, Khwaja S, Eynon-Lewis N, D'Souza A, Andrews P. Assessing the Quality and Readability of Online Patient Information: ENT UK Patient Information e-Leaflets versus Responses by a Generative Artificial Intelligence. Facial Plast Surg 2024. PMID: 39260421. DOI: 10.1055/a-2413-3675.
Abstract
BACKGROUND: The evolution of artificial intelligence has introduced new ways to disseminate health information, including natural language processing models like ChatGPT. However, the quality and readability of such digitally generated information remain understudied. This study is the first to compare the quality and readability of digitally generated health information against leaflets produced by professionals.

METHODOLOGY: Five ENT UK patient information leaflets and their corresponding ChatGPT responses were extracted from the Internet. Assessors with varying degrees of medical knowledge evaluated the content using the Ensuring Quality Information for Patients (EQIP) tool and readability measures including the Flesch-Kincaid Grade Level (FKGL). Statistical analysis was performed to identify differences between leaflets, assessors, and sources of information.

RESULTS: ENT UK leaflets were of moderate quality, scoring a median EQIP of 23. Statistically significant differences in overall EQIP score were identified between ENT UK leaflets, whereas ChatGPT responses were of uniform quality. Nonspecialist doctors gave the highest EQIP scores and medical students the lowest. The mean readability of ENT UK leaflets was higher than that of ChatGPT responses. The information metrics of ENT UK leaflets were moderate and varied between topics. Equivalent ChatGPT information provided comparable content quality, but with reduced readability.

CONCLUSION: ChatGPT patient information and professionally produced leaflets had comparable content, but large language model content required a higher reading age. With the increasing use of online health resources, this study highlights the need for a balanced approach that considers both the quality and readability of patient education materials.
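The Flesch-Kincaid Grade Level used here is a closed-form readability formula: 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. A rough sketch, with a deliberately crude vowel-group syllable counter (production readability tools use pronunciation dictionaries or better heuristics):

```python
import re

def fkgl(text):
    """Flesch-Kincaid Grade Level:
    0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59.
    Syllables are approximated by counting vowel groups per word."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(
        max(1, len(re.findall(r"[aeiouyAEIOUY]+", w))) for w in words
    )
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words) - 15.59)

sample = "The doctor will examine your throat. This is not painful."
print(round(fkgl(sample), 1))  # -> 4.1 (roughly a 4th-grade reading level)
```

Higher scores mean a higher required reading age, which is how the study quantifies ChatGPT's reduced readability relative to the ENT UK leaflets.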
Affiliation(s)
- Eamon Shamil
- The Royal National ENT Hospital, University College London Hospitals NHS Foundation Trust, London, England, United Kingdom
- Tsz Ki Ko
- Royal Stoke University Hospital, United Kingdom
- Ka Siu Fan
- Royal Surrey County Hospital, Guildford, Surrey, United Kingdom
- James Schuster-Bruce
- Department of ENT, Kings College Hospital Foundation Trust, London, England, United Kingdom
- Mustafa Jaafar
- UCL Artificial Intelligence Centre for Doctoral Training, London, England, United Kingdom
- Sadie Khwaja
- Department of ENT, Manchester University NHS Foundation Trust, England, United Kingdom
- Alwyn D'Souza
- Institute of Medical Sciences, Canterbury Christ Church University, England, United Kingdom
- Peter Andrews
- The Royal National ENT Hospital, University College London Hospitals NHS Foundation Trust, London, England, United Kingdom
3.
Lechien JR. Generative AI and Otolaryngology-Head & Neck Surgery. Otolaryngol Clin North Am 2024;57:753-765. PMID: 38839556. DOI: 10.1016/j.otc.2024.04.006.
Abstract
The growing development of generative artificial intelligence (AI) models in otolaryngology-head and neck surgery will progressively change practice. Practitioners and patients now have access to AI resources that improve information, knowledge, and the practice of patient care. This article summarizes the currently investigated applications of generative AI models, particularly Chatbot Generative Pre-trained Transformer (ChatGPT), in otolaryngology-head and neck surgery.
Affiliation(s)
- Jérôme R Lechien
- Research Committee of Young Otolaryngologists of the International Federation of Otorhinolaryngological Societies (IFOS), Paris, France; Division of Laryngology and Broncho-esophagology, Department of Otolaryngology-Head Neck Surgery, EpiCURA Hospital, UMONS Research Institute for Health Sciences and Technology, University of Mons (UMons), Mons, Belgium; Department of Otorhinolaryngology and Head and Neck Surgery, Foch Hospital, Paris Saclay University, Phonetics and Phonology Laboratory (UMR 7018 CNRS, Université Sorbonne Nouvelle/Paris 3), Paris, France; Department of Otorhinolaryngology and Head and Neck Surgery, CHU Saint-Pierre, Brussels, Belgium.
4.
Peters M, Leclercq M, Yanni A, Vanden Eynden X, Martin L, Vanden Haute N, Tancredi S, De Passe C, Boutremans E, Lechien J, Dequanter D. ChatGPT and trainee performances in the management of maxillofacial patients. J Stomatol Oral Maxillofac Surg 2024:102090. PMID: 39332706. DOI: 10.1016/j.jormas.2024.102090.
Abstract
INTRODUCTION: ChatGPT is an artificial intelligence-based large language model able to generate human-like responses to text input; its performance has already been studied in several fields. The aim of this study was to evaluate the performance of ChatGPT in the management of maxillofacial clinical cases.

MATERIALS AND METHODS: A total of 38 clinical cases consulting at the Stomatology-Maxillofacial Surgery Department were prospectively recruited and presented to ChatGPT, which was interrogated for diagnosis, differential diagnosis, management, and treatment. The performance of trainees and ChatGPT was compared by three blinded board-certified maxillofacial surgeons using the AIPI score.

RESULTS: The average total AIPI score was 18.71 for the practitioners versus 16.39 for ChatGPT, which was significantly lower (p < 0.001). According to the experts, ChatGPT was significantly less effective for diagnosis and treatment (p < 0.001). According to two of the three experts, ChatGPT was significantly less effective in considering patient data (p = 0.001) and suggesting additional examinations (p < 0.0001). The primary diagnosis proposed by ChatGPT was judged not plausible and/or incomplete in 2.63% to 18% of cases; the suggested workup included inadequate examinations in 2.63% to 21.05% of cases; the proposed treatment was pertinent but incomplete in 18.42% to 47.37% of cases; and the therapeutic findings were considered inadequate in 18.42% of cases.

CONCLUSIONS: ChatGPT appears less efficient in establishing the diagnosis, selecting the most adequate additional examinations, and proposing pertinent and necessary therapeutic approaches.
Affiliation(s)
- Mélissa Peters
- Department of Stomatology, Oral & Maxillofacial Surgery, CHU Saint Pierre, Brussels, Belgium.
- Maxime Leclercq
- Department of Stomatology, Oral & Maxillofacial Surgery, CHU Saint Pierre, Brussels, Belgium
- Antoine Yanni
- Department of Stomatology, Oral & Maxillofacial Surgery, CHU Saint Pierre, Brussels, Belgium
- Xavier Vanden Eynden
- Department of Stomatology, Oral & Maxillofacial Surgery, CHU Saint Pierre, Brussels, Belgium
- Lalmand Martin
- Department of Stomatology, Oral & Maxillofacial Surgery, CHU Saint Pierre, Brussels, Belgium
- Noémie Vanden Haute
- Department of Stomatology, Oral & Maxillofacial Surgery, CHU Saint Pierre, Brussels, Belgium
- Szonja Tancredi
- Department of Stomatology, Oral & Maxillofacial Surgery, CHU Saint Pierre, Brussels, Belgium
- Céline De Passe
- Department of Stomatology, Oral & Maxillofacial Surgery, CHU Saint Pierre, Brussels, Belgium
- Edward Boutremans
- Department of Stomatology, Oral & Maxillofacial Surgery, CHU Saint Pierre, Brussels, Belgium
- Jerome Lechien
- Faculty of Medicine, Department of Human Anatomy and Experimental Oncology UMONS, Mons, Belgium; Phonetics and Phonology Laboratory (UMR 7018 CNRS, Université Sorbonne Nouvelle/Paris 3), Department of Otorhinolaryngology and Head and Neck Surgery, Foch Hospital, School of Medicine, UFR Simone Veil, Université Versailles Saint-Quentin-en-Yvelines (Paris Saclay University), Paris, France; Department of Otorhinolaryngology and Head and Neck Surgery, CHU Saint-Pierre, Brussels, Belgium; Research Committee of Young Otolaryngologists of the International Federation of Otorhinolaryngological Societies (IFOS), Paris, France; Young Confederation of the European Oto-Rhino-Laryngological Head and Neck Surgery Societies (Y-CEORLHNS), Dublin, Ireland; Division of Laryngology and Broncho-Esophagology, Department of Otolaryngology-Head Neck Surgery, EpiCURA Hospital, UMONS Research Institute for Health Sciences and Technology, University of Mons (UMons), Mons, Belgium
- Didier Dequanter
- Department of Stomatology, Oral & Maxillofacial Surgery, CHU Saint Pierre, Brussels, Belgium; Faculty of Medicine, Department of Human Anatomy and Experimental Oncology UMONS, Mons, Belgium
5.
Tomo S, Lechien JR, Bueno HS, Cantieri-Debortoli DF, Simonato LE. Accuracy and consistency of ChatGPT-3.5 and -4 in providing differential diagnoses in oral and maxillofacial diseases: a comparative diagnostic performance analysis. Clin Oral Investig 2024;28:544. PMID: 39316174. DOI: 10.1007/s00784-024-05939-1.
Abstract
OBJECTIVE: To investigate the performance of ChatGPT in the differential diagnosis of oral and maxillofacial diseases.

METHODS: Findings from 37 oral and maxillofacial lesions were presented for differential diagnosis to ChatGPT-3.5 and -4, 18 dental surgeons trained in oral medicine/pathology (OMP), 23 general dental surgeons (DDS), and 16 dental students (DS). Additionally, a group of 15 general dentists was asked to describe 11 cases to both ChatGPT versions. The primary and alternative diagnoses from ChatGPT-3.5, ChatGPT-4, and the human groups were rated by two independent investigators on a 4-point Likert scale. The consistency of ChatGPT-3.5 and -4 was evaluated with regenerated inputs.

RESULTS: Moderate consistency of outputs was observed for ChatGPT-3.5 and -4 in providing primary (κ = 0.532 and κ = 0.533, respectively) and alternative (κ = 0.337 and κ = 0.367, respectively) hypotheses. The mean rate of correct diagnoses was 64.86% for ChatGPT-3.5, 80.18% for ChatGPT-4, 86.64% for OMP, 24.32% for DDS, and 16.67% for DS. The mean correct primary hypothesis rates were 45.95% for ChatGPT-3.5, 61.80% for ChatGPT-4, 82.28% for OMP, 22.72% for DDS, and 15.77% for DS. The mean correct diagnosis rate for ChatGPT-3.5 was 64.86% with standard descriptions versus 45.95% with participants' descriptions; for ChatGPT-4, it was 80.18% with standard descriptions and 61.80% with participants' descriptions.

CONCLUSION: ChatGPT-4 demonstrates accuracy comparable to specialists in providing differential diagnoses for oral and maxillofacial diseases. The consistency of ChatGPT in providing diagnostic hypotheses for oral disease cases is moderate, a weakness for clinical application. The quality of case documentation and descriptions significantly impacts the performance of ChatGPT.

CLINICAL RELEVANCE: General dentists, dental students, and specialists in oral medicine and pathology may benefit from ChatGPT-4 as an auxiliary method for defining differential diagnoses for oral and maxillofacial lesions, but its accuracy depends on precise case descriptions.
Affiliation(s)
- Saygo Tomo
- Department of Pathology, School of Dentistry, University of São Paulo, Av. Professor Lineu Prestes 2227, São Paulo, CEP 05508-000, Brazil.
- Jérôme R Lechien
- Research Committee of Young-Otolaryngologists of the International Federations of Oto-rhino-laryngological Societies (YO-IFOS), Paris, France
- Department of Otorhinolaryngology and Head and Neck Surgery, CHU de Bruxelles, CHU Saint-Pierre, Brussels, Belgium
- Department of Otorhinolaryngology and Head and Neck Surgery, School of Medicine, Foch Hospital, UFR Simone Veil, Université Versailles Saint-Quentin-en-Yvelines (Paris Saclay University), Paris, France
- Department of Surgery, Faculty of Medicine, UMONS Research Institute for Health Sciences and Technology, University of Mons (UMons), Mons, Belgium
- Luciana Estevam Simonato
- Dental School, University Brasil, Fernandópolis, Brazil
- Medical School, University Brasil, Fernandópolis, Brazil
- Instituto Científico e Tecnológico, Programas de Bioengenharia e Ciências Ambientais, Universidade Brasil, Fernandópolis, Brazil
6.
Villarreal-Espinosa JB, Berreta RS, Allende F, Garcia JR, Ayala S, Familiari F, Chahla J. Accuracy assessment of ChatGPT responses to frequently asked questions regarding anterior cruciate ligament surgery. Knee 2024;51:84-92. PMID: 39241674. DOI: 10.1016/j.knee.2024.08.014.
Abstract
BACKGROUND: The emergence of artificial intelligence (AI) has given users chat-like access to large sources of information. We therefore sought to evaluate the accuracy of ChatGPT-4 responses to the 10 most frequently asked patient questions (FAQs) regarding anterior cruciate ligament (ACL) surgery.

METHODS: A list of the top 10 FAQs pertaining to ACL surgery was created after searching all Sports Medicine Fellowship Institutions listed on the Arthroscopy Association of North America (AANA) and American Orthopaedic Society for Sports Medicine (AOSSM) websites. A Likert scale was used by two sports medicine fellowship-trained surgeons to grade response accuracy, and Cohen's kappa was used to assess inter-rater agreement. Reproducibility of the responses over time was also assessed.

RESULTS: Five of the 10 responses were graded 'completely accurate' by both fellowship-trained surgeons, with three additional replies receiving 'completely accurate' status from at least one. Inter-rater reliability assessment revealed moderate agreement between the fellowship-trained attending physicians (weighted kappa = 0.57, 95% confidence interval 0.15-0.99). Additionally, 80% of the responses were reproducible over time.

CONCLUSION: ChatGPT can be considered an accurate additional tool for answering general patient questions regarding ACL surgery. Nonetheless, patient-surgeon interaction should not be deferred and must remain the driving force for information retrieval; the general recommendation is to address any questions in the presence of a qualified specialist.
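The weighted kappa reported above penalizes disagreements between ordinal grades in proportion to their distance, unlike plain percent agreement. A minimal sketch of linear-weighted Cohen's kappa (the ratings below are invented for illustration, not the study's data):

```python
# Linear-weighted Cohen's kappa for ordinal ratings, e.g. Likert accuracy
# grades from two reviewers. Disagreement weight grows linearly with the
# distance between the two assigned categories.

def weighted_kappa(r1, r2, categories):
    k = len(categories)
    idx = {c: i for i, c in enumerate(categories)}
    n = len(r1)
    # observed joint distribution of (rater-1, rater-2) grades, as proportions
    obs = [[0.0] * k for _ in range(k)]
    for a, b in zip(r1, r2):
        obs[idx[a]][idx[b]] += 1 / n
    p1 = [sum(row) for row in obs]                        # rater-1 marginals
    p2 = [sum(row[j] for row in obs) for j in range(k)]   # rater-2 marginals
    num = den = 0.0
    for i in range(k):
        for j in range(k):
            w = abs(i - j) / (k - 1)      # linear disagreement weight
            num += w * obs[i][j]          # observed weighted disagreement
            den += w * p1[i] * p2[j]      # chance-expected weighted disagreement
    return 1 - num / den

r1 = [1, 2, 3, 4, 4, 2, 3, 1]
r2 = [1, 2, 4, 4, 3, 2, 3, 2]
print(round(weighted_kappa(r1, r2, [1, 2, 3, 4]), 2))  # -> 0.68
```

On common benchmarks, values of 0.41-0.60 are read as moderate agreement, the band containing the study's 0.57.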
Affiliation(s)
- Felicitas Allende
- Department of Orthopedics, Rush University Medical Center, Chicago, IL, USA
- José Rafael Garcia
- Department of Orthopedics, Rush University Medical Center, Chicago, IL, USA
- Salvador Ayala
- Department of Orthopedics, Rush University Medical Center, Chicago, IL, USA
- Jorge Chahla
- Department of Orthopedics, Rush University Medical Center, Chicago, IL, USA.
7.
Bellamkonda N, Farlow JL, Haring CT, Sim MW, Seim NB, Cannon RB, Monroe MM, Agrawal A, Rocco JW, McCrary HC. Evaluating the Accuracy of ChatGPT in Common Patient Questions Regarding HPV+ Oropharyngeal Carcinoma. Ann Otol Rhinol Laryngol 2024;133:814-819. PMID: 39075853. DOI: 10.1177/00034894241259137.
Abstract
OBJECTIVES: Large language model (LLM)-based chatbots such as ChatGPT have been publicly available and increasingly utilized by the general public since late 2022. This study investigated ChatGPT responses to common patient questions regarding human papillomavirus (HPV)-positive oropharyngeal cancer (OPC).

METHODS: This was a prospective, multi-institutional study, with data collected from high-volume institutions performing >50 transoral robotic surgery cases per year. The 100 most recent discussion threads including the term "HPV" on the American Cancer Society's Cancer Survivors Network Head and Neck Cancer public discussion board were reviewed. The 11 most common questions were serially queried to ChatGPT 3.5 and the answers recorded. A survey was distributed to fellowship-trained head and neck oncologic surgeons at three institutions to evaluate the responses.

RESULTS: A total of 8 surgeons participated in the study. For questions regarding HPV contraction and transmission, ChatGPT answers were scored as clinically accurate and aligned with consensus in the head and neck surgical oncology community 84.4% and 90.6% of the time, respectively. For questions involving treatment of HPV+ OPC, ChatGPT was clinically accurate and aligned with consensus 87.5% and 91.7% of the time, respectively. For questions regarding the HPV vaccine, ChatGPT was clinically accurate and aligned with consensus 62.5% and 75% of the time, respectively. When asked about circulating tumor DNA testing, only 12.5% of surgeons thought the responses were accurate or consistent with consensus.

CONCLUSION: ChatGPT 3.5 performed poorly on questions involving evolving therapies and diagnostics; caution should therefore be used when relying on a platform like ChatGPT 3.5 to assess the use of advanced technology. Patients should be counseled on the importance of consulting their surgeons for accurate and up-to-date recommendations, using LLMs only to augment their understanding of these important health-related topics.
Affiliation(s)
- Nikhil Bellamkonda
- Department of Otolaryngology-Head and Neck Surgery, University of Utah, Salt Lake City, UT, USA
- Janice L Farlow
- Department of Otolaryngology-Head and Neck Surgery, Indiana University, Indianapolis, IN, USA
- Catherine T Haring
- Department of Otolaryngology-Head and Neck Surgery, The Ohio State University Wexner Medical Center, Columbus, OH, USA
- Michael W Sim
- Department of Otolaryngology-Head and Neck Surgery, Indiana University, Indianapolis, IN, USA
- Nolan B Seim
- Department of Otolaryngology-Head and Neck Surgery, The Ohio State University Wexner Medical Center, Columbus, OH, USA
- Richard B Cannon
- Department of Otolaryngology-Head and Neck Surgery, University of Utah, Salt Lake City, UT, USA
- Marcus M Monroe
- Department of Otolaryngology-Head and Neck Surgery, University of Utah, Salt Lake City, UT, USA
- Amit Agrawal
- Department of Otolaryngology-Head and Neck Surgery, The Ohio State University Wexner Medical Center, Columbus, OH, USA
- James W Rocco
- Department of Otolaryngology-Head and Neck Surgery, The Ohio State University Wexner Medical Center, Columbus, OH, USA
- Hilary C McCrary
- Department of Otolaryngology-Head and Neck Surgery, University of Utah, Salt Lake City, UT, USA
8.
Alami K, Willemse E, Quiriny M, Lipski S, Laurent C, Donquier V, Digonnet A. Evaluation of ChatGPT-4's Performance in Therapeutic Decision-Making During Multidisciplinary Oncology Meetings for Head and Neck Squamous Cell Carcinoma. Cureus 2024;16:e68808. PMID: 39376890. PMCID: PMC11456411. DOI: 10.7759/cureus.68808.
Abstract
OBJECTIVES: First reports suggest that artificial intelligence (AI) tools such as ChatGPT-4 (OpenAI, San Francisco, USA) might be reliable aids for therapeutic decisions in some medical conditions. This study aims to assess the decisional capacity of ChatGPT-4 in patients with head and neck carcinomas, using the multidisciplinary oncology meeting (MOM) and National Comprehensive Cancer Network (NCCN) decisions as references.

METHODS: This retrospective study included 263 patients with squamous cell carcinoma of the oral cavity, oropharynx, hypopharynx, or larynx who were followed at our institution between January 1, 2016, and December 31, 2021. ChatGPT-4's recommendations for first- and second-line treatments were compared with the MOM decisions and NCCN guidelines. Degrees of agreement were calculated using the kappa statistic, which measures the degree of agreement between two evaluators.

RESULTS: Compared with the MOM decisions, ChatGPT-4 demonstrated moderate agreement for first-line treatment recommendations (kappa = 0.48) and substantial agreement for second-line recommendations (kappa = 0.78). Substantial agreement with the NCCN guidelines was observed for both first- and second-line treatments (kappa = 0.72 and 0.66, respectively). The degree of agreement decreased when the decision involved gastrostomy, patients over 70, or patients with comorbidities.

CONCLUSIONS: While ChatGPT-4 can significantly support clinical decision-making in oncology by aligning closely with expert recommendations and established guidelines, ongoing enhancement and training are crucial. The findings advocate continued evolution of AI tools to better handle the nuanced aspects of patient health profiles, broadening their applicability and reliability in clinical practice.
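The kappa statistic used above corrects raw agreement for the agreement expected by chance: κ = (p_o − p_e) / (1 − p_e). A minimal sketch of unweighted Cohen's kappa (the treatment labels below are invented for illustration, not the tumor-board data):

```python
# Cohen's kappa: chance-corrected agreement between two raters assigning
# one of several categories (here, hypothetical treatment recommendations).

from collections import Counter

def cohens_kappa(r1, r2):
    n = len(r1)
    p_o = sum(a == b for a, b in zip(r1, r2)) / n  # observed agreement
    c1, c2 = Counter(r1), Counter(r2)
    # chance agreement expected from each rater's marginal frequencies
    p_e = sum(c1[cat] * c2[cat] for cat in c1) / (n * n)
    return (p_o - p_e) / (1 - p_e)

mom = ["surgery", "radio", "chemo", "surgery", "radio", "surgery"]
gpt = ["surgery", "radio", "radio", "surgery", "chemo", "surgery"]
print(round(cohens_kappa(mom, gpt), 2))  # -> 0.45
```

Common benchmarks label 0.41-0.60 as moderate and 0.61-0.80 as substantial agreement, the bands the abstract uses for its kappa values.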
Affiliation(s)
- Kenza Alami
- Otolaryngology, Jules Bordet Institute, Bruxelles, BEL
- Marie Quiriny
- Surgical Oncology, Jules Bordet Institute, Bruxelles, BEL
- Samuel Lipski
- Surgical Oncology, Jules Bordet Institute, Bruxelles, BEL
- Celine Laurent
- Otolaryngology - Head and Neck Surgery, Hôpital Ambroise-Paré, Mons, BEL
- Otolaryngology - Head and Neck Surgery, Hôpital Universitaire de Bruxelles (HUB) Erasme Hospital, Bruxelles, BEL
9.
Lechien JR, Rameau A. Applications of ChatGPT in Otolaryngology-Head Neck Surgery: A State of the Art Review. Otolaryngol Head Neck Surg 2024;171:667-677. PMID: 38716790. DOI: 10.1002/ohn.807.
Abstract
OBJECTIVE: To review the current literature on the application, accuracy, and performance of Chatbot Generative Pre-Trained Transformer (ChatGPT) in otolaryngology-head and neck surgery.

DATA SOURCES: PubMed, Cochrane Library, and Scopus.

REVIEW METHODS: A comprehensive review of the literature on the applications of ChatGPT in otolaryngology was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement.

CONCLUSIONS: ChatGPT provides imperfect patient information and general knowledge about diseases encountered in otolaryngology-head and neck surgery. In clinical practice, despite suboptimal performance, studies report that the model is more accurate in providing diagnoses than in suggesting the most adequate additional examinations and treatments for clinical vignettes or real clinical cases. ChatGPT has been used as an adjunct tool to improve scientific reports (referencing, spelling correction), to elaborate study protocols, or to take student or resident exams, with varying levels of accuracy. The stability of ChatGPT responses across repeated questions appears high, but many studies report hallucination events, particularly in providing scientific references.

IMPLICATIONS FOR PRACTICE: To date, most applications of ChatGPT are limited to generating disease or treatment information and to improving the management of clinical cases. The lack of comparison of ChatGPT's performance with other large language models is the main limitation of current research. Its ability to analyze clinical images has not yet been investigated in otolaryngology, although upper airway tract and ear images are an important step in the diagnosis of most common ear, nose, and throat conditions. This review may help otolaryngologists conceive new applications in further research.
Affiliation(s)
- Jérôme R Lechien
- Research Committee of Young Otolaryngologists of the International Federation of Otorhinolaryngological Societies (IFOS), Paris, France
- Division of Laryngology and Broncho-Esophagology, Department of Otolaryngology-Head Neck Surgery, EpiCURA Hospital, UMONS Research Institute for Health Sciences and Technology, University of Mons (UMons), Mons, Belgium
- Department of Otorhinolaryngology and Head and Neck Surgery, Foch Hospital, Phonetics and Phonology Laboratory (UMR 7018 CNRS, Université Sorbonne Nouvelle/Paris 3), Paris Saclay University, Paris, France
- Department of Otorhinolaryngology and Head and Neck Surgery, CHU Saint-Pierre, Brussels, Belgium
- Anais Rameau
- Department of Otolaryngology-Head and Neck Surgery, Sean Parker Institute for the Voice, Weill Cornell Medicine, New York City, New York, USA
10.
Kutbi M. Artificial Intelligence-Based Applications for Bone Fracture Detection Using Medical Images: A Systematic Review. Diagnostics (Basel) 2024;14:1879. PMID: 39272664. PMCID: PMC11394268. DOI: 10.3390/diagnostics14171879.
Abstract
Artificial intelligence (AI) is making notable advancements in the medical field, particularly in bone fracture detection. This systematic review compiles and assesses existing research on AI applications aimed at identifying bone fractures through medical imaging, encompassing studies from 2010 to 2023. It evaluates the performance of various AI models, such as convolutional neural networks (CNNs), in diagnosing bone fractures, highlighting their superior accuracy, sensitivity, and specificity compared to traditional diagnostic methods. Furthermore, the review explores the integration of advanced imaging techniques like 3D CT and MRI with AI algorithms, which has led to enhanced diagnostic accuracy and improved patient outcomes. The potential of Generative AI and Large Language Models (LLMs), such as OpenAI's GPT, to enhance diagnostic processes through synthetic data generation, comprehensive report creation, and clinical scenario simulation is also discussed. The review underscores the transformative impact of AI on diagnostic workflows and patient care, while also identifying research gaps and suggesting future research directions to enhance data quality, model robustness, and ethical considerations.
Affiliation(s)
- Mohammed Kutbi
- College of Computing and Informatics, Saudi Electronic University, Riyadh 13316, Saudi Arabia
11.
De Vito A, Colpani A, Moi G, Babudieri S, Calcagno A, Calvino V, Ceccarelli M, Colpani G, d'Ettorre G, Di Biagio A, Farinella M, Falaguasta M, Focà E, Giupponi G, Habed AJ, Isenia WJ, Lo Caputo S, Marchetti G, Modesti L, Mussini C, Nunnari G, Rusconi S, Russo D, Saracino A, Serra PA, Madeddu G. Assessing ChatGPT's Potential in HIV Prevention Communication: A Comprehensive Evaluation of Accuracy, Completeness, and Inclusivity. AIDS Behav 2024;28:2746-2754. PMID: 38836986. PMCID: PMC11286632. DOI: 10.1007/s10461-024-04391-2.
Abstract
With the advancement of artificial intelligence (AI), platforms like ChatGPT have gained traction in different fields, including medicine. This study aims to evaluate the potential of ChatGPT in addressing questions related to HIV prevention and to assess its accuracy, completeness, and inclusivity. A team consisting of 15 physicians, six members from HIV communities, and three experts in gender and queer studies designed an assessment of ChatGPT. Queries were categorized into five thematic groups: general HIV information, behaviors increasing HIV acquisition risk, HIV and pregnancy, HIV testing, and prophylaxis use. The medical doctors developed the questions to be submitted to ChatGPT, and the other members critically assessed the generated responses regarding level of expertise, accuracy, completeness, and inclusivity. The median accuracy score was 5.5 out of 6, with 88.4% of responses achieving a score ≥ 5. Completeness had a median of 3 out of 3, while the median for inclusivity was 2 out of 3. Some thematic groups, like behaviors associated with HIV transmission and prophylaxis, exhibited higher accuracy, indicating variable performance across topics. Issues of inclusivity were identified, notably the use of outdated terms and a lack of representation for some communities. ChatGPT demonstrates significant potential in providing accurate information on HIV-related topics. However, while responses were often scientifically accurate, they sometimes lacked the socio-political context and inclusivity essential for effective health communication. This underlines the importance of aligning AI-driven platforms with contemporary health communication strategies and of balancing accuracy with inclusivity.
Affiliation(s)
- Andrea De Vito
- Unit of Infectious Diseases, Department of Medicine, Surgery, and Pharmacy, University of Sassari, Sassari, 07100, Italy.
- PhD School in Biomedical Science, Biomedical Science Department, University of Sassari, Sassari, Italy.
- Agnese Colpani
- Unit of Infectious Diseases, Department of Medicine, Surgery, and Pharmacy, University of Sassari, Sassari, 07100, Italy
- Giulia Moi
- Unit of Infectious Diseases, Department of Medicine, Surgery, and Pharmacy, University of Sassari, Sassari, 07100, Italy
- Sergio Babudieri
- Unit of Infectious Diseases, Department of Medicine, Surgery, and Pharmacy, University of Sassari, Sassari, 07100, Italy
- Andrea Calcagno
- Unit of Infectious Diseases, Department of Medical Sciences, University of Turin, Torino, Italy
- Valeria Calvino
- Associazione Nazionale per la Lotta contro l'AIDS (ANLAIDS), Rome, Italy
- Manuela Ceccarelli
- Unit of Infectious Diseases, School of Medicine and Surgery, "Kore" University of Enna, Enna, Italy
- Gianmaria Colpani
- Department of Media and Culture Studies, Utrecht University, Utrecht, Netherlands
- Gabriella d'Ettorre
- Unit of Infectious Diseases, Department of Public Health and Infectious Diseases, Azienda Policlinico Umberto I, Rome, Italy
- Antonio Di Biagio
- Infectious Diseases, San Martino Hospital Genoa, University of Genoa, Genoa, Italy
- Marco Falaguasta
- Associazione Nazionale per la Lotta contro l'AIDS (ANLAIDS), Padova, Italy
- Emanuele Focà
- Unit of Infectious and Tropical Diseases, Department of Clinical and Experimental Sciences, University of Brescia and ASST Spedali Civili di Brescia, Brescia, Italy
- Giusi Giupponi
- Lega italiana per la lotta contro l'AIDS (LILA), Brescia, Italy
- Adriano José Habed
- Department of Media and Culture Studies, Utrecht University, Utrecht, Netherlands
- Sergio Lo Caputo
- S.C. Malattie Infettive, Dipartimento di Scienze Mediche e Chirurgiche, University of Foggia, Foggia, Italy
- Giulia Marchetti
- Clinic of Infectious Diseases, Department of Health Sciences, ASST Santi Paolo e Carlo, University of Milan, Milan, Italy
- Luca Modesti
- Conigli Bianchi, Artivists against Serophobia, Italy
- Giuseppe Nunnari
- Unit of Infectious Diseases, Department of Clinical and Experimental Medicine, ARNAS Garibaldi Hospital, University of Catania, Catania, Italy
- Stefano Rusconi
- Infectious Diseases Unit, Ospedale Civile di Legnano, ASST Ovest Milanese, DIBIC Luigi Sacco, Università degli Studi di Milano, Legnano, 20025, Italy
- Daria Russo
- Network Persone Sieropositive (NPS), Rome, Italy
- Annalisa Saracino
- Clinic of Infectious Diseases, Department of Precision and Regenerative Medicine and Ionian Area-(DiMePRe-J), University of Bari "Aldo Moro", Bari, Italy
- Pier Andrea Serra
- Department of Medicine, Surgery and Pharmacy, University of Sassari, Sassari, Italy
- Giordano Madeddu
- Unit of Infectious Diseases, Department of Medicine, Surgery, and Pharmacy, University of Sassari, Sassari, 07100, Italy
12
Maniaci A, Lazzeroni M, Cozzi A, Fraccaroli F, Gaffuri M, Chiesa-Estomba C, Capaccio P. Can chatbots enhance the management of pediatric sialadenitis in clinical practice? Eur Arch Otorhinolaryngol 2024. [PMID: 38955859 DOI: 10.1007/s00405-024-08798-4]
Abstract
OBJECTIVE The purpose of this study was to assess how well ChatGPT, an AI-powered chatbot, performed in helping to manage pediatric sialadenitis and in identifying when sialendoscopy was necessary. METHODS 49 clinical cases of pediatric sialadenitis were retrospectively reviewed. ChatGPT was given patient data and offered differential diagnoses, proposed further tests, and suggested treatments. Its answers were compared with the decisions made by the treating otolaryngologists, and ChatGPT's response consistency and interrater reliability were analyzed. RESULTS ChatGPT achieved 78.57% accuracy in primary diagnosis, and its diagnosis was considered likely in a further 17.35% of cases. ChatGPT recommended more additional examinations than the otolaryngologists did (111 vs. 60, p < 0.001), and agreement between ChatGPT and the otolaryngologists on additional examinations was poor. Only 28.57% of cases received a pertinent and essential treatment plan from ChatGPT, indicating that its treatment recommendations were frequently lacking. Interrater reliability among the judges was highest for treatment ratings (Kendall's tau = 0.824, p < 0.001). For the most part, ChatGPT's response consistency was high. CONCLUSIONS Although ChatGPT has the potential to correctly diagnose pediatric sialadenitis, it has noteworthy limitations in suggesting further testing and treatment regimens. More research and validation are required before widespread clinical use. A critical viewpoint is needed to guarantee that chatbots are used properly and effectively to supplement human expertise rather than replace it.
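The interrater agreement reported above can be illustrated with a small worked example. This is a minimal sketch of Kendall's tau-a (the variant without a tie correction; the abstract does not state which variant or software the authors used), applied to hypothetical ratings from two judges:

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total pairs.
    Tied pairs count as neither concordant nor discordant."""
    assert len(x) == len(y)
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)

# Hypothetical: two judges scoring the same 5 treatment plans on a 1-5 scale
rater_a = [5, 3, 4, 2, 1]
rater_b = [4, 3, 5, 2, 1]
print(kendall_tau_a(rater_a, rater_b))  # 9 concordant, 1 discordant -> 0.8
```

Most statistics packages report tau-b, which corrects the denominator for ties and is usually what rating studies compute.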
Affiliation(s)
- Antonino Maniaci
- Faculty of Medicine and Surgery, University of Enna Kore, 94100, Enna, Italy.
- Matteo Lazzeroni
- Department of Otolaryngology and Head and Neck Surgery, Fondazione IRCCS Ca' Granda Ospedale Maggiore Policlinico, Milan, Italy
- Department of Clinical Sciences and Community, Fondazione IRCCS Ca' Granda Ospedale Maggiore Policlinico, Milan, Italy
- Anna Cozzi
- Otolaryngology Unit, Santi Paolo e Carlo Hospital, Department of Health Sciences, Università Degli Studi di Milano, 20142, Milan, Italy
- Francesca Fraccaroli
- Otolaryngology Unit, Santi Paolo e Carlo Hospital, Department of Health Sciences, Università Degli Studi di Milano, 20142, Milan, Italy
- Michele Gaffuri
- Otolaryngology Unit, Santi Paolo e Carlo Hospital, Department of Health Sciences, Università Degli Studi di Milano, 20142, Milan, Italy
- Carlos Chiesa-Estomba
- Department of Otolaryngology-Head and Neck Surgery, San Sebastian University Hospital, San Sebastián, Spain
- Pasquale Capaccio
- Otolaryngology Unit, Santi Paolo e Carlo Hospital, Department of Health Sciences, Università Degli Studi di Milano, 20142, Milan, Italy
13
Ho RA, Shaari AL, Cowan PT, Yan K. ChatGPT Responses to Frequently Asked Questions on Ménière's Disease: A Comparison to Clinical Practice Guideline Answers. OTO Open 2024; 8:e163. [PMID: 38974175 PMCID: PMC11225079 DOI: 10.1002/oto2.163]
Abstract
Objective Evaluate the quality of responses from Chat Generative Pre-Trained Transformer (ChatGPT) models compared to the answers for "Frequently Asked Questions" (FAQs) from the American Academy of Otolaryngology-Head and Neck Surgery (AAO-HNS) Clinical Practice Guidelines (CPG) for Ménière's disease (MD). Study Design Comparative analysis. Setting The AAO-HNS CPG for MD includes FAQs that clinicians can give to patients for MD-related questions. The ability of ChatGPT to properly educate patients regarding MD is unknown. Methods ChatGPT-3.5 and 4.0 were each prompted with 16 questions from the MD FAQs. Each response was rated in terms of (1) comprehensiveness, (2) extensiveness, (3) presence of misleading information, and (4) quality of resources. Readability was assessed using Flesch-Kincaid Grade Level (FKGL) and Flesch Reading Ease Score (FRES). Results ChatGPT-3.5 was comprehensive in 5 responses whereas ChatGPT-4.0 was comprehensive in 9 (31.3% vs 56.3%, P = .2852). ChatGPT-3.5 and 4.0 were extensive in all responses (P = 1.0000). ChatGPT-3.5 was misleading in 5 responses whereas ChatGPT-4.0 was misleading in 3 (31.3% vs 18.75%, P = .6851). ChatGPT-3.5 had quality resources in 10 responses whereas ChatGPT-4.0 had quality resources in 16 (62.5% vs 100%, P = .0177). AAO-HNS CPG FRES (62.4 ± 16.6) demonstrated an appropriate readability score of at least 60, while both ChatGPT-3.5 (39.1 ± 7.3) and 4.0 (42.8 ± 8.5) failed to meet this standard. All platforms had FKGL means that exceeded the recommended level of 6 or lower. Conclusion While ChatGPT-4.0 had significantly better resource reporting, both models have room for improvement in being more comprehensive, more readable, and less misleading for patients.
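The two readability measures used in this study are standard closed-form formulas over word, sentence, and syllable counts. A minimal sketch (the counts in the example are hypothetical; readability tools differ mainly in how they count syllables):

```python
def fres(words, sentences, syllables):
    # Flesch Reading Ease: higher = easier; >= 60 is roughly plain English
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def fkgl(words, sentences, syllables):
    # Flesch-Kincaid Grade Level: approximate US school grade
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

# Hypothetical passage: 120 words, 8 sentences, 180 syllables
print(round(fres(120, 8, 180), 1))  # -> 64.7 (meets the >= 60 target)
print(round(fkgl(120, 8, 180), 1))  # -> 8.0 (above the recommended grade 6)
```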
Affiliation(s)
- Rebecca A. Ho
- Department of Otolaryngology–Head and Neck Surgery, Rutgers New Jersey Medical School, Newark, New Jersey, USA
- Ariana L. Shaari
- Department of Otolaryngology–Head and Neck Surgery, Rutgers New Jersey Medical School, Newark, New Jersey, USA
- Paul T. Cowan
- Department of Otolaryngology–Head and Neck Surgery, Rutgers New Jersey Medical School, Newark, New Jersey, USA
- Kenneth Yan
- Department of Otolaryngology–Head and Neck Surgery, Rutgers New Jersey Medical School, Newark, New Jersey, USA
14
Lahat A, Sharif K, Zoabi N, Shneor Patt Y, Sharif Y, Fisher L, Shani U, Arow M, Levin R, Klang E. Assessing Generative Pretrained Transformers (GPT) in Clinical Decision-Making: Comparative Analysis of GPT-3.5 and GPT-4. J Med Internet Res 2024; 26:e54571. [PMID: 38935937 PMCID: PMC11240076 DOI: 10.2196/54571]
Abstract
BACKGROUND Artificial intelligence, particularly chatbot systems, is becoming an instrumental tool in health care, aiding clinical decision-making and patient engagement. OBJECTIVE This study aims to analyze the performance of ChatGPT-3.5 and ChatGPT-4 in addressing complex clinical and ethical dilemmas, and to illustrate their potential role in health care decision-making while comparing seniors' and residents' ratings, and specific question types. METHODS A total of 4 specialized physicians formulated 176 real-world clinical questions. A total of 8 senior physicians and residents assessed responses from GPT-3.5 and GPT-4 on a 1-5 scale across 5 categories: accuracy, relevance, clarity, utility, and comprehensiveness. Evaluations were conducted within internal medicine, emergency medicine, and ethics. Comparisons were made globally, between seniors and residents, and across classifications. RESULTS Both GPT models received high mean scores (4.4, SD 0.8 for GPT-4 and 4.1, SD 1.0 for GPT-3.5). GPT-4 outperformed GPT-3.5 across all rating dimensions, with seniors consistently rating responses higher than residents for both models. Specifically, seniors rated GPT-4 as more beneficial and complete (mean 4.6 vs 4.0 and 4.6 vs 4.1, respectively; P<.001), and GPT-3.5 similarly (mean 4.1 vs 3.7 and 3.9 vs 3.5, respectively; P<.001). Ethical queries received the highest ratings for both models, with mean scores reflecting consistency across accuracy and completeness criteria. Distinctions among question types were significant, particularly for the GPT-4 mean scores in completeness across emergency, internal, and ethical questions (4.2, SD 1.0; 4.3, SD 0.8; and 4.5, SD 0.7, respectively; P<.001), and for GPT-3.5's accuracy, beneficial, and completeness dimensions. CONCLUSIONS ChatGPT's potential to assist physicians with medical issues is promising, with prospects to enhance diagnostics, treatments, and ethics. While integration into clinical workflows may be valuable, it must complement, not replace, human expertise. Continued research is essential to ensure safe and effective implementation in clinical environments.
Affiliation(s)
- Adi Lahat
- Department of Gastroenterology, Chaim Sheba Medical Center, Affiliated with Tel Aviv University, Ramat Gan, Israel
- Department of Gastroenterology, Samson Assuta Ashdod Medical Center, Affiliated with Ben Gurion University of the Negev, Be'er Sheva, Israel
- Kassem Sharif
- Department of Gastroenterology, Chaim Sheba Medical Center, Affiliated with Tel Aviv University, Ramat Gan, Israel
- Department of Internal Medicine B, Sheba Medical Centre, Tel Aviv, Israel
- Narmin Zoabi
- Department of Gastroenterology, Chaim Sheba Medical Center, Affiliated with Tel Aviv University, Ramat Gan, Israel
- Yousra Sharif
- Department of Internal Medicine C, Hadassah Medical Center, Jerusalem, Israel
- Lior Fisher
- Department of Internal Medicine B, Sheba Medical Centre, Tel Aviv, Israel
- Uria Shani
- Department of Internal Medicine B, Sheba Medical Centre, Tel Aviv, Israel
- Mohamad Arow
- Department of Internal Medicine B, Sheba Medical Centre, Tel Aviv, Israel
- Roni Levin
- Department of Internal Medicine B, Sheba Medical Centre, Tel Aviv, Israel
- Eyal Klang
- Division of Data-Driven and Digital Medicine (D3M), Icahn School of Medicine at Mount Sinai, New York, NY, United States
15
Sahin S, Erkmen B, Duymaz YK, Bayram F, Tekin AM, Topsakal V. Evaluating ChatGPT-4's performance as a digital health advisor for otosclerosis surgery. Front Surg 2024; 11:1373843. [PMID: 38903865 PMCID: PMC11188327 DOI: 10.3389/fsurg.2024.1373843]
Abstract
Purpose This study aims to evaluate the effectiveness of ChatGPT-4, an artificial intelligence (AI) chatbot, in providing accurate and comprehensible information to patients regarding otosclerosis surgery. Methods On October 20, 2023, 15 hypothetical questions were posed to ChatGPT-4 to simulate physician-patient interactions about otosclerosis surgery. Responses were evaluated by three independent ENT specialists using the DISCERN scoring system. The readability was evaluated using multiple indices: Flesch Reading Ease (FRE), Flesch-Kincaid Grade Level (FKGL), Gunning Fog Index (Gunning FOG), Simple Measure of Gobbledygook (SMOG), Coleman-Liau Index (CLI), and Automated Readability Index (ARI). Results The responses from ChatGPT-4 received DISCERN scores ranging from poor to excellent, with an overall score of 50.7 ± 8.2. The readability analysis indicated that the texts were above the 6th-grade level, suggesting they may not be easily comprehensible to the average reader. There was a significant positive correlation between the referees' scores. Despite providing correct information in over 90% of the cases, the study highlights concerns regarding the potential for incomplete or misleading answers and the high readability level of the responses. Conclusion While ChatGPT-4 shows potential in delivering health information accurately, its utility is limited by the level of readability of its responses. The study underscores the need for continuous improvement in AI systems to ensure the delivery of information that is both accurate and accessible to patients with varying levels of health literacy. Healthcare professionals should supervise the use of such technologies to enhance patient education and care.
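Three of the indices reported above are closed-form functions of simple text counts. The formulas are the standard published ones; the counts in the example are hypothetical:

```python
import math

def gunning_fog(words, sentences, complex_words):
    # 0.4 * (mean sentence length + percentage of 3+-syllable words)
    return 0.4 * (words / sentences + 100 * complex_words / words)

def smog(sentences, polysyllables):
    # SMOG grade from the polysyllable count, normalized to 30 sentences
    return 1.0430 * math.sqrt(polysyllables * 30 / sentences) + 3.1291

def ari(characters, words, sentences):
    # Automated Readability Index from character and sentence lengths
    return 4.71 * characters / words + 0.5 * words / sentences - 21.43

# Hypothetical sample: 100 words, 5 sentences, 15 complex words, 500 letters
print(round(gunning_fog(100, 5, 15), 1))  # -> 14.0
print(round(smog(5, 12), 1))              # -> 12.0
print(round(ari(500, 100, 5), 1))         # -> 12.1
```

All three return an approximate US school-grade level, which is why scores well above 6 indicate text that is hard for the average patient.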
Affiliation(s)
- Yaşar Kemal Duymaz
- Umraniye Research and Training Hospital, University of Health Sciences, Istanbul, Türkiye
- Furkan Bayram
- Umraniye Research and Training Hospital, University of Health Sciences, Istanbul, Türkiye
- Ahmet Mahmut Tekin
- Department of Otolaryngology and Head & Neck Surgery, Vrije Universiteit Brussel, Brussels Health Care Center, Brussels, Belgium
- Vedat Topsakal
- Department of Otolaryngology and Head & Neck Surgery, Vrije Universiteit Brussel, Brussels Health Care Center, Brussels, Belgium
16
Menshawey R, Menshawey E. Quid Pro Quo Doctor, I tell you things, you tell me things: ChatGPT's thoughts on a killer. Forensic Sci Med Pathol 2024; 20:751-755. [PMID: 37594609 DOI: 10.1007/s12024-023-00696-1]
Affiliation(s)
- Rahma Menshawey
- Kasr al Ainy Hospital, Faculty of Medicine, Kasr Al Ainy, Cairo University, Geziret Elroda, Manial, Cairo, 11562, Egypt.
- Esraa Menshawey
- Kasr al Ainy Hospital, Faculty of Medicine, Kasr Al Ainy, Cairo University, Geziret Elroda, Manial, Cairo, 11562, Egypt
17
Dallari V, Liberale C, De Cecco F, Nocini R, Arietti V, Monzani D, Sacchetto L. The role of artificial intelligence in training ENT residents: a survey on ChatGPT, a new method of investigation. Acta Otorhinolaryngol Ital 2024; 44:161-168. [PMID: 38712520 PMCID: PMC11166211 DOI: 10.14639/0392-100x-n2806]
Abstract
Objective The primary focus of this study was to analyze the adoption of ChatGPT among Ear, Nose, and Throat (ENT) trainees, encompassing its role in scientific research and personal study. We also examined the year of training in which ENT trainees became involved in clinical research and how many scientific investigations they had been engaged in. Methods An online survey was distributed to ENT residents employed in Italian University Hospitals. Results Of 609 Italian ENT trainees, 181 (29.7%) responded to the survey. Among these, 67.4% were familiar with ChatGPT, and 18.9% of them used artificial intelligence as a tool for research and study; 32.6% were not familiar with ChatGPT and its functions. Within our sample, there was an increasing trend of participation by ENT trainees in scientific publications throughout their training. Conclusions ChatGPT remains relatively unfamiliar and underutilized in Italy, even though it could be a valuable and efficient tool for ENT trainees, providing quick access to study and research support through both personal computers and smartphones.
Affiliation(s)
- Virginia Dallari
- Unit of Otorhinolaryngology, Head & Neck Department, University of Verona, Verona, Italy
- Member of the Young Confederation of European ORL-HNS
- Carlotta Liberale
- Unit of Otorhinolaryngology, Head & Neck Department, University of Verona, Verona, Italy
- Francesca De Cecco
- Unit of Otorhinolaryngology, Head & Neck Department, University of Verona, Verona, Italy
- Riccardo Nocini
- Unit of Otorhinolaryngology, Head & Neck Department, University of Verona, Verona, Italy
- Member of the Young Confederation of European ORL-HNS
- Valerio Arietti
- Unit of Otorhinolaryngology, Head & Neck Department, University of Verona, Verona, Italy
- Daniele Monzani
- Unit of Otorhinolaryngology, Head & Neck Department, University of Verona, Verona, Italy
- Luca Sacchetto
- Unit of Otorhinolaryngology, Head & Neck Department, University of Verona, Verona, Italy
18
Kim H, Park H, Kang S, Kim J, Kim J, Jung J, Taira R. Evaluating the validity of the nursing statements algorithmically generated based on the International Classifications of Nursing Practice for respiratory nursing care using large language models. J Am Med Inform Assoc 2024; 31:1397-1403. [PMID: 38630586 PMCID: PMC11105147 DOI: 10.1093/jamia/ocae070]
Abstract
OBJECTIVE This study aims to facilitate the creation of quality standardized nursing statements in South Korea's hospitals using algorithmic generation based on the International Classifications of Nursing Practice (ICNP) and evaluation through large language models. MATERIALS AND METHODS We algorithmically generated 15,972 statements related to acute respiratory care using 117 concepts and the concept composition models of the ICNP. Human reviewers, Generative Pre-trained Transformer 4.0 (GPT-4.0), and Bio_Clinical Bidirectional Encoder Representations from Transformers (Bio_ClinicalBERT) evaluated the generated statements for validity. The evaluations by GPT-4.0 and Bio_ClinicalBERT were conducted with and without contextual information and training. RESULTS Of the generated statements, 2207 were deemed valid by expert reviewers. GPT-4.0 showed a zero-shot AUC of 0.857, which worsened with the addition of contextual information. Bio_ClinicalBERT, after training, improved significantly, reaching an AUC of 0.998. CONCLUSION Bio_ClinicalBERT effectively validates auto-generated nursing statements, offering a promising solution to enhance and streamline healthcare documentation processes.
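The AUC values reported for the validators need no ML framework to understand: ROC AUC equals the probability that a randomly chosen valid statement receives a higher score than a randomly chosen invalid one (ties counting half). A minimal sketch over hypothetical labels and scores:

```python
def auc(labels, scores):
    """ROC AUC by direct pair counting: fraction of (positive, negative)
    pairs where the positive is scored higher, with ties counting 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy validity labels (1 = valid statement) and hypothetical model scores
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
print(round(auc(labels, scores), 3))  # 8 of 9 pairs ranked correctly -> 0.889
```

Pair counting is O(n^2); production implementations sort the scores once, but the result is identical.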
Affiliation(s)
- Hyeoneui Kim
- College of Nursing, Seoul National University, Seoul, 03080, Republic of Korea
- The Research Institute of Nursing Science, Seoul National University, Seoul, 03080, Republic of Korea
- Center for Human-Caring Nurse Leaders for the Future by Brain Korea 21 (BK 21) Four Project, College of Nursing, Seoul National University, Seoul, 03080, Republic of Korea
- Hyewon Park
- College of Nursing, Seoul National University, Seoul, 03080, Republic of Korea
- Samsung Medical Center, Seoul, 06351, Republic of Korea
- Sunghoon Kang
- The Department of Science Studies, Seoul National University, Seoul, 08826, Republic of Korea
- Jinsol Kim
- College of Nursing, Seoul National University, Seoul, 03080, Republic of Korea
- Center for Human-Caring Nurse Leaders for the Future by Brain Korea 21 (BK 21) Four Project, College of Nursing, Seoul National University, Seoul, 03080, Republic of Korea
- Jeongha Kim
- College of Nursing, Seoul National University, Seoul, 03080, Republic of Korea
- Asan Medical Center, Seoul, 05505, Republic of Korea
- Jinsun Jung
- College of Nursing, Seoul National University, Seoul, 03080, Republic of Korea
- Center for Human-Caring Nurse Leaders for the Future by Brain Korea 21 (BK 21) Four Project, College of Nursing, Seoul National University, Seoul, 03080, Republic of Korea
- Ricky Taira
- The Department of Radiological Science, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA 90095, United States
19
Lopez-Gonzalez R, Sanchez-Cordero S, Pujol-Gebellí J, Castellvi J. Evaluation of the Impact of ChatGPT on the Selection of Surgical Technique in Bariatric Surgery. Obes Surg 2024. [PMID: 38760650 DOI: 10.1007/s11695-024-07279-1]
Abstract
PURPOSE With the growing interest in artificial intelligence (AI) applications in medicine, this study explores ChatGPT's potential to influence surgical technique selection in metabolic and bariatric surgery (MBS), contrasting AI recommendations with established clinical guidelines and expert consensus. MATERIALS AND METHODS Conducting a single-center retrospective analysis, the study involved 161 patients who underwent MBS between January 2022 and December 2023. ChatGPT-4 was used to analyze patient data, including demographics, pathological history, and BMI, to recommend the most suitable surgical technique. These AI recommendations were then compared with the hospital's algorithm-based decisions. RESULTS ChatGPT recommended Roux-en-Y gastric bypass in over half of the cases. However, a significant difference was observed between AI suggestions and the surgical techniques actually applied, with only a 34.16% match rate. Further analysis did not reveal any significant correlation between ChatGPT's recommendations and the established surgical algorithm. CONCLUSION Despite ChatGPT's ability to process and analyze large datasets, its recommendations for MBS techniques do not align closely with those determined by expert surgical teams using an algorithm with a high success rate. Consequently, the study concludes that ChatGPT-4 should not replace expert consultation in selecting MBS techniques.
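The reported 34.16% match rate is a plain agreement fraction (55 of 161 cases). A minimal sketch with hypothetical technique labels (the abbreviations RYGB and SG are illustrative, not taken from the paper):

```python
def match_rate(recommended, performed):
    """Fraction of cases where the AI-recommended technique
    matches the technique actually performed."""
    assert len(recommended) == len(performed)
    return sum(r == p for r, p in zip(recommended, performed)) / len(recommended)

# Hypothetical labels for six cases
ai = ["RYGB", "RYGB", "SG", "RYGB", "SG", "RYGB"]
surgeons = ["SG", "RYGB", "SG", "SG", "RYGB", "RYGB"]
print(match_rate(ai, surgeons))  # 3 of 6 agree -> 0.5
```

Against 161 cases, 55 agreements give 55/161 = 34.16%, consistent with the figure in the abstract.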
Affiliation(s)
- Ruth Lopez-Gonzalez
- General and Digestive Surgery, Moises Broggi University Hospital, C Oriol Martorell 12, 08970, Barcelona, Spain.
- Sergi Sanchez-Cordero
- General and Digestive Surgery, Moises Broggi University Hospital, C Oriol Martorell 12, 08970, Barcelona, Spain
- Jordi Pujol-Gebellí
- General and Digestive Surgery, Moises Broggi University Hospital, C Oriol Martorell 12, 08970, Barcelona, Spain
- Jordi Castellvi
- General and Digestive Surgery, Moises Broggi University Hospital, C Oriol Martorell 12, 08970, Barcelona, Spain
20
Vaira LA, Lechien JR, Abbate V, Allevi F, Audino G, Beltramini GA, Bergonzani M, Boscolo-Rizzo P, Califano G, Cammaroto G, Chiesa-Estomba CM, Committeri U, Crimi S, Curran NR, di Bello F, di Stadio A, Frosolini A, Gabriele G, Gengler IM, Lonardi F, Maglitto F, Mayo-Yáñez M, Petrocelli M, Pucci R, Saibene AM, Saponaro G, Tel A, Trabalzini F, Trecca EMC, Vellone V, Salzano G, De Riu G. Validation of the Quality Analysis of Medical Artificial Intelligence (QAMAI) tool: a new tool to assess the quality of health information provided by AI platforms. Eur Arch Otorhinolaryngol 2024. [PMID: 38703195 DOI: 10.1007/s00405-024-08710-0]
Abstract
BACKGROUND The widespread diffusion of Artificial Intelligence (AI) platforms is revolutionizing how health-related information is disseminated, thereby highlighting the need for tools to evaluate the quality of such information. This study aimed to propose and validate the Quality Assessment of Medical Artificial Intelligence (QAMAI), a tool specifically designed to assess the quality of health information provided by AI platforms. METHODS The QAMAI tool has been developed by a panel of experts following guidelines for the development of new questionnaires. A total of 30 responses from ChatGPT4, addressing patient queries, theoretical questions, and clinical head and neck surgery scenarios were assessed by 27 reviewers from 25 academic centers worldwide. Construct validity, internal consistency, inter-rater and test-retest reliability were assessed to validate the tool. RESULTS The validation was conducted on the basis of 792 assessments for the 30 responses given by ChatGPT4. The results of the exploratory factor analysis revealed a unidimensional structure of the QAMAI with a single factor comprising all the items that explained 51.1% of the variance with factor loadings ranging from 0.449 to 0.856. Overall internal consistency was high (Cronbach's alpha = 0.837). The Intraclass Correlation Coefficient was 0.983 (95% CI 0.973-0.991; F (29,542) = 68.3; p < 0.001), indicating excellent reliability. Test-retest reliability analysis revealed a moderate-to-strong correlation with a Pearson's coefficient of 0.876 (95% CI 0.859-0.891; p < 0.001). CONCLUSIONS The QAMAI tool demonstrated significant reliability and validity in assessing the quality of health information provided by AI platforms. Such a tool might become particularly useful for physicians as patients increasingly seek medical information on AI platforms.
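Cronbach's alpha, used here for internal consistency, has a simple closed form: alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). A minimal sketch on hypothetical ratings (the result is the same whether population or sample variances are used, since the normalization cancels in the ratio):

```python
def cronbach_alpha(items):
    """items: list of k lists, one per questionnaire item,
    each holding one score per respondent."""
    k = len(items)
    n = len(items[0])

    def var(xs):  # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    # Each respondent's total score across all items
    totals = [sum(item[i] for item in items) for i in range(n)]
    return k / (k - 1) * (1 - sum(var(it) for it in items) / var(totals))

# Hypothetical: 3 items rated by 4 respondents
ratings = [[3, 4, 5, 2],
           [3, 5, 5, 1],
           [2, 4, 4, 2]]
print(round(cronbach_alpha(ratings), 3))  # -> 0.944
```

Values above roughly 0.8, like the 0.837 reported here, are conventionally read as good internal consistency.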
Affiliation(s)
- Luigi Angelo Vaira
- Maxillofacial Surgery Operative Unit, Department of Medicine, Surgery and Pharmacy, University of Sassari, Viale San Pietro 43/B, 07100, Sassari, Italy.
- PhD School of Biomedical Science, Biomedical Sciences Department, University of Sassari, Sassari, Italy.
- Jerome R Lechien
- Department of Laryngology and Bronchoesophagology, EpiCURA Hospital, Mons School of Medicine, UMONS. Research Institute for Health Sciences and Technology, University of Mons (UMons), Mons, Belgium
- Department of Otolaryngology-Head Neck Surgery, Elsan Polyclinic of Poitiers, Poitiers, France
- Vincenzo Abbate
- Head and Neck Section, Department of Neurosciences, Reproductive and Odontostomatological Science, Federico II University of Naples, Naples, Italy
- Fabiana Allevi
- Maxillofacial Surgery Department, ASST Santi Paolo e Carlo, University of Milan, Milan, Italy
- Giovanni Audino
- Head and Neck Section, Department of Neurosciences, Reproductive and Odontostomatological Science, Federico II University of Naples, Naples, Italy
- Giada Anna Beltramini
- Department of Biomedical, Surgical and Dental Sciences, University of Milan, Milan, Italy
- Maxillofacial and Dental Unit, Fondazione IRCCS Cà Granda Ospedale Maggiore Policlinico, Milan, Italy
- Michela Bergonzani
- Maxillo-Facial Surgery Division, Head and Neck Department, University Hospital of Parma, Parma, Italy
- Paolo Boscolo-Rizzo
- Department of Medical, Surgical and Health Sciences, Section of Otolaryngology, University of Trieste, Trieste, Italy
- Gianluigi Califano
- Department of Neurosciences, Reproductive and Odontostomatological Science, Federico II University of Naples, Naples, Italy
- Giovanni Cammaroto
- ENT Department, Morgagni Pierantoni Hospital, AUSL Romagna, Forlì, Italy
- Carlos M Chiesa-Estomba
- Department of Otorhinolaryngology-Head and Neck Surgery, Hospital Universitario Donostia, San Sebastian, Spain
- Umberto Committeri
- Head and Neck Section, Department of Neurosciences, Reproductive and Odontostomatological Science, Federico II University of Naples, Naples, Italy
- Salvatore Crimi
- Operative Unit of Maxillofacial Surgery, Policlinico San Marco, University of Catania, Catania, Italy
- Nicholas R Curran
- Department of Otolaryngology-Head and Neck Surgery, University of Cincinnati Medical Center, Cincinnati, OH, USA
- Francesco di Bello
- Department of Neurosciences, Reproductive and Odontostomatological Science, Federico II University of Naples, Naples, Italy
- Arianna di Stadio
- Otolaryngology Unit, GF Ingrassia Department, University of Catania, Catania, Italy
- Andrea Frosolini
- Department of Maxillofacial Surgery, University of Siena, Siena, Italy
- Guido Gabriele
- Department of Maxillofacial Surgery, University of Siena, Siena, Italy
- Isabelle M Gengler
- Department of Otolaryngology-Head and Neck Surgery, University of Cincinnati Medical Center, Cincinnati, OH, USA
- Fabio Lonardi
- Department of Maxillofacial Surgery, University of Verona, Verona, Italy
- Fabio Maglitto
- Maxillo-Facial Surgery Unit, University of Bari "Aldo Moro", Bari, Italy
- Miguel Mayo-Yáñez
- Otorhinolaryngology, Head and Neck Surgery Department, Complexo Hospitalario Universitario A Coruña (CHUAC), A Coruña, Galicia, Spain
- Marzia Petrocelli
- Maxillofacial Surgery Operative Unit, Bellaria and Maggiore Hospital, Bologna, Italy
- Resi Pucci
- Maxillofacial Surgery Unit, San Camillo-Forlanini Hospital, Rome, Italy
- Alberto Maria Saibene
- Otolaryngology Unit, Santi Paolo e Carlo Hospital, Department of Health Sciences, University of Milan, Milan, Italy
- Gianmarco Saponaro
- Maxillo-Facial Surgery Unit, IRCSS "A. Gemelli" Foundation-Catholic University of the Sacred Heart, Rome, Italy
- Alessandro Tel
- Clinic of Maxillofacial Surgery, Department of Head and Neck Surgery and Neuroscience, University Hospital of Udine, Udine, Italy
| | - Franco Trabalzini
- Department of Otorhinolaryngology, Head and Neck Surgery, Meyer Children's Hospital, Florence, Italy
| | - Eleonora M C Trecca
- Department of Otorhinolaryngology and Maxillofacial Surgery, IRCCS Hospital Casa Sollievo Della Sofferenza, San Giovanni Rotondo, Foggia, Italy
- Department of Otorhinolaryngology, University Hospital of Foggia, Foggia, Italy
| | | | - Giovanni Salzano
- Head and Neck Section, Department of Neurosciences, Reproductive and Odontostomatological Science, Federico II University of Naples, Naples, Italy
| | - Giacomo De Riu
- Maxillofacial Surgery Operative Unit, Department of Medicine, Surgery and Pharmacy, University of Sassari, Viale San Pietro 43/B, 07100, Sassari, Italy
| |
Collapse
21
Lechien JR, Carroll TL, Huston MN, Naunheim MR. ChatGPT-4 accuracy for patient education in laryngopharyngeal reflux. Eur Arch Otorhinolaryngol 2024; 281:2547-2552. [PMID: 38492008 DOI: 10.1007/s00405-024-08560-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/12/2024] [Accepted: 02/13/2024] [Indexed: 03/18/2024]
Abstract
INTRODUCTION Chatbot Generative Pre-trained Transformer (ChatGPT) is an artificial intelligence-powered language model chatbot able to help otolaryngologists in practice and research. The ability of ChatGPT to generate patient-centered information related to laryngopharyngeal reflux disease (LPRD) was evaluated. METHODS Twenty-five questions dedicated to the definition, clinical presentation, diagnosis, and treatment of LPRD were developed from the Dubai definition and management of LPRD consensus and recent reviews. Questions about the four aforementioned categories were entered into ChatGPT-4. Four board-certified laryngologists evaluated the accuracy of ChatGPT-4 with a 5-point Likert scale. Interrater reliability was evaluated. RESULTS The mean scores (SD) of ChatGPT-4 answers for definition, clinical presentation, additional examination, and treatments were 4.13 (0.52), 4.50 (0.72), 3.75 (0.61), and 4.18 (0.47), respectively. Experts reported high interrater reliability for sub-scores (ICC = 0.973). The lowest performances of ChatGPT-4 were on answers about the most prevalent LPR signs, the most reliable objective diagnostic tool (hypopharyngeal-esophageal multichannel intraluminal impedance-pH monitoring, HEMII-pH), and the criteria for the diagnosis of LPR using HEMII-pH. CONCLUSION ChatGPT-4 may provide adequate information on the definition of LPR, its differences from GERD (gastroesophageal reflux disease), and its clinical presentation. Information provided on extra-laryngeal manifestations and HEMII-pH may need further optimization. Given recent trends of increasing patient use of internet sources for self-education, the findings of the present study may help draw attention to ChatGPT-4's accuracy on the topic of LPR.
Affiliation(s)
- Jerome R Lechien
- Research Committee, Young Otolaryngologists of the International Federation of Otorhinolaryngological Societies (IFOS), Paris, France.
- Division of Laryngology and Broncho-Esophagology, Department of Otolaryngology-Head Neck Surgery, EpiCURA Hospital, UMONS Research Institute for Health Sciences and Technology, University of Mons (UMons), Mons, Belgium.
- Department of Otorhinolaryngology and Head and Neck Surgery, Foch Hospital, School of Medicine, Phonetics and Phonology Laboratory (UMR 7018 CNRS, Université Sorbonne Nouvelle/Paris 3), Paris, France.
- Polyclinique Elsan de Poitiers, Poitiers, France.
- Thomas L Carroll
- Division of Otolaryngology-Head and Neck Surgery, Brigham and Women's Hospital, Department of Otolaryngology-Head and Neck Surgery, Harvard Medical School, Boston, MA, USA
- Molly N Huston
- Department of Otolaryngology, Washington University School of Medicine in St. Louis, St. Louis, MO, USA
- Matthew R Naunheim
- Research Committee, Young Otolaryngologists of the International Federation of Otorhinolaryngological Societies (IFOS), Paris, France
- Department of Otolaryngology-Head and Neck Surgery, Harvard Medical School, Boston, MA, USA
- Division of Laryngology, Massachusetts Eye and Ear, Boston, MA, USA
22
Starke SJ, Martinez Rivera MB, Krishnan S, Shah M. Randomized Controlled Trial of Clinical Guidelines Versus Interactive Decision-Support for Improving Medical Trainees' Confidence with Latent Tuberculosis Care. J Gen Intern Med 2024; 39:951-959. [PMID: 38062221 PMCID: PMC11074081 DOI: 10.1007/s11606-023-08551-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2023] [Accepted: 11/17/2023] [Indexed: 05/08/2024]
Abstract
BACKGROUND In order to eliminate tuberculosis (TB) in the USA, primary care providers must take on an expanded role in the diagnosis and management of latent tuberculosis infection (LTBI). Clinical practice guidelines and recommendations exist for LTBI management, but there is a need for innovative tools to improve medical students' and residents' knowledge of evidence-based practices for LTBI testing and treatment. OBJECTIVE To assess the impact of LTBI-ASSIST, a free online decision support aid, as a novel educational tool and mechanism of delivering clinical practice guidelines for medical trainees. DESIGN A single-site, randomized controlled trial of trainees delivered by electronic survey. PARTICIPANTS Medical students and Internal Medicine residents at the Johns Hopkins University School of Medicine. INTERVENTIONS Participants were randomized in a 1:1 ratio to receive the US clinical practice guidelines and recommendations for latent TB management (control arm) or the guidelines plus an introduction to LTBI-ASSIST (LTBI-ASSIST arm) as they completed a case-based knowledge assessment and reported confidence with domains of LTBI care. MAIN MEASURES (1) Proportion of questions answered correctly on a case-based knowledge assessment; (2) change in reported confidence with domains of LTBI care. KEY RESULTS One hundred thirty participants completed the knowledge assessment. Those randomized to receive the LTBI-ASSIST tool performed better on the case-based knowledge assessment, with a mean score of 75.9% (95% CI: 70.6-81.1), compared to 57.4% (52.8-62.0) in the group that received the guidelines only (p < 0.001). Similarly, the LTBI-ASSIST group reported a higher change in confidence (measured as post-assessment confidence minus pre-assessment confidence) compared to the control group in six of the seven domains of LTBI care. CONCLUSIONS LTBI-ASSIST can be an effective supplement to existing guidelines in educating medical trainees and helping providers find evidence-based, guideline-supported answers to questions encountered in clinical practice. TRIAL REGISTRATION NIH Clinical Trial Registry No. NCT05772065.
Affiliation(s)
- Samuel J Starke
- Department of Medicine, Johns Hopkins University, Baltimore, MD, USA.
- Marina B Martinez Rivera
- Division of Infectious Diseases, Department of Medicine, Johns Hopkins University, Baltimore, MD, USA
- Sonya Krishnan
- Division of Infectious Diseases, Department of Medicine, Johns Hopkins University, Baltimore, MD, USA
- Maunank Shah
- Division of Infectious Diseases, Department of Medicine, Johns Hopkins University, Baltimore, MD, USA
23
Wu J, Ma Y, Wang J, Xiao M. The Application of ChatGPT in Medicine: A Scoping Review and Bibliometric Analysis. J Multidiscip Healthc 2024; 17:1681-1692. [PMID: 38650670 PMCID: PMC11034560 DOI: 10.2147/jmdh.s463128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2024] [Accepted: 03/25/2024] [Indexed: 04/25/2024]
Abstract
Purpose ChatGPT has a wide range of applications in the medical field. This review therefore aims to define the key issues and provide a comprehensive view of the literature on the application of ChatGPT in medicine. Methods This scoping review follows Arksey and O'Malley's five-stage framework. A comprehensive literature search of publications (30 November 2022 to 16 August 2023) was conducted. Six databases were searched and relevant references were systematically catalogued. Attention was focused on the general characteristics of the articles, their fields of application, and the advantages and disadvantages of using ChatGPT. Descriptive statistics and narrative synthesis methods were used for data analysis. Results Of the 3426 studies, 247 met the criteria for inclusion in this review. The majority of articles (31.17%) were from the United States. Editorials (43.32%) ranked first, followed by experimental studies (11.74%). The potential applications of ChatGPT in medicine are varied, with the largest number of studies (45.75%) exploring clinical practice, including assisting with clinical decision support and providing disease information and medical advice. This was followed by medical education (27.13%) and scientific research (16.19%). In the discipline statistics, radiology, surgery, and dentistry were particularly noteworthy at the top of the list. However, ChatGPT in medicine also faces issues of data privacy, inaccuracy, and plagiarism. Conclusion The application of ChatGPT in medicine spans different disciplines and general application scenarios. ChatGPT has a paradoxical nature: it offers significant advantages but at the same time raises great concerns about its application in healthcare settings. It is therefore imperative to develop theoretical frameworks that not only address its widespread use in healthcare but also facilitate a comprehensive assessment. In addition, these frameworks should contribute to the development of strict and effective guidelines and regulatory measures.
Affiliation(s)
- Jie Wu
- Department of Nursing, the First Affiliated Hospital of Chongqing Medical University, Chongqing, People’s Republic of China
- Yingzhuo Ma
- Department of Nursing, the First Affiliated Hospital of Chongqing Medical University, Chongqing, People’s Republic of China
- Jun Wang
- Department of Nursing, the First Affiliated Hospital of Chongqing Medical University, Chongqing, People’s Republic of China
- Mingzhao Xiao
- Department of Urology, the First Affiliated Hospital of Chongqing Medical University, Chongqing, People’s Republic of China
24
Moise A, Centomo-Bozzo A, Orishchak O, Alnoury MK, Daniel SJ. Can ChatGPT Replace an Otolaryngologist in Guiding Parents on Tonsillectomy? Ear Nose Throat J 2024:1455613241230841. [PMID: 38563440 DOI: 10.1177/01455613241230841] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
Background: ChatGPT is an artificial intelligence tool that utilizes machine learning to analyze and generate human-like text. The user-friendly accessibility of this tool enables patients to conveniently access medical information without the challenge of intricate terminology. The objective of this study was to assess the accuracy of ChatGPT in providing insights into the indications and management of complications after tonsillectomy, a common pediatric otolaryngology procedure. Methods: The responses generated by ChatGPT were compared to the "Clinical practice guidelines: tonsillectomy in children-executive summary" developed by the American Academy of Otolaryngology-Head and Neck Surgery Foundation (AAO-HNSF). An assessment was carried out by presenting predetermined questions regarding indications and complications post tonsillectomy to ChatGPT, followed by a comparison of its responses with the established guideline by 2 otolaryngology experts. The responses of both parties were reviewed by the senior author. Results: A total of 16 responses generated by ChatGPT were assessed. After a comprehensive review, it was concluded that 15 of 16 (93.8%) responses demonstrated a high degree of reliability and accuracy, closely adhering to the standard established by the AAO-HNSF guideline. Conclusion: The results validate the potential of using ChatGPT to enhance healthcare delivery by making guidelines more accessible to patients, while also emphasizing the importance of ensuring the provision of accurate and reliable medical advice to patients.
Affiliation(s)
- Alexander Moise
- Faculty of Medicine and Health Sciences, McGill University, Montreal, QC, Canada
- Adam Centomo-Bozzo
- Faculty of Dental Medicine and Oral Health Sciences, McGill University, Montreal, QC, Canada
- Ostap Orishchak
- Department of Pediatric Otolaryngology, Montreal Children's Hospital, Montreal, QC, Canada
- Mohammed K Alnoury
- Department of Otolaryngology-Head and Neck Surgery, King Abdulaziz University, Jeddah, Saudi Arabia
- Sam J Daniel
- Department of Pediatric Otolaryngology, Montreal Children's Hospital, Montreal, QC, Canada
25
Saibene AM, Allevi F, Calvo-Henriquez C, Maniaci A, Mayo-Yáñez M, Paderno A, Vaira LA, Felisati G, Craig JR. Reliability of large language models in managing odontogenic sinusitis clinical scenarios: a preliminary multidisciplinary evaluation. Eur Arch Otorhinolaryngol 2024; 281:1835-1841. [PMID: 38189967 PMCID: PMC10943141 DOI: 10.1007/s00405-023-08372-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2023] [Accepted: 11/22/2023] [Indexed: 01/09/2024]
Abstract
PURPOSE This study aimed to evaluate the utility of large language model (LLM) artificial intelligence tools, Chat Generative Pre-Trained Transformer (ChatGPT) versions 3.5 and 4, in managing complex otolaryngological clinical scenarios, specifically the multidisciplinary management of odontogenic sinusitis (ODS). METHODS A prospective, structured multidisciplinary specialist evaluation was conducted using five ad hoc designed ODS-related clinical scenarios. LLM responses to these scenarios were critically reviewed by a multidisciplinary panel of eight specialist evaluators (2 ODS experts, 2 rhinologists, 2 general otolaryngologists, and 2 maxillofacial surgeons). Based on the level of disagreement from panel members, a Total Disagreement Score (TDS) was calculated for each LLM response, and TDS comparisons were made between ChatGPT3.5 and ChatGPT4, as well as between different evaluators. RESULTS While some degree of disagreement was demonstrated in 73/80 evaluator reviews of LLMs' responses, TDSs were significantly lower for ChatGPT4 compared to ChatGPT3.5. The highest TDSs were found in the case of complicated ODS with orbital abscess, presumably due to increased case complexity, with dental, rhinologic, and orbital factors affecting diagnostic and therapeutic options. There were no statistically significant differences in TDSs between evaluators' specialties, though ODS experts and maxillofacial surgeons tended to assign higher TDSs. CONCLUSIONS LLMs like ChatGPT, especially newer versions, showed potential for complementing evidence-based clinical decision-making, but substantial disagreement was still demonstrated between LLMs and clinical specialists across most case examples, suggesting they are not yet optimal in aiding clinical management decisions. Future studies will be important to analyze LLMs' performance as they evolve over time.
Affiliation(s)
- Alberto Maria Saibene
- Otolaryngology Unit, Santi Paolo E Carlo Hospital, Department of Health Sciences, Università Degli Studi Di Milano, Milan, Italy.
- Fabiana Allevi
- Maxillofacial Surgery Unit, Santi Paolo E Carlo Hospital, Department of Health Sciences, Università Degli Studi Di Milano, Milan, Italy
- Christian Calvo-Henriquez
- Service of Otolaryngology, Rhinology Unit, Hospital Complex at the University of Santiago de Compostela, Santiago de Compostela, A Coruña, Spain
- Antonino Maniaci
- Department of Medical, Surgical Sciences and Advanced Technologies G.F. Ingrassia, University of Catania, Catania, Italy
- Miguel Mayo-Yáñez
- Otorhinolaryngology, Head and Neck Surgery Department, Complexo Hospitalario Universitario A Coruña (CHUAC), A Coruña, Galicia, Spain
- Alberto Paderno
- Department of Otorhinolaryngology, Head and Neck Surgery, University of Brescia, Brescia, Italy
- Luigi Angelo Vaira
- Maxillofacial Surgery Operative Unit, Department of Medicine, Surgery and Pharmacy, University of Sassari, Sassari, Italy
- Biomedical Science PhD School, Biomedical Science Department, University of Sassari, Sassari, Italy
- Giovanni Felisati
- Otolaryngology Unit, Santi Paolo E Carlo Hospital, Department of Health Sciences, Università Degli Studi Di Milano, Milan, Italy
- John R Craig
- Department of Otolaryngology-Head and Neck Surgery, Henry Ford Health, Detroit, MI, USA
26
Mira FA, Favier V, Dos Santos Sobreira Nunes H, de Castro JV, Carsuzaa F, Meccariello G, Vicini C, De Vito A, Lechien JR, Chiesa-Estomba C, Maniaci A, Iannella G, Rojas EP, Cornejo JB, Cammaroto G. Chat GPT for the management of obstructive sleep apnea: do we have a polar star? Eur Arch Otorhinolaryngol 2024; 281:2087-2093. [PMID: 37980605 DOI: 10.1007/s00405-023-08270-9] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 09/13/2023] [Accepted: 09/29/2023] [Indexed: 11/21/2023]
Abstract
PURPOSE This study explores the potential of the Chat-Generative Pre-Trained Transformer (Chat-GPT), a Large Language Model (LLM), in assisting healthcare professionals in the diagnosis of obstructive sleep apnea (OSA). It aims to assess the agreement between Chat-GPT's responses and those of expert otolaryngologists, shedding light on the role of AI-generated content in medical decision-making. METHODS A prospective, cross-sectional study was conducted, involving 350 otolaryngologists from 25 countries who responded to a specialized OSA survey. Chat-GPT was tasked with providing answers to the same survey questions. Responses were assessed by both super-experts and statistically analyzed for agreement. RESULTS The study revealed that Chat-GPT and expert responses shared a common answer in over 75% of cases for individual questions. However, the overall consensus was achieved in only four questions. Super-expert assessments showed a moderate agreement level, with Chat-GPT scoring slightly lower than experts. Statistically, Chat-GPT's responses differed significantly from experts' opinions (p = 0.0009). Sub-analysis revealed areas of improvement for Chat-GPT, particularly in questions where super-experts rated its responses lower than expert consensus. CONCLUSIONS Chat-GPT demonstrates potential as a valuable resource for OSA diagnosis, especially where access to specialists is limited. The study emphasizes the importance of AI-human collaboration, with Chat-GPT serving as a complementary tool rather than a replacement for medical professionals. This research contributes to the discourse in otolaryngology and encourages further exploration of AI-driven healthcare applications. While Chat-GPT exhibits a commendable level of consensus with expert responses, ongoing refinements in AI-based healthcare tools hold significant promise for the future of medicine, addressing the underdiagnosis and undertreatment of OSA and improving patient outcomes.
Affiliation(s)
- Felipe Ahumada Mira
- ENT Department, Hospital of Linares, Linares, Chile
- Young Otolaryngologists-International Federations of Oto-Rhinolaryngological Societies (YO-IFOS), Paris, France
- Valentin Favier
- ENT Department, University Hospital of Montpellier, Montpellier, France
- Young Otolaryngologists-International Federations of Oto-Rhinolaryngological Societies (YO-IFOS), Paris, France
- Heloisa Dos Santos Sobreira Nunes
- ENT and Sleep Medicine Department, Nucleus of Otolaryngology, Head and Neck Surgery and Sleep Medicine of São Paulo, São Paulo, Brazil
- Young Otolaryngologists-International Federations of Oto-Rhinolaryngological Societies (YO-IFOS), Paris, France
- Joana Vaz de Castro
- ENT Department, Armed Forces Hospital, Lisbon, Portugal
- Young Otolaryngologists-International Federations of Oto-Rhinolaryngological Societies (YO-IFOS), Paris, France
- Florent Carsuzaa
- ENT Department, University Hospital of Poitiers, Poitiers, France
- Young Otolaryngologists-International Federations of Oto-Rhinolaryngological Societies (YO-IFOS), Paris, France
- Giuseppe Meccariello
- Head and Neck Department, ENT & Oral Surgery Unit, G.B. Morgagni, L. Pierantoni Hospital, Via Forlanini, 47121, Forlì, Italy
- Claudio Vicini
- Head and Neck Department, ENT & Oral Surgery Unit, G.B. Morgagni, L. Pierantoni Hospital, Via Forlanini, 47121, Forlì, Italy
- Andrea De Vito
- Head and Neck Department, ENT & Oral Surgery Unit, G.B. Morgagni, L. Pierantoni Hospital, Via Forlanini, 47121, Forlì, Italy
- Jerome R Lechien
- Division of Laryngology and Broncho-Esophagology, Department of Otolaryngology and Head and Neck Surgery, EpiCURA Hospital, UMONS Research Institute for Health Sciences and Technology, University of Mons, Mons, Belgium
- Young Otolaryngologists-International Federations of Oto-Rhinolaryngological Societies (YO-IFOS), Paris, France
- Carlos Chiesa-Estomba
- Department of Otorhinolaryngology, Biodonostia Research Institute, Donostia University Hospital, Osakidetza, 20014, San Sebastian, Spain
- Young Otolaryngologists-International Federations of Oto-Rhinolaryngological Societies (YO-IFOS), Paris, France
- Antonino Maniaci
- Department of Medical and Surgical Sciences and Advanced Technologies "GF Ingrassia", ENT Section, University of Catania, Piazza Università 2, 95100, Catania, Italy
- Young Otolaryngologists-International Federations of Oto-Rhinolaryngological Societies (YO-IFOS), Paris, France
- Giannicola Iannella
- Department of 'Organi di Senso', University "Sapienza", Viale Dell'Università 33, 00185, Rome, Italy
- Young Otolaryngologists-International Federations of Oto-Rhinolaryngological Societies (YO-IFOS), Paris, France
- Giovanni Cammaroto
- Head and Neck Department, ENT & Oral Surgery Unit, G.B. Morgagni, L. Pierantoni Hospital, Via Forlanini, 47121, Forlì, Italy.
- Young Otolaryngologists-International Federations of Oto-Rhinolaryngological Societies (YO-IFOS), Paris, France.
27
Lechien JR, Maniaci A, Gengler I, Hans S, Chiesa-Estomba CM, Vaira LA. Validity and reliability of an instrument evaluating the performance of intelligent chatbot: the Artificial Intelligence Performance Instrument (AIPI). Eur Arch Otorhinolaryngol 2024; 281:2063-2079. [PMID: 37698703 DOI: 10.1007/s00405-023-08219-y] [Citation(s) in RCA: 22] [Impact Index Per Article: 22.0] [Received: 07/26/2023] [Accepted: 08/30/2023] [Indexed: 09/13/2023]
Abstract
OBJECTIVES To evaluate the reliability and validity of the Artificial Intelligence Performance Instrument (AIPI). METHODS Medical records of patients consulting in otolaryngology were evaluated by physicians and ChatGPT for differential diagnosis, management, and treatment. The ChatGPT performance was rated twice using AIPI within a 7-day period to assess test-retest reliability. Internal consistency was evaluated using Cronbach's α. Internal validity was evaluated by comparing the AIPI scores of the clinical cases rated by ChatGPT and 2 blinded practitioners. Convergent validity was measured by comparing the AIPI score with a modified version of the Ottawa Clinical Assessment Tool (OCAT). Interrater reliability was assessed using Kendall's tau. RESULTS Forty-five patients completed the evaluations (28 females). The AIPI Cronbach's alpha analysis suggested an adequate internal consistency (α = 0.754). The test-retest reliability was moderate-to-strong for items and the total score of AIPI (rs = 0.486, p = 0.001). The mean AIPI score of the senior otolaryngologist was significantly higher compared to the score of ChatGPT, supporting adequate internal validity (p = 0.001). Convergent validity reported a moderate and significant correlation between AIPI and modified OCAT (rs = 0.319; p = 0.044). The interrater reliability reported significant positive concordance between both otolaryngologists for the patient feature, diagnostic, additional examination, and treatment subscores as well as for the AIPI total score. CONCLUSIONS AIPI is a valid and reliable instrument in assessing the performance of ChatGPT in ear, nose and throat conditions. Future studies are needed to investigate the usefulness of AIPI in medicine and surgery, and to evaluate the psychometric properties in these fields.
Affiliation(s)
- Jerome R Lechien
- Research Committee of Young Otolaryngologists of the International Federation of Otorhinolaryngological Societies (IFOS), Paris, France.
- Young Confederation of the European Oto-Rhino-Laryngological Head and Neck Surgery Societies (Y-CEORLHNS), Dublin, Ireland.
- Division of Laryngology and Broncho-Esophagology, Department of Otolaryngology-Head Neck Surgery, EpiCURA Hospital, UMONS Research Institute for Health Sciences and Technology, University of Mons (UMons), Mons, Belgium.
- Phonetics and Phonology Laboratory (UMR 7018 CNRS, Université Sorbonne Nouvelle/Paris 3), Department of Otorhinolaryngology and Head and Neck Surgery, Foch Hospital, School of Medicine, UFR Simone Veil, Université Versailles Saint-Quentin-en-Yvelines (Paris Saclay University), Paris, France.
- Department of Otorhinolaryngology and Head and Neck Surgery, CHU Saint-Pierre, Brussels, Belgium.
- Faculty of Medicine, Department of Human Anatomy and Experimental Oncology, UMONS Research Institute for Health Sciences and Technology, Avenue du Champ de Mars, 6, B7000, Mons, Belgium.
- Antonino Maniaci
- Research Committee of Young Otolaryngologists of the International Federation of Otorhinolaryngological Societies (IFOS), Paris, France
- Department of Medical, Surgical Sciences and Advanced Technologies G.F. Ingrassia, ENT Section, University of Catania, 95123, Catania, Italy
- Isabelle Gengler
- Research Committee of Young Otolaryngologists of the International Federation of Otorhinolaryngological Societies (IFOS), Paris, France
- Department of Otolaryngology-Head and Neck Surgery, University of Cincinnati Medical Center, Cincinnati, OH, USA
- Stephane Hans
- Research Committee of Young Otolaryngologists of the International Federation of Otorhinolaryngological Societies (IFOS), Paris, France
- Phonetics and Phonology Laboratory (UMR 7018 CNRS, Université Sorbonne Nouvelle/Paris 3), Department of Otorhinolaryngology and Head and Neck Surgery, Foch Hospital, School of Medicine, UFR Simone Veil, Université Versailles Saint-Quentin-en-Yvelines (Paris Saclay University), Paris, France
- Carlos M Chiesa-Estomba
- Research Committee of Young Otolaryngologists of the International Federation of Otorhinolaryngological Societies (IFOS), Paris, France
- Young Confederation of the European Oto-Rhino-Laryngological Head and Neck Surgery Societies (Y-CEORLHNS), Dublin, Ireland
- Department of Otorhinolaryngology - Head and Neck Surgery, Donostia University Hospital - Biodonostia Research Institute, St. Sebastian, Spain
- Luigi A Vaira
- Research Committee of Young Otolaryngologists of the International Federation of Otorhinolaryngological Societies (IFOS), Paris, France
- Maxillofacial Surgery Operative Unit, Department of Medicine, Surgery and Pharmacy, University of Sassari, Sassari, Italy
- Biomedical Science Department, Biomedical Science PhD School, University of Sassari, Sassari, Italy
28
Karimov Z, Allahverdiyev I, Agayarov OY, Demir D, Almuradova E. ChatGPT vs UpToDate: comparative study of usefulness and reliability of Chatbot in common clinical presentations of otorhinolaryngology-head and neck surgery. Eur Arch Otorhinolaryngol 2024; 281:2145-2151. [PMID: 38217726 PMCID: PMC10942922 DOI: 10.1007/s00405-023-08423-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2023] [Accepted: 12/18/2023] [Indexed: 01/15/2024]
Abstract
PURPOSE The use of chatbots, a form of artificial intelligence, in medicine has increased in recent years. UpToDate® is a well-known search tool built on evidence-based knowledge and used daily by doctors worldwide. In this study, we aimed to investigate the usefulness and reliability of ChatGPT compared to UpToDate in otorhinolaryngology and head and neck surgery (ORL-HNS). MATERIALS AND METHODS ChatGPT-3.5 and UpToDate were interrogated for the management of 25 common clinical case scenarios (13 males/12 females) recruited from the literature, reflecting daily observation at the Department of Otorhinolaryngology of Ege University Faculty of Medicine. Scientific references for the management were requested for each clinical case. The accuracy of the references in the ChatGPT answers was assessed on a 0-2 scale, and the usefulness of the ChatGPT and UpToDate answers was rated by reviewers on a 1-3 scale. UpToDate and ChatGPT-3.5 responses were compared. RESULTS ChatGPT did not give references for some questions, in contrast to UpToDate. Information in ChatGPT was limited to 2021. UpToDate supported its answers with subheadings, tables, figures, and algorithms. The mean accuracy score of references in ChatGPT answers was 0.25 (weak/unrelated). The median (Q1-Q3) usefulness score was 1.00 (1.25-2.00) for ChatGPT and 2.63 (2.75-3.00) for UpToDate; the difference was statistically significant (p < 0.001). UpToDate was found to be more useful and reliable than ChatGPT. CONCLUSIONS ChatGPT has the potential to support physicians in finding information, but our results suggest that ChatGPT needs to be improved to increase the usefulness and reliability of its medical evidence-based knowledge.
Affiliation(s)
- Ziya Karimov: Medicine Program, Ege University Faculty of Medicine, 35100, Izmir, Türkiye
- Irshad Allahverdiyev: Medicine Program, Istanbul University, Istanbul Faculty of Medicine, Istanbul, Türkiye
- Ozlem Yagiz Agayarov: Department of Otolaryngology-Head and Neck Surgery, Izmir Tepecik Education and Research Hospital, Health Sciences University, Izmir, Türkiye
- Dogukan Demir: Department of Otolaryngology-Head and Neck Surgery, Izmir Tepecik Education and Research Hospital, Health Sciences University, Izmir, Türkiye
- Elvina Almuradova: Department of Medical Oncology, Ege University Faculty of Medicine, Izmir, Türkiye; Department of Oncology, Medicana International Hospital, Izmir, Türkiye
29
Teixeira-Marques F, Medeiros N, Nazaré F, Alves S, Lima N, Ribeiro L, Gama R, Oliveira P. Exploring the role of ChatGPT in clinical decision-making in otorhinolaryngology: a ChatGPT designed study. Eur Arch Otorhinolaryngol 2024; 281:2023-2030. PMID: 38345613. DOI: 10.1007/s00405-024-08498-z.
Abstract
PURPOSE Since the beginning of 2023, ChatGPT has emerged as a hot topic in healthcare research. Its potential as a valuable tool in clinical practice is compelling, particularly for improving clinical decision support by helping physicians make decisions based on the best available medical knowledge. We aimed to investigate ChatGPT's ability to identify, diagnose, and manage patients with otorhinolaryngology-related symptoms. METHODS A prospective, cross-sectional study was designed, based on an idea suggested by ChatGPT itself, to assess the level of agreement between ChatGPT and five otorhinolaryngologists (ENTs) in 20 reality-inspired clinical cases. The clinical cases were presented to the chatbot on two different occasions (ChatGPT-1 and ChatGPT-2) to assess its temporal stability. RESULTS The mean score of ChatGPT-1 was 4.4 (SD 1.2; min 1, max 5) and of ChatGPT-2 was 4.15 (SD 1.3; min 1, max 5), while the ENTs' mean score was 4.91 (SD 0.3; min 3, max 5). The Mann-Whitney U test revealed a statistically significant difference (p < 0.001) between each ChatGPT score and the ENTs' score. ChatGPT-1 and ChatGPT-2 gave different answers on five occasions. CONCLUSIONS Artificial intelligence will be an important instrument in clinical decision-making in the near future, and ChatGPT is the most promising chatbot so far. Although further development is needed before it can be used safely, it has the potential to aid otorhinolaryngology residents and specialists in making the best decision for the patient.
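The rank-based comparison reported in this abstract can be sketched in plain Python. This is an illustration only: the scores below are hypothetical, not the study's data, and a real analysis would also compute a p-value (e.g. via a normal approximation or a library routine).

```python
def mann_whitney_u(a, b):
    """Return the Mann-Whitney U statistic for sample `a` against sample `b`."""
    combined = sorted((value, idx) for idx, value in enumerate(a + b))
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        # Find the run of tied values and assign each its average rank.
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[combined[k][1]] = avg_rank
        i = j + 1
    rank_sum_a = sum(ranks[: len(a)])  # positions 0..len(a)-1 belong to `a`
    return rank_sum_a - len(a) * (len(a) + 1) / 2

# Hypothetical 1-5 ratings for the same cases from two sources.
chatgpt_scores = [5, 4, 5, 3, 1, 5, 4, 5]
expert_scores = [5, 5, 5, 5, 4, 5, 5, 5]
u = mann_whitney_u(chatgpt_scores, expert_scores)
```

In practice a statistics library (e.g. SciPy's `mannwhitneyu`) would be used; the hand-rolled version above just makes the tied-rank bookkeeping explicit.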
Affiliation(s)
- Francisco Teixeira-Marques: Department of Otorhinolaryngology, Centro Hospitalar de Vila Nova de Gaia/Espinho, Gaia (Porto), Portugal
- Nuno Medeiros: Department of Otorhinolaryngology, Centro Hospitalar de Vila Nova de Gaia/Espinho, Gaia (Porto), Portugal
- Francisco Nazaré: Department of Otorhinolaryngology, Centro Hospitalar de Vila Nova de Gaia/Espinho, Gaia (Porto), Portugal
- Sandra Alves: Department of Otorhinolaryngology, Centro Hospitalar de Vila Nova de Gaia/Espinho, Gaia (Porto), Portugal
- Nuno Lima: Department of Otorhinolaryngology, Centro Hospitalar de Vila Nova de Gaia/Espinho, Gaia (Porto), Portugal
- Leandro Ribeiro: Department of Otorhinolaryngology, Centro Hospitalar de Vila Nova de Gaia/Espinho, Gaia (Porto), Portugal
- Rita Gama: Department of Otorhinolaryngology, Centro Hospitalar de Vila Nova de Gaia/Espinho, Gaia (Porto), Portugal
- Pedro Oliveira: Department of Otorhinolaryngology, Centro Hospitalar de Vila Nova de Gaia/Espinho, Gaia (Porto), Portugal
30
Lechien JR, Chiesa-Estomba CM, Baudouin R, Hans S. Accuracy of ChatGPT in head and neck oncological board decisions: preliminary findings. Eur Arch Otorhinolaryngol 2024; 281:2105-2114. PMID: 37991498. DOI: 10.1007/s00405-023-08326-w.
Abstract
OBJECTIVES To evaluate the performance of ChatGPT-4 in oncological board decisions. METHODS Twenty medical records of patients with head and neck cancer were submitted to ChatGPT-4, which was asked to propose additional examinations, management, and therapeutic approaches. The ChatGPT-4 propositions were assessed with the Artificial Intelligence Performance Instrument. The stability of ChatGPT-4 was evaluated through answers regenerated at a 1-day interval. RESULTS ChatGPT-4 provided adequate explanations for cTNM staging in 19 cases (95%). ChatGPT-4 proposed a significantly higher number of additional examinations than practitioners (72 versus 103; p = 0.001). Its indications for endoscopy-biopsy, HPV testing, ultrasonography, and PET-CT were consistent with the oncological board decisions. The therapeutic propositions of ChatGPT-4 were accurate in 13 cases (65%). Most propositions for additional examinations and primary treatment remained consistent across the regenerated responses. CONCLUSIONS ChatGPT-4 may be an adjunctive theoretical tool for simple oncological board decisions.
Affiliation(s)
- Jerome R Lechien: Research Committee of Young-Otolaryngologists of the International Federation of Oto-Rhino-Laryngological Societies (YO-IFOS), Paris, France; Department of Otolaryngology-Head Neck Surgery, Foch Hospital, UFR Simone Veil, University Paris Saclay, Paris, France; Phonetics and Phonology Laboratory (UMR 7018 CNRS, Université Sorbonne Nouvelle/Paris 3), Paris, France; Department of Otorhinolaryngology and Head and Neck Surgery, CHU Saint-Pierre, Brussels, Belgium; Division of Laryngology and Broncho-Esophagology, Department of Otolaryngology-Head and Neck Surgery, EpiCURA Hospital, Baudour, Belgium
- Carlos-Miguel Chiesa-Estomba: Research Committee of Young-Otolaryngologists of the International Federation of Oto-Rhino-Laryngological Societies (YO-IFOS), Paris, France; Department of Otorhinolaryngology-Head and Neck Surgery, Hospital Universitario Donostia, San Sebastian, Spain
- Robin Baudouin: Research Committee of Young-Otolaryngologists of the International Federation of Oto-Rhino-Laryngological Societies (YO-IFOS), Paris, France; Department of Otolaryngology-Head Neck Surgery, Foch Hospital, UFR Simone Veil, University Paris Saclay, Paris, France; Phonetics and Phonology Laboratory (UMR 7018 CNRS, Université Sorbonne Nouvelle/Paris 3), Paris, France
- Stéphane Hans: Research Committee of Young-Otolaryngologists of the International Federation of Oto-Rhino-Laryngological Societies (YO-IFOS), Paris, France; Department of Otolaryngology-Head Neck Surgery, Foch Hospital, UFR Simone Veil, University Paris Saclay, Paris, France; Phonetics and Phonology Laboratory (UMR 7018 CNRS, Université Sorbonne Nouvelle/Paris 3), Paris, France
31
Abou-Abdallah M, Dar T, Mahmudzade Y, Michaels J, Talwar R, Tornari C. The quality and readability of patient information provided by ChatGPT: can AI reliably explain common ENT operations? Eur Arch Otorhinolaryngol 2024 (online ahead of print). PMID: 38530460. DOI: 10.1007/s00405-024-08598-w.
Abstract
PURPOSE Access to high-quality, comprehensible patient information is crucial, yet the information provided by increasingly prevalent artificial intelligence tools has not been thoroughly investigated. This study assesses the quality and readability of information from ChatGPT regarding three index ENT operations: tonsillectomy, adenoidectomy, and grommets. METHODS We asked ChatGPT standard and simplified questions. Readability was calculated using the Flesch-Kincaid Reading Ease Score (FRES), Flesch-Kincaid Grade Level (FKGL), Gunning Fog Index (GFI), and Simple Measure of Gobbledygook (SMOG) scores. We assessed quality using the DISCERN instrument and compared the results with ENT UK patient leaflets. RESULTS ChatGPT readability was poor, with mean FRES of 38.9 and 55.1 pre- and post-simplification, respectively. Simplified information from ChatGPT was 43.6% more readable (FRES) but scored 11.6% lower for quality. ENT UK patient information was consistently more readable and of higher quality. CONCLUSIONS ChatGPT can simplify information at the expense of quality, producing shorter answers with important omissions. Limitations in knowledge and insight curb its reliability for healthcare information. Patients should use reputable sources from professional organisations, alongside clear communication with their clinicians, to give well-informed consent and make decisions.
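The FRES metric used in this study is a closed-form formula over word, sentence, and syllable counts. A minimal sketch follows; the syllable counter is a rough vowel-group heuristic (published tools use dictionary-based counters), so scores will only approximate those reported by dedicated readability software.

```python
import re

def count_syllables(word):
    """Crude syllable estimate: count vowel groups, trimming a silent final 'e'."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and n > 1:
        n -= 1  # e.g. "stroke" -> 1 syllable, but keep "table" -> 2
    return max(n, 1)

def flesch_reading_ease(text):
    """FRES = 206.835 - 1.015*(words/sentence) - 84.6*(syllables/word)."""
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / sentences) - 84.6 * (syllables / len(words))
```

Higher FRES means easier text; scores near 38.9 (the pre-simplification mean above) correspond to college-level difficulty on the usual interpretation bands.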
Affiliation(s)
- Michel Abou-Abdallah: Ear, Nose and Throat Department, Luton and Dunstable University Hospital, Lewsey Rd, Luton, LU4 0DZ, UK
- Talib Dar: Ear, Nose and Throat Department, Luton and Dunstable University Hospital, Lewsey Rd, Luton, LU4 0DZ, UK
- Yasamin Mahmudzade: Foundation Programme, East and North Hertfordshire NHS Trust, Stevenage, UK
- Joshua Michaels: Ear, Nose and Throat Department, Luton and Dunstable University Hospital, Lewsey Rd, Luton, LU4 0DZ, UK
- Rishi Talwar: Ear, Nose and Throat Department, Luton and Dunstable University Hospital, Lewsey Rd, Luton, LU4 0DZ, UK
- Chrysostomos Tornari: Ear, Nose and Throat Department, Luton and Dunstable University Hospital, Lewsey Rd, Luton, LU4 0DZ, UK
32
Briganti G. How ChatGPT works: a mini review. Eur Arch Otorhinolaryngol 2024; 281:1565-1569. PMID: 37991499. DOI: 10.1007/s00405-023-08337-7.
Abstract
OBJECTIVE This paper offers a mini-review of OpenAI's language model, ChatGPT, detailing its mechanisms, applications in healthcare, and comparisons with other large language models (LLMs). METHODS The underlying technology of ChatGPT is outlined, focusing on its neural network architecture, training process, and the role of key elements such as input embedding, the encoder, the decoder, the attention mechanism, and output projection. The advancements in GPT-4, including its capacity for internet connection and the integration of plugins for enhanced functionality, are discussed. RESULTS ChatGPT can generate creative, coherent, and contextually relevant sentences, making it a valuable tool in healthcare for patient engagement, medical education, and clinical decision support. Yet, like other LLMs, it has limitations, including a lack of common-sense knowledge, a propensity to hallucinate facts, a restricted context window, and potential privacy concerns. CONCLUSION Despite these limitations, LLMs like ChatGPT offer transformative possibilities for healthcare. With ongoing research into model interpretability, common-sense reasoning, and the handling of longer context windows, their potential is vast. It is crucial for healthcare professionals to remain informed about these technologies and consider their ethical integration into practice.
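The attention mechanism this mini-review names is, at its core, a softmax-weighted average of value vectors. A dependency-free sketch of scaled dot-product attention on nested lists (illustrative only, not OpenAI's implementation, which operates on batched tensors with learned projections):

```python
import math

def scaled_dot_product_attention(queries, keys, values):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, on plain nested lists."""
    d_k = len(keys[0])
    out = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        # Numerically stable softmax over the scores.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out
```

A query that matches no key more than another (all scores equal) simply averages the values, which is why attention degrades gracefully rather than failing on out-of-distribution inputs.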
Affiliation(s)
- Giovanni Briganti: Chair of AI and Digital Medicine, Department of Neuroscience, Faculty of Medicine, University of Mons, Avenue du Champs de Mars 6, B7000, Mons, Belgium; Department of Clinical Science, Faculty of Medicine, University of Liège, Quartier Hôpital, Avenue Hippocrate 13, B4000, Liege, Belgium; Faculty of Medicine, Université Libre de Bruxelles, Route de Lennik 808, B1070, Brussels, Belgium
33
Lombardo R, Cicione A, Santoro G, De Nunzio C. ChatGPT in prostate cancer: myth or reality? Prostate Cancer Prostatic Dis 2024; 27:9-10. PMID: 37950022. DOI: 10.1038/s41391-023-00750-7.
Affiliation(s)
- Antonio Cicione: Ospedale Sant'Andrea, Sapienza University of Rome, Rome, Italy
34
Sarma G, Kashyap H, Medhi PP. ChatGPT in Head and Neck Oncology-Opportunities and Challenges. Indian J Otolaryngol Head Neck Surg 2024; 76:1425-1429. PMID: 38440617. PMCID: PMC10908741. DOI: 10.1007/s12070-023-04201-6. Open access.
Abstract
Head and neck oncology is a complex and challenging field, encompassing the diagnosis, treatment, and management of various malignancies affecting the intricate anatomical structures of the head and neck region. With advancements in artificial intelligence (AI), chatbot applications have emerged as a promising tool to revolutionize the field. ChatGPT, a cutting-edge language model developed by OpenAI, can help the oncologist in the clinic with scheduling appointments, establishing a clinical diagnosis, making a treatment plan, and follow-up. ChatGPT can also play a role in telemedicine consultations, medical documentation, scientific writing, and research. However, ChatGPT carries inherent drawbacks: it raises significant ethical concerns related to authorship, accountability, transparency, bias, and the potential for misinformation, and its training data are limited to September 2021, so regular updates are required to keep pace with rapidly evolving medical research. A judicious approach to using ChatGPT is therefore of utmost importance. Head and neck oncologists can reap the maximum benefit of this technology for patient care, education, and research to improve clinical outcomes.
Affiliation(s)
- Gautam Sarma: Department of Radiation Oncology, All India Institute of Medical Sciences Guwahati, Changsari, Assam, 781101, India
- Hrishikesh Kashyap: Department of Radiation Oncology, All India Institute of Medical Sciences Guwahati, Changsari, Assam, 781101, India
- Partha Pratim Medhi: Department of Radiation Oncology, All India Institute of Medical Sciences Guwahati, Changsari, Assam, 781101, India
35
Dallari V, Sacchetto A, Saetti R, Calabrese L, Vittadello F, Gazzini L. Is artificial intelligence ready to replace specialist doctors entirely? ENT specialists vs ChatGPT: 1-0, ball at the center. Eur Arch Otorhinolaryngol 2024; 281:995-1023. PMID: 37962570. DOI: 10.1007/s00405-023-08321-1.
Abstract
PURPOSE To evaluate ChatGPT's responses to Ear, Nose and Throat (ENT) clinical cases and compare them with the responses of ENT specialists. METHODS We hypothesized 10 scenarios, based on daily ENT experience, each with the same primary symptom, and constructed 20 clinical cases, 2 for each scenario. We presented them to 3 ENT specialists and to ChatGPT. The difficulty of the clinical cases was assessed by the 5 ENT authors of this article, who also evaluated the responses of ChatGPT for correctness and consistency with the responses of the 3 ENT experts. To verify the stability of ChatGPT's responses, we repeated the searches, always from the same account, for 5 consecutive days. RESULTS Among the 20 cases, 8 were rated as low complexity, 6 as moderate complexity, and 6 as high complexity. The overall mean correctness and consistency scores of the ChatGPT responses were 3.80 (SD 1.02) and 2.89 (SD 1.24), respectively. We did not find a statistically significant difference in the mean ChatGPT correctness and consistency scores according to case complexity. The total intraclass correlation coefficients (ICC) for the stability of the correctness and consistency of ChatGPT were 0.763 (95% confidence interval [CI] 0.553-0.895) and 0.837 (95% CI 0.689-0.927), respectively. CONCLUSIONS Our results reveal the potential usefulness of ChatGPT in ENT diagnosis. Instability in its responses and an inability to recognise certain clinical elements are its main limitations.
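The stability analysis in this abstract rests on the intraclass correlation coefficient. The paper does not specify which ICC form was computed, so as an illustration only, here is a one-way random-effects ICC(1,1) over hypothetical per-case scores from repeated days (rows are cases, columns are days):

```python
from statistics import mean

def icc_one_way(ratings):
    """One-way random-effects ICC(1,1) for n subjects x k ratings each.

    ICC = (MSB - MSW) / (MSB + (k - 1) * MSW), where MSB/MSW are the
    between- and within-subject mean squares.
    """
    n, k = len(ratings), len(ratings[0])
    grand_mean = mean(v for row in ratings for v in row)
    ms_between = k * sum((mean(row) - grand_mean) ** 2 for row in ratings) / (n - 1)
    ms_within = sum((v - mean(row)) ** 2 for row in ratings for v in row) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical correctness scores for 4 cases rated on 3 consecutive days.
daily_scores = [[5, 5, 4], [3, 3, 3], [4, 5, 4], [2, 2, 3]]
stability = icc_one_way(daily_scores)
```

Values near 1 indicate that day-to-day variation is small relative to between-case variation, which is how an ICC of 0.763 is read as moderate-to-good stability.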
Affiliation(s)
- Virginia Dallari: Young Confederation of European ORL-HNS, Y-CEORL-HNS, Dublin, Ireland; Unit of Otorhinolaryngology, Head & Neck Department, University of Verona, Piazzale L.A. Scuro 10, 37134, Verona, Italy
- Andrea Sacchetto: Young Confederation of European ORL-HNS, Y-CEORL-HNS, Dublin, Ireland; Department of Otolaryngology, Ospedale San Bortolo, AULSS 8 Berica, Vicenza, Italy
- Roberto Saetti: Department of Otolaryngology, Ospedale San Bortolo, AULSS 8 Berica, Vicenza, Italy
- Luca Calabrese: Department of Otorhinolaryngology-Head and Neck Surgery, Hospital of Bolzano (SABES-ASDAA), Teaching Hospital of Paracelsus Medical University (PMU), Bolzano-Bozen, Italy
- Luca Gazzini: Young Confederation of European ORL-HNS, Y-CEORL-HNS, Dublin, Ireland; Department of Otorhinolaryngology-Head and Neck Surgery, Hospital of Bolzano (SABES-ASDAA), Teaching Hospital of Paracelsus Medical University (PMU), Bolzano-Bozen, Italy
36
Lechien JR, Georgescu BM, Hans S, Chiesa-Estomba CM. ChatGPT performance in laryngology and head and neck surgery: a clinical case-series. Eur Arch Otorhinolaryngol 2024; 281:319-333. PMID: 37874336. DOI: 10.1007/s00405-023-08282-5.
Abstract
OBJECTIVES To study the performance of ChatGPT in the management of laryngology and head and neck (LHN) cases. METHODS The history and clinical examination findings of patients consulting at an Otolaryngology-Head and Neck Surgery department were presented to ChatGPT, which was interrogated for differential diagnosis, management, and treatment. The ChatGPT performance was assessed by two blinded, board-certified otolaryngologists using the items of a composite score and the Ottawa Clinic Assessment Tool: differential diagnosis, additional examinations, and treatment options. The complexity of the clinical cases was evaluated with the Amsterdam Clinical Challenge Scale test. RESULTS Forty clinical cases were submitted to ChatGPT: 14 (35%) easy, 12 (30%) moderate, and 14 (35%) difficult. ChatGPT indicated a significantly higher number of additional examinations than practitioners (p = 0.001). There was significant agreement between practitioners and ChatGPT on the indication of some common examinations (audiometry, ultrasonography, biopsy, gastrointestinal endoscopy, and videofluoroscopy), but ChatGPT never indicated some important additional examinations (PET-CT, voice quality assessment, or impedance-pH monitoring). ChatGPT performed best in proposing the primary diagnosis (90%), the most plausible differential diagnoses (65%), and the therapeutic options (60-68%); its performance in indicating additional examinations was lowest. CONCLUSIONS ChatGPT is a promising adjunctive tool in LHN practice, providing extensive documentation about disease-related additional examinations, differential diagnoses, and treatments. It is more efficient in diagnosis and treatment than in selecting the most adequate additional examinations.
Affiliation(s)
- Jerome R Lechien: Research Committee of Young Otolaryngologists of the International Federation of Otorhinolaryngological Societies (IFOS), Paris, France; Division of Laryngology and Broncho-Esophagology, Department of Otolaryngology-Head Neck Surgery, UMONS Research Institute for Health Sciences and Technology, EpiCURA Hospital, University of Mons (UMons), Mons, Belgium; Department of Otorhinolaryngology and Head and Neck Surgery, School of Medicine, UFR Simone Veil, Foch Hospital, Université Versailles Saint-Quentin-en-Yvelines (Paris Saclay University), Paris, France; Department of Otorhinolaryngology and Head and Neck Surgery, CHU Saint-Pierre, Brussels, Belgium; Polyclinique Elsan de Poitiers, Poitiers, France; Department of Human Anatomy and Experimental Oncology, Faculty of Medicine, UMONS Research Institute for Health Sciences and Technology, Avenue du Champ de Mars 6, 7000, Mons, Belgium
- Bianca M Georgescu: Division of Laryngology and Broncho-Esophagology, Department of Otolaryngology-Head Neck Surgery, UMONS Research Institute for Health Sciences and Technology, EpiCURA Hospital, University of Mons (UMons), Mons, Belgium
- Stephane Hans: Research Committee of Young Otolaryngologists of the International Federation of Otorhinolaryngological Societies (IFOS), Paris, France; Department of Otorhinolaryngology and Head and Neck Surgery, School of Medicine, UFR Simone Veil, Foch Hospital, Université Versailles Saint-Quentin-en-Yvelines (Paris Saclay University), Paris, France
- Carlos M Chiesa-Estomba: Research Committee of Young Otolaryngologists of the International Federation of Otorhinolaryngological Societies (IFOS), Paris, France; Department of Otorhinolaryngology-Head & Neck Surgery, Donostia University Hospital-Biodonostia Research Institute, San Sebastian, Spain
37
Frosolini A, Franz L, Benedetti S, Vaira LA, de Filippis C, Gennaro P, Marioni G, Gabriele G. Assessing the accuracy of ChatGPT references in head and neck and ENT disciplines. Eur Arch Otorhinolaryngol 2023; 280:5129-5133. PMID: 37679532. DOI: 10.1007/s00405-023-08205-4.
Abstract
PURPOSE ChatGPT has gained popularity as a web application since its release in 2022. While the potential of artificial intelligence (AI) systems in scientific writing is widely discussed, their reliability in reviewing the literature and providing accurate references remains unexplored. This study examines the reliability of references generated by ChatGPT language models in the Head and Neck field. METHODS Twenty clinical questions were generated across different Head and Neck disciplines to prompt ChatGPT versions 3.5 and 4.0 to produce texts on the assigned topics. The generated references were categorized as "true," "erroneous," or "inexistent" based on congruence with existing records in scientific databases. RESULTS ChatGPT 4.0 outperformed version 3.5 in terms of reference reliability. However, both versions displayed a tendency to provide erroneous or non-existent references. CONCLUSIONS It is crucial to address this challenge to maintain the reliability of scientific literature. Journals and institutions should establish strategies and good-practice principles for the evolving landscape of AI-assisted scientific writing.
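Categorizing references as this study describes requires database lookups, but a first-pass syntactic screen of generated DOIs can be sketched in a few lines. This is an illustration, not the authors' method: the pattern checks DOI format only (prefix "10." plus a registrant code, then a suffix), and confirming that a reference actually exists would require querying a service such as Crossref or PubMed.

```python
import re

# Minimal plausibility pattern for DOI strings: "10.", 4-9 registrant digits,
# a slash, and a non-empty suffix. Syntax only -- a well-formed DOI can still
# be fabricated, which is exactly the failure mode the study reports.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

def looks_like_doi(candidate):
    """Return True if `candidate` is at least shaped like a DOI."""
    return bool(DOI_PATTERN.match(candidate.strip()))
```

A screen like this can cheaply discard malformed citations before the expensive step of checking each surviving DOI against a bibliographic database.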
Affiliation(s)
- Andrea Frosolini: Department of Maxillo-Facial Surgery, Policlinico Le Scotte, University of Siena, Siena, Italy
- Leonardo Franz: Phoniatrics and Audiology Unit, Department of Neuroscience DNS, University of Padova, Treviso, Italy; Artificial Intelligence in Medicine and Innovation in Clinical Research and Methodology (PhD Program), Department of Clinical and Experimental Sciences, University of Brescia, Brescia, Italy
- Simone Benedetti: Department of Maxillo-Facial Surgery, Policlinico Le Scotte, University of Siena, Siena, Italy
- Luigi Angelo Vaira: Maxillofacial Surgery Operative Unit, Department of Medicine, Surgery and Pharmacy, University of Sassari, Sassari, Italy; PhD School of Biomedical Sciences, Department of Biomedical Sciences, University of Sassari, Sassari, Italy
- Cosimo de Filippis: Phoniatrics and Audiology Unit, Department of Neuroscience DNS, University of Padova, Treviso, Italy
- Paolo Gennaro: Department of Maxillo-Facial Surgery, Policlinico Le Scotte, University of Siena, Siena, Italy
- Gino Marioni: Phoniatrics and Audiology Unit, Department of Neuroscience DNS, University of Padova, Treviso, Italy
- Guido Gabriele: Department of Maxillo-Facial Surgery, Policlinico Le Scotte, University of Siena, Siena, Italy
38
Pugliese G, Maccari A, Felisati E, Felisati G, Giudici L, Rapolla C, Pisani A, Saibene AM. Are artificial intelligence large language models a reliable tool for difficult differential diagnosis? An a posteriori analysis of a peculiar case of necrotizing otitis externa. Clin Case Rep 2023; 11:e7933. PMID: 37736475. PMCID: PMC10509342. DOI: 10.1002/ccr3.7933. Open access.
Abstract
Key Clinical Message: Large language models have made artificial intelligence readily available to the general public and potentially have a role in healthcare; however, their use in difficult differential diagnosis is still limited, as demonstrated by a case of necrotizing otitis externa. Abstract: This case report presents a peculiar case of necrotizing otitis externa (NOE) with skull base involvement that proved diagnostically challenging. The initial presentation and imaging of the 78-year-old patient suggested a neoplastic rhinopharyngeal lesion, and only after several unsuccessful biopsies was the patient transferred to our unit. Upon re-evaluation of the clinical picture, a clinical hypothesis of NOE with skull base erosion was made and confirmed by identifying Pseudomonas aeruginosa in biopsy specimens of skull base bone and external auditory canal skin. Upon confirmation of the diagnosis, the patient was treated with culture-directed long-term antibiotics, with complete resolution of the disease. Given the complex clinical presentation, we chose to submit this NOE case a posteriori to two large language models (LLMs) to test their ability to handle difficult differential diagnoses. LLMs are easily approachable artificial intelligence tools that enable human-like interaction with the user, relying on large information databases to analyze queries. The LLMs of choice were ChatGPT-3 and ChatGPT-4, and they were asked to analyze the case, being provided with only objective clinical and imaging data.
Affiliation(s)
- Giorgia Pugliese: Otolaryngology Unit, Santi Paolo e Carlo Hospital, Milan, Italy; Department of Health Sciences, Università degli Studi di Milano, Milan, Italy
- Alberto Maccari: Otolaryngology Unit, Santi Paolo e Carlo Hospital, Milan, Italy; Department of Health Sciences, Università degli Studi di Milano, Milan, Italy
- Elena Felisati: Otolaryngology Unit, Santi Paolo e Carlo Hospital, Milan, Italy; Department of Health Sciences, Università degli Studi di Milano, Milan, Italy
- Giovanni Felisati: Otolaryngology Unit, Santi Paolo e Carlo Hospital, Milan, Italy; Department of Health Sciences, Università degli Studi di Milano, Milan, Italy
- Leonardo Giudici: Otolaryngology Unit, Santi Paolo e Carlo Hospital, Milan, Italy; Department of Health Sciences, Università degli Studi di Milano, Milan, Italy
- Chiara Rapolla: Otolaryngology Unit, Santi Paolo e Carlo Hospital, Milan, Italy; Department of Health Sciences, Università degli Studi di Milano, Milan, Italy
- Antonia Pisani: Otolaryngology Unit, Santi Paolo e Carlo Hospital, Milan, Italy; Department of Health Sciences, Università degli Studi di Milano, Milan, Italy
- Alberto Maria Saibene: Otolaryngology Unit, Santi Paolo e Carlo Hospital, Milan, Italy; Department of Health Sciences, Università degli Studi di Milano, Milan, Italy