1. Wawrzuta D, Napieralska A, Ludwikowska K, Jaruševičius L, Trofimoviča-Krasnorucka A, Rausis G, Szulc A, Pędziwiatr K, Poláchová K, Klejdysz J, Chojnacka M. Large language models for pretreatment education in pediatric radiation oncology: A comparative evaluation study. Clin Transl Radiat Oncol 2025; 51:100914. PMID: 39867725; PMCID: PMC11762905; DOI: 10.1016/j.ctro.2025.100914.
Abstract
Background and purpose Pediatric radiotherapy patients and their parents are usually aware of their need for radiotherapy early on, but they meet with a radiation oncologist later in their treatment. Consequently, they search for information online, often encountering unreliable sources. Large language models (LLMs) have the potential to serve as an educational pretreatment tool, providing reliable answers to their questions. We aimed to evaluate the responses provided by generative pre-trained transformers (GPT), the most popular subgroup of LLMs, to questions about pediatric radiation oncology. Materials and methods We collected pretreatment questions regarding radiotherapy from patients and parents. Responses were generated using GPT-3.5, GPT-4, and fine-tuned GPT-3.5, with fine-tuning based on pediatric radiotherapy guides from various institutions. Additionally, a radiation oncologist prepared answers to these questions. Finally, a multi-institutional group of nine pediatric radiotherapy experts conducted a blind review of responses, assessing reliability, concision, and comprehensibility. Results The radiation oncologist and GPT-4 provided the highest-quality responses, though GPT-4's answers were often excessively verbose. While fine-tuned GPT-3.5 generally outperformed basic GPT-3.5, it often provided overly simplistic answers. Inadequate responses were rare, occurring in 4% of GPT-generated responses across all models, primarily due to GPT-3.5 generating excessively long responses. Conclusions LLMs can be valuable tools for educating patients and their families before treatment in pediatric radiation oncology. Among them, only GPT-4 provides information of a quality comparable to that of a radiation oncologist, although it still occasionally generates poor-quality responses. GPT-3.5 models should be used cautiously, as they are more likely to produce inadequate answers to patient questions.
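The abstract does not state how the fine-tuning corpus was formatted. Purely as an illustration, a single training record in the JSONL chat format accepted by OpenAI's GPT-3.5 fine-tuning endpoint could look like the sketch below; the question and answer text are invented placeholders, not material from the study.

```python
import json

# Hypothetical fine-tuning record in OpenAI's chat JSONL format.
# The question/answer text is an invented placeholder, not taken from the study.
record = {
    "messages": [
        {"role": "system",
         "content": "You answer parents' pretreatment questions about pediatric radiotherapy."},
        {"role": "user",
         "content": "Will my child feel anything during the radiation treatment?"},
        {"role": "assistant",
         "content": "No. The treatment itself is painless, much like having an X-ray taken."},
    ]
}

# Append one record per line to the training file.
with open("radiotherapy_finetune.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record) + "\n")
```
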
Affiliation(s)
- Dominik Wawrzuta: Department of Radiation Oncology, Maria Sklodowska-Curie National Research Institute of Oncology, Wawelska 15B, 02-034 Warsaw, Poland
- Aleksandra Napieralska: Radiotherapy Department, Maria Sklodowska-Curie National Research Institute of Oncology, Wybrzeże Armii Krajowej 15, 44-100 Gliwice, Poland; Department of Oncology, Maria Sklodowska-Curie National Research Institute of Oncology, Garncarska 11, 31-115 Cracow, Poland; Faculty of Medicine & Health Sciences, Andrzej Frycz Modrzewski Krakow University, Gustawa Herlinga-Grudzińskiego 1, 30-705 Cracow, Poland
- Katarzyna Ludwikowska: Department of Radiation Oncology, Maria Sklodowska-Curie National Research Institute of Oncology, Wawelska 15B, 02-034 Warsaw, Poland
- Laimonas Jaruševičius: Oncology Institute, Lithuanian University of Health Sciences, A. Mickevičiaus g. 9, LT-44307 Kaunas, Lithuania
- Anastasija Trofimoviča-Krasnorucka: Department of Radiation Oncology, Riga East University Hospital, Hipokrāta iela 2, LV-1038 Riga, Latvia; Department of Internal Diseases, Riga Stradiņš University, Dzirciema iela 16, LV-1007 Riga, Latvia
- Gints Rausis: Department of Radiation Oncology, Riga East University Hospital, Hipokrāta iela 2, LV-1038 Riga, Latvia
- Agata Szulc: Department of Radiation Oncology, Lower Silesian Center of Oncology, Pulmonology and Hematology, Hirszfelda 12, 53-413 Wroclaw, Poland
- Katarzyna Pędziwiatr: Department of Radiation Oncology, Maria Sklodowska-Curie National Research Institute of Oncology, Wawelska 15B, 02-034 Warsaw, Poland
- Kateřina Poláchová: Department of Radiation Oncology, Masaryk Memorial Cancer Institute, Žlutý kopec 7, 656 53 Brno, Czech Republic; Department of Radiation Oncology, Faculty of Medicine, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
- Justyna Klejdysz: Department of Economics, Ludwig Maximilian University of Munich (LMU), Geschwister-Scholl-Platz 1, 80539 Munich, Germany; ifo Institute, Poschinger Straße 5, 81679 Munich, Germany
- Marzanna Chojnacka: Department of Radiation Oncology, Maria Sklodowska-Curie National Research Institute of Oncology, Wawelska 15B, 02-034 Warsaw, Poland

2. Mo Y, Park HS, Jang J, Lee EK. Relative importance of "why" and "how" messages on medication behavior: Insights from construal level theory. Patient Education and Counseling 2025; 132:108603. PMID: 39667199; DOI: 10.1016/j.pec.2024.108603.
Abstract
OBJECTIVE This study assesses the impact of initial messaging strategies on medication behavior in newly diagnosed hypertension patients in a hypothetical context. Applying Construal Level Theory, the study evaluated which message type is more effective: low-construal messages (focused on "how", feasibility, and concrete framing) or high-construal messages (focused on "why", desirability, and abstract framing). METHODS An online quasi-experiment was performed with 1200 participants without hypertension aged 30-60. The participants were divided into two message groups, each receiving a hypothetical hypertension diagnosis during a health check-up and different medication messages tailored to construal levels. RESULTS Compared to "how" messages, "why" messages significantly improved message satisfaction (F(1,1192) = 10.36, p = 0.001, ηp² = 0.009, M (SE) = 5.25 (0.04) vs. 5.04 (0.04)) and adherence intentions (F(1,1192) = 7.54, p = 0.006, ηp² = 0.006, M (SE) = 4.83 (0.06) vs. 4.59 (0.06)). CONCLUSION In the hypothetical scenario, patients newly diagnosed with hypertension were more responsive to "why" messages and perceived medication as psychologically distant. PRACTICE IMPLICATIONS To enhance adherence intentions and message satisfaction, healthcare professionals should emphasize the reasons for and benefits of medication use for patients newly prescribed antihypertensive treatment. Moreover, early-stage patient materials should prioritize "why" messages to improve adherence.
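The reported partial eta squared values can be reproduced from the F statistics and degrees of freedom alone, as the short check below shows, using only the numbers given in the abstract.

```python
# Partial eta squared from a one-way ANOVA F statistic:
# eta_p^2 = (F * df_effect) / (F * df_effect + df_error)

def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# Values reported in the abstract (df = 1, 1192):
print(round(partial_eta_squared(10.36, 1, 1192), 3))  # ~0.009 (message satisfaction)
print(round(partial_eta_squared(7.54, 1, 1192), 3))   # ~0.006 (adherence intentions)
```
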
Affiliation(s)
- Yeonhwa Mo: School of Pharmacy, Sungkyunkwan University, Suwon, South Korea
- Hyun Soon Park: Department of Media and Communication, Sungkyunkwan University, Seoul, South Korea
- Jieun Jang: School of Pharmacy, Sungkyunkwan University, Suwon, South Korea
- Eui-Kyung Lee: School of Pharmacy, Sungkyunkwan University, Suwon, South Korea

3. Kar SK, Singh A. Author's Reply to Comments on "How Sensitive Are the Free AI-detector Tools in Detecting AI-generated Texts? A Comparison of Popular AI-detector Tools". Indian J Psychol Med 2025:02537176241312261. PMID: 39839153; PMCID: PMC11744586; DOI: 10.1177/02537176241312261.
Affiliation(s)
- Sujita Kumar Kar: Dept. of Psychiatry, King George's Medical University, Lucknow, Uttar Pradesh, India
- Amit Singh: Dept. of Psychiatry, King George's Medical University, Lucknow, Uttar Pradesh, India

4. Rakauskas TR, Da Costa A, Moriconi C, Gill G, Kwong JW, Lee N. Evaluation of Chat Generative Pre-trained Transformer and Microsoft Copilot Performance on the American Society of Surgery of the Hand Self-Assessment Examinations. Journal of Hand Surgery Global Online 2025; 7:23-28. PMID: 39991611; PMCID: PMC11846544; DOI: 10.1016/j.jhsg.2024.10.001.
Abstract
Purpose Artificial intelligence advancements have the potential to transform medical education and patient care. The increasing popularity of large language models has raised important questions regarding their accuracy and agreement with human users. The purpose of this study was to evaluate the performance of Chat Generative Pre-Trained Transformer (ChatGPT), versions 3.5 and 4, as well as Microsoft Copilot, which is powered by ChatGPT-4, on self-assessment examination questions for hand surgery and to compare results between versions. Methods Input included 1,000 questions across 5 years (2015-2019) of self-assessment examinations provided by the American Society for Surgery of the Hand. The primary outcomes included correctness, the percentage concordance relative to other users, and whether an additional prompt was required. Secondary outcomes included accuracy according to question type and difficulty. Results All question formats, including image-based questions, were used for the analysis. ChatGPT-3.5 correctly answered 51.6% and ChatGPT-4 correctly answered 63.4%, a statistically significant difference. Microsoft Copilot correctly answered 59.9%, outperforming ChatGPT-3.5 but scoring significantly lower than ChatGPT-4. However, ChatGPT-3.5 sided with an average of 72.2% of users when correct and 62.1% when incorrect, compared with averages of 67.0% and 53.2% of users, respectively, for ChatGPT-4. Microsoft Copilot sided with an average of 79.7% of users when correct and 52.1% when incorrect. The highest-scoring subject was Miscellaneous, and the lowest-scoring subject was Neuromuscular in all versions. Conclusions In this study, ChatGPT-4 and Microsoft Copilot performed better on the hand surgery subspecialty examinations than ChatGPT-3.5. Microsoft Copilot was more accurate than ChatGPT-3.5 but less accurate than ChatGPT-4. Both ChatGPT-4 and Microsoft Copilot were able to "pass" the 2015-2019 American Society for Surgery of the Hand self-assessment examinations. Clinical Relevance While large language models hold promise within medical education, caution should be used, as more detailed evaluation of their consistency is needed. Future studies should explore how these models perform across multiple trials and contexts to truly assess their reliability.
Affiliation(s)
- Antonio Da Costa: College of Medicine, Florida Atlantic University, Boca Raton, FL
- Gurnoor Gill: College of Medicine, Florida Atlantic University, Boca Raton, FL
- Jeffrey W. Kwong: Department of Orthopaedic Surgery, University of California San Francisco, San Francisco, CA
- Nicolas Lee: Department of Orthopaedic Surgery, University of California San Francisco, San Francisco, CA

5. Fanelli F, Saleh M, Santamaria P, Zhurakivska K, Nibali L, Troiano G. Development and Comparative Evaluation of a Reinstructed GPT-4o Model Specialized in Periodontology. J Clin Periodontol 2024. PMID: 39723544; DOI: 10.1111/jcpe.14101.
Abstract
BACKGROUND Artificial intelligence (AI) has the potential to enhance healthcare practices, including periodontology, by improving diagnostics, treatment planning and patient care. This study introduces 'PerioGPT', a specialized AI model designed to provide up-to-date periodontal knowledge using GPT-4o and a novel retrieval-augmented generation (RAG) system. METHODS PerioGPT was evaluated in two phases. First, its performance was compared against those of five other chatbots using 50 periodontal questions from specialists, followed by a validation with 71 questions from the 2023-2024 'In-Service Examination' of the American Academy of Periodontology (AAP). The second phase focused on assessing PerioGPT's generative capacity, specifically its ability to create complex and accurate periodontal questions. RESULTS PerioGPT outperformed other chatbots, achieving a higher accuracy rate (81.16%) and generating more complex and precise questions with a mean complexity score of 3.81 ± 0.965 and an accuracy score of 4.35 ± 0.898. These results demonstrate PerioGPT's potential as a leading tool for creating reliable clinical queries in periodontology. CONCLUSIONS This study underscores the transformative potential of AI in periodontology, illustrating that specialized models can offer significant advantages over general language models for both educational and clinical applications. The findings highlight that tailoring AI technologies to specific medical fields may improve performance and relevance.
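The abstract does not describe PerioGPT's retrieval-augmented generation (RAG) pipeline in detail. The sketch below illustrates only the generic RAG pattern, cosine-similarity retrieval over precomputed embeddings followed by prompt assembly; every function and variable name here is an assumption for illustration, not part of PerioGPT.

```python
import numpy as np

def top_k_passages(query_vec: np.ndarray, passage_vecs: np.ndarray,
                   passages: list[str], k: int = 3) -> list[str]:
    """Return the k passages whose embeddings are most cosine-similar to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    p = passage_vecs / np.linalg.norm(passage_vecs, axis=1, keepdims=True)
    scores = p @ q                       # cosine similarity of each passage to the query
    best = np.argsort(scores)[::-1][:k]  # indices of the top-k passages
    return [passages[i] for i in best]

def build_prompt(question: str, retrieved: list[str]) -> str:
    """Assemble a grounded prompt: retrieved periodontology passages plus the user question."""
    context = "\n\n".join(retrieved)
    return f"Use only the context below to answer.\n\nContext:\n{context}\n\nQuestion: {question}"
```
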
Affiliation(s)
- Francesco Fanelli: Department of Clinical and Experimental Medicine, University of Foggia, Foggia, Italy
- Muhammad Saleh: Department of Periodontics and Oral Medicine, University of Michigan School of Dentistry, Ann Arbor, Michigan, USA
- Pasquale Santamaria: Centre for Host Microbiome Interactions, Faculty of Dentistry, Oral and Craniofacial Sciences, King's College London, London, UK
- Khrystyna Zhurakivska: Department of Clinical and Experimental Medicine, University of Foggia, Foggia, Italy
- Luigi Nibali: Centre for Host Microbiome Interactions, Faculty of Dentistry, Oral and Craniofacial Sciences, King's College London, London, UK
- Giuseppe Troiano: Department of Clinical and Experimental Medicine, University of Foggia, Foggia, Italy; Department of Medicine and Surgery, LUM University, Casamassima, Italy

6. Chung D, Sidhom K, Dhillon H, Bal DS, Fidel MG, Jawanda G, Patel P. Real-world utility of ChatGPT in pre-vasectomy counselling, a safe and efficient practice: a prospective single-centre clinical study. World J Urol 2024; 43:32. PMID: 39673635; DOI: 10.1007/s00345-024-05385-4.
Abstract
PURPOSE This study sought to assess whether pre-vasectomy counselling with ChatGPT can safely streamline the consultation process by reducing visit times and increasing patient satisfaction. METHODS A single-institution randomized pilot study was conducted to evaluate the safety and efficacy of ChatGPT for pre-vasectomy counselling. All adult patients interested in undergoing a vasectomy were included; unwillingness to provide consent or lack of internet access constituted exclusion. Patients were randomized 1:1 to ChatGPT counselling with standard in-person consultation or to in-person consultation without ChatGPT. Length of visit, number of questions asked, and responses to a Likert scale questionnaire (on a scale of 10, with 10 defined as great and 0 defined as poor) were collected. Descriptive statistics and a comparative analysis were performed. RESULTS Eighteen patients were included, with a mean age of 35.8 ± 5.4 (n = 9) in the intervention arm and 36.9 ± 7.4 (n = 9) in the control arm. Pre-vasectomy counselling with ChatGPT was associated with a higher provider perception of patient understanding of the procedure (8.8 ± 1.0 vs. 6.7 ± 2.8; p = 0.047) and a decreased length of in-person consultation (7.7 ± 2.3 min vs. 10.6 ± 3.4 min; p = 0.05). Quality of information provided by ChatGPT, ease of use, and overall experience were rated highly at 8.3 ± 1.9, 9.1 ± 1.5, and 8.6 ± 1.7, respectively. CONCLUSIONS ChatGPT for pre-vasectomy counselling improved the efficiency of consultations and the provider's perception of the patient's understanding of the procedure.
Affiliation(s)
- David Chung: Section of Urology, Department of Surgery, University of Manitoba, AD203-720 McDermot Avenue, Winnipeg, Manitoba, R3N 1B1, Canada
- Karim Sidhom: Section of Urology, Department of Surgery, University of Manitoba, AD203-720 McDermot Avenue, Winnipeg, Manitoba, R3N 1B1, Canada
- Dhiraj S Bal: Max Rady College of Medicine, University of Manitoba, Winnipeg, MB, Canada
- Maximilian G Fidel: Max Rady College of Medicine, University of Manitoba, Winnipeg, MB, Canada
- Gary Jawanda: Manitoba Men's Health Clinic, Winnipeg, MB, Canada
- Premal Patel: Section of Urology, Department of Surgery, University of Manitoba, AD203-720 McDermot Avenue, Winnipeg, Manitoba, R3N 1B1, Canada; Manitoba Men's Health Clinic, Winnipeg, MB, Canada

7. Jaques A, Abdelghafour K, Perkins O, Nuttall H, Haidar O, Johal K. A Study of Orthopedic Patient Leaflets and Readability of AI-Generated Text in Foot and Ankle Surgery (SOLE-AI). Cureus 2024; 16:e75826. PMID: 39822447; PMCID: PMC11737805; DOI: 10.7759/cureus.75826.
Abstract
Introduction The internet age has broadened the horizons of modern medicine, and the ever-increasing scope of artificial intelligence (AI) has made information about healthcare, common pathologies, and available treatment options much more accessible to the wider population. Patient autonomy relies on clear, accurate, and user-friendly information to give informed consent to an intervention. Our paper aims to outline the quality, readability, and accuracy of readily available information produced by AI relating to common foot and ankle procedures. Materials and methods A retrospective qualitative analysis was undertaken of procedure-specific information relating to three common foot and ankle orthopedic procedures: ankle arthroscopy, ankle arthrodesis/fusion, and gastrocnemius lengthening. Patient information leaflets (PILs) created by The British Orthopaedic Foot and Ankle Society (BOFAS) were compared to ChatGPT responses for readability, quality, and accuracy of information. Four language tools were used to assess readability: the Flesch-Kincaid reading ease (FKRE) score, the Flesch-Kincaid grade level (FKGL), the Gunning fog score (GFS), and the simple measure of gobbledygook (SMOG) index. Quality and accuracy were determined using the DISCERN tool by five independent assessors. Results PILs produced by AI had significantly lower FKRE scores than those from BOFAS, 40.4 (SD ±7.69) compared to 91.9 (SD ±2.24) (p ≤ 0.0001), indicating poor readability of AI-generated text. DISCERN scoring highlighted a statistically significant improvement in accuracy and quality of human-generated information across two PILs, with a mean score of 55.06 compared to 46.8. FKGL scoring indicated that the school grade required to understand AI responses was consistently higher than for the information leaflets, at 11.7 versus 1.1 (p ≤ 0.0001). The number of years of education required to understand the ChatGPT-produced PILs was also significantly higher by both the GFS (14.46 vs. 2.0 years; p < 0.0001) and the SMOG index (11.0 vs. 3.06 years; p < 0.0001). Conclusion Despite significant advances in the implementation of AI in surgery, AI-generated PILs for common foot and ankle surgical procedures currently lack sufficient quality, depth, and readability; this risks leaving patients misinformed regarding upcoming procedures. We conclude that information from trusted professional bodies should be used to complement a clinical consultation, as there is currently insufficient evidence to support the routine implementation of AI-generated information into the consent process.
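For reference, the two Flesch-Kincaid indices used above are closed-form functions of sentence, word, and syllable counts; a minimal Python sketch follows (the syllable counter is a rough vowel-group heuristic, not the exact implementation used by the study's tools).

```python
import re

def count_syllables(word: str) -> int:
    # Rough heuristic: count groups of consecutive vowels (at least one per word).
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid(text: str) -> tuple[float, float]:
    """Return (reading ease, grade level) for a block of English text."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    syllables = sum(count_syllables(w) for w in words)
    wps = n_words / sentences        # words per sentence
    spw = syllables / n_words        # syllables per word
    fkre = 206.835 - 1.015 * wps - 84.6 * spw   # reading ease (higher = easier)
    fkgl = 0.39 * wps + 11.8 * spw - 15.59      # US school grade level
    return fkre, fkgl
```
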
Affiliation(s)
- Karim Abdelghafour: Trauma and Orthopedics, Lister Hospital, Stevenage, GBR; Trauma and Orthopedics, Cairo University Hospitals, Cairo, EGY
- Helen Nuttall: Trauma and Orthopedics, Lister Hospital, Stevenage, GBR
- Omar Haidar: Vascular Surgery, Lister Hospital, Stevenage, GBR; General Surgery, Imperial College Healthcare NHS Trust, London, GBR

8. Choi JY, Han E, Yoo TK. Application of ChatGPT-4 to oculomics: a cost-effective osteoporosis risk assessment to enhance management as a proof-of-principles model in 3PM. EPMA J 2024; 15:659-676. PMID: 39635018; PMCID: PMC11612069; DOI: 10.1007/s13167-024-00378-0.
Abstract
Background Oculomics is an emerging medical field that focuses on the study of the eye to detect and understand systemic diseases. ChatGPT-4 is a highly advanced AI model with multimodal capabilities, allowing it to process text and statistical data. Osteoporosis is a chronic condition that presents asymptomatically but leads to fractures if untreated. Current diagnostic methods like dual X-ray absorptiometry (DXA) are costly and involve radiation exposure. This study aims to develop a cost-effective osteoporosis risk prediction tool using ophthalmological data and ChatGPT-4 based on oculomics, aligning with predictive, preventive, and personalized medicine (3PM) principles. Working hypothesis and methods We hypothesize that leveraging ophthalmological data (oculomics) combined with AI-driven regression models developed by ChatGPT-4 can significantly improve the predictive accuracy for osteoporosis risk. This integration will facilitate earlier detection, enable more effective preventive strategies, and support personalized treatment plans tailored to individual patients. We utilized DXA and ophthalmological data from the Korea National Health and Nutrition Examination Survey to develop and validate osteopenia and osteoporosis prediction models. Ophthalmological and demographic data were integrated into logistic regression analyses, facilitated by ChatGPT-4, to create prediction formulas. These models were then converted into calculator software through automated coding by ChatGPT-4. Results ChatGPT-4 automatically developed prediction models based on key predictors of osteoporosis and osteopenia, including age, gender, weight, and specific ophthalmological conditions such as cataracts and early age-related macular degeneration, and successfully implemented a risk calculator tool. The oculomics-based models outperformed traditional methods, with area under the receiver operating characteristic curve values of 0.785 for osteopenia and 0.866 for osteoporosis in the validation set. The calculator demonstrated high sensitivity and specificity, providing a reliable tool for early osteoporosis screening. Conclusions and expert recommendations in the framework of 3PM This study illustrates the value of integrating ophthalmological data into multi-level diagnostics for osteoporosis, significantly improving the accuracy of health risk assessment and the identification of at-risk individuals. Aligned with the principles of 3PM, this approach fosters earlier detection and enables the development of individualized patient profiles, facilitating personalized and targeted treatment strategies. This study also highlights the potential of AI, specifically ChatGPT-4, in developing accessible, cost-effective, and radiation-free screening tools for advancing 3PM in clinical practice. Our findings emphasize the importance of a holistic approach, incorporating comprehensive health indices and interdisciplinary collaboration, to deliver personalized management plans. Preventive strategies should focus on lifestyle modifications and targeted interventions to enhance bone health, thereby preventing the progression of osteoporosis and contributing to overall patient well-being. Supplementary Information The online version contains supplementary material available at 10.1007/s13167-024-00378-0.
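The abstract reports only discrimination statistics (AUC), not the fitted coefficients. The sketch below shows the general shape of such a logistic-regression risk calculator; every coefficient is an invented placeholder with no clinical meaning, included only to illustrate the calculation.

```python
import math

# Placeholder coefficients for illustration only; the study's fitted values are
# not given in the abstract and these numbers carry no clinical meaning.
INTERCEPT = -6.0
COEFFS = {"age_years": 0.07, "female": 0.9, "weight_kg": -0.03,
          "cataract": 0.5, "early_amd": 0.4}

def osteoporosis_risk(features: dict[str, float]) -> float:
    """Logistic model: risk = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    z = INTERCEPT + sum(COEFFS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

# Example call with hypothetical inputs.
print(osteoporosis_risk({"age_years": 68, "female": 1, "weight_kg": 55,
                         "cataract": 1, "early_amd": 0}))
```
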
Affiliation(s)
- Joon Yul Choi: Department of Biomedical Engineering, Yonsei University, Wonju, South Korea
- Eoksoo Han: Electronics and Telecommunications Research Institute (ETRI), Daejeon, South Korea
- Tae Keun Yoo: Department of Ophthalmology, Hangil Eye Hospital, 35 Bupyeong-Daero, Bupyeong-Gu, Incheon, 21388 South Korea

9. Lack BT, Mouhawasse E, Childers JT, Jackson GR, Daji SV, Yerke-Hansen P, Familiari F, Knapik DM, Sabesan VJ. Can ChatGPT answer patient questions regarding reverse shoulder arthroplasty? J ISAKOS 2024; 9:100323. PMID: 39307189; DOI: 10.1016/j.jisako.2024.100323.
Abstract
INTRODUCTION In recent years, artificial intelligence (AI) has seen substantial progress in its utilization, with Chat Generative Pre-Trained Transformer (ChatGPT) emerging as a popular language model. The purpose of this study was to test the accuracy and reliability of ChatGPT's responses to frequently asked questions (FAQ) pertaining to reverse shoulder arthroplasty (RSA). METHODS The ten most common FAQs were queried from institutional patient education websites. These ten questions were then input into the chatbot during a single session without additional contextual information. The responses were then critically analyzed by two orthopedic surgeons for clarity, accuracy, and the quality of evidence-based information using The Journal of the American Medical Association (JAMA) Benchmark criteria and the DISCERN score. The readability of the responses was analyzed using the Flesch-Kincaid Grade Level. RESULTS In response to the ten questions, the average DISCERN score was 44 (range 38-51). Seven responses were classified as fair and three as poor. The JAMA Benchmark criteria score was 0 for all responses. Furthermore, the average Flesch-Kincaid Grade Level was 14.35, which corresponds to a college graduate reading level. CONCLUSION Overall, ChatGPT was able to provide fair responses to common patient questions. However, the responses were all written at a college graduate reading level and lacked reliable citations, which greatly limits their utility. Thus, adequate patient education should be provided by orthopedic surgeons. This study underscores the need for patient education resources that are reliable, accessible, and comprehensible. LEVEL OF EVIDENCE IV.
Affiliation(s)
- Benjamin T Lack: Charles E. Schmidt Florida Atlantic University College of Medicine, Boca Raton, FL, USA
- Edwin Mouhawasse: Charles E. Schmidt Florida Atlantic University College of Medicine, Boca Raton, FL, USA
- Justin T Childers: Charles E. Schmidt Florida Atlantic University College of Medicine, Boca Raton, FL, USA
- Garrett R Jackson: Department of Orthopaedic Surgery, University of Missouri, Columbia, MO 65212, USA
- Shay V Daji: JFK/University of Miami Department of Orthopedic Surgery, Palm Beach, FL, USA
- Payton Yerke-Hansen: Department of Orthopaedic Surgery, Louisiana State University Health-Shreveport, Shreveport, LA 71103, USA
- Filippo Familiari: Department of Orthopaedic and Trauma Surgery, Magna Graecia University, 88100 Catanzaro, Italy; Research Center on Musculoskeletal Health, Magna Graecia University, 88100 Catanzaro, Italy
- Derrick M Knapik: Department of Orthopaedic Surgery, Washington University and Barnes-Jewish Orthopedic Center, Chesterfield, MO, USA
- Vani J Sabesan: JFK/University of Miami Department of Orthopedic Surgery, Palm Beach, FL, USA

10. Ding Z, Wei R, Xia J, Mu Y, Wang J, Lin Y. Exploring the potential of large language model-based chatbots in challenges of ribosome profiling data analysis: a review. Brief Bioinform 2024; 26:bbae641. PMID: 39668339; PMCID: PMC11638007; DOI: 10.1093/bib/bbae641.
Abstract
Ribosome profiling (Ribo-seq) provides transcriptome-wide insights into protein synthesis dynamics, yet its analysis poses challenges, particularly for nonbioinformatics researchers. Large language model-based chatbots offer promising solutions by leveraging natural language processing. This review explores their convergence, highlighting opportunities for synergy. We discuss challenges in Ribo-seq analysis and how chatbots mitigate them, facilitating scientific discovery. Through case studies, we illustrate chatbots' potential contributions, including data analysis and result interpretation. Despite the absence of applied examples, existing software underscores the value of chatbots and large language models. We anticipate their pivotal role in future Ribo-seq analysis, overcoming current limitations. Challenges such as model bias and data privacy require attention, but emerging trends offer promise. The integration of large language models and Ribo-seq analysis holds immense potential for advancing our understanding of translational regulation and gene expression.
Affiliation(s)
- Zheyu Ding, Rong Wei, Jianing Xia, Yonghao Mu, Jiahuan Wang, and Yingying Lin: School of Pharmacy, Hangzhou Normal University, Hangzhou, Zhejiang 311121, China; Key Laboratory of Elemene Class Anti-Cancer Chinese Medicines, Engineering Laboratory of Development and Application of Traditional Chinese Medicines, Collaborative Innovation Center of Traditional Chinese Medicines of Zhejiang Province, Hangzhou Normal University, Hangzhou, Zhejiang 311121, China

11. Zheng J, Ding X, Pu JJ, Chung SM, Ai QYH, Hung KF, Shan Z. Unlocking the Potentials of Large Language Models in Orthodontics: A Scoping Review. Bioengineering (Basel) 2024; 11:1145. PMID: 39593805; PMCID: PMC11591942; DOI: 10.3390/bioengineering11111145.
Abstract
(1) Background: In recent years, large language models (LLMs) such as ChatGPT have gained significant attention in various fields, including dentistry. This scoping review aims to examine the current applications and explore potential uses of LLMs in the orthodontic domain, shedding light on how they might improve dental healthcare. (2) Methods: We carried out a comprehensive search in five electronic databases, namely PubMed, Scopus, Embase, ProQuest and Web of Science. Two authors independently screened articles and performed data extraction according to the eligibility criteria, following the PRISMA-ScR guideline. The main findings from the included articles were synthesized and analyzed in a narrative way. (3) Results: A total of 706 articles were searched, and 12 papers were eventually included. The applications of LLMs include improving diagnostic and treatment efficiency in orthodontics as well as enhancing communication with patients. (4) Conclusions: There is emerging research in countries worldwide on the use of LLMs in orthodontics, suggesting an upward trend in their acceptance within this field. However, the potential application of LLMs remains in its early stage, with a noticeable lack of extensive studies and tailored products to address specific clinical needs.
Affiliation(s)
- Jie Zheng: Department of Biomedical Sciences, City University of Hong Kong, Hong Kong, China
- Xiaoqian Ding: Division of Paediatric Dentistry and Orthodontics, Faculty of Dentistry, The University of Hong Kong, Hong Kong, China
- Jingya Jane Pu: Division of Oral and Maxillofacial Surgery, Faculty of Dentistry, The University of Hong Kong, Hong Kong, China
- Sze Man Chung: Division of Paediatric Dentistry and Orthodontics, Faculty of Dentistry, The University of Hong Kong, Hong Kong, China
- Qi Yong H. Ai: Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Hong Kong, China
- Kuo Feng Hung: Applied Oral Science & Community Dental Care, Faculty of Dentistry, The University of Hong Kong, Hong Kong, China
- Zhiyi Shan: Division of Paediatric Dentistry and Orthodontics, Faculty of Dentistry, The University of Hong Kong, Hong Kong, China

12. Alsadhan AA. Assessing ChatGPT's cybersecurity implications in Saudi Arabian healthcare and education sectors: A comparative study. Nutr Health 2024:2601060241289975. PMID: 39506281; DOI: 10.1177/02601060241289975.
Abstract
STUDY PURPOSE This study aims to critically evaluate ChatGPT's impact on cybersecurity in the healthcare and education sectors. METHODS This study employed a cross-sectional survey design, collecting data from healthcare and educational professionals in Saudi Arabia through a structured questionnaire, with responses from 205 healthcare workers and 214 educators. The survey assessed perceptions of ChatGPT's impact on cybersecurity opportunities and challenges, with data analyzed using descriptive statistics and ANOVA to explore differences across professional roles. RESULTS Healthcare professionals viewed artificial intelligence (AI) more favorably (mean scores 4.24 and 4.14) than those in education, who showed moderate enthusiasm (mean scores 2.55 to 3.54). Concerns over data privacy and the cost of securing AI were significant, with high mean scores of 3.59 indicating widespread apprehension. CONCLUSION A balanced approach to ChatGPT's integration is required, one that carefully considers ethical implications, data privacy, and the technology's dual-use potential.
Affiliation(s)
- Abeer Abdullah Alsadhan: Computer Science Department, Applied College, Imam Abdulrahman bin Faisal University, Dammam, Saudi Arabia

13. Grimm DR, Lee YJ, Hu K, Liu L, Garcia O, Balakrishnan K, Ayoub NF. The utility of ChatGPT as a generative medical translator. Eur Arch Otorhinolaryngol 2024; 281:6161-6165. PMID: 38705894; DOI: 10.1007/s00405-024-08708-8.
Abstract
PURPOSE Large language models continue to dramatically change the medical landscape. We aimed to explore the utility of ChatGPT in providing accurate, actionable, and understandable generative medical translations in English, Spanish, and Mandarin pertaining to Otolaryngology. METHODS Responses of GPT-4 to commonly asked patient questions listed in official otolaryngology clinical practice guidelines (CPG) were evaluated with the Patient Education Materials Assessment Tool-Printable (PEMAT-P). Additional critical elements were identified a priori to evaluate ChatGPT's accuracy and thoroughness in its responses. Multiple fluent speakers of English, Mandarin, and Spanish evaluated each response generated by ChatGPT. RESULTS Total PEMAT-P scores differed between English, Mandarin, and Spanish GPT-4-generated responses, with a moderate effect size of language (eta-squared 0.07) and scores ranging from 73 to 77 (P = 0.03). Overall understandability scores did not differ between English, Mandarin, and Spanish (small effect size of language, eta-squared 0.02; scores ranging from 76 to 79; P = 0.17), nor did overall actionability scores (eta-squared 0.00; scores ranging from 66 to 73; P = 0.44). Scores for the a priori procedure-specific responses similarly did not differ between English, Spanish, and Mandarin (eta-squared 0.02; scores ranging from 61 to 78; P = 0.22). CONCLUSION GPT-4 produces accurate, understandable, and actionable outputs in English, Spanish, and Mandarin. Responses generated by GPT-4 in Spanish and Mandarin are comparable to their English counterparts, indicating a novel use for these models within Otolaryngology and implications for bridging healthcare access and literacy gaps. LEVEL OF EVIDENCE IV.
Affiliation(s)
- David R Grimm, Yu-Jin Lee, Katherine Hu, Longsha Liu, Omar Garcia, and Karthik Balakrishnan: Division of Pediatric Otolaryngology, Department of Otolaryngology-Head and Neck Surgery, Stanford University School of Medicine, Stanford, CA, 94305, USA
- Noel F Ayoub: Division of Pediatric Otolaryngology, Department of Otolaryngology-Head and Neck Surgery, Stanford University School of Medicine, Stanford, CA, 94305, USA; Division of Rhinology and Skull Base Surgery, Department of Otolaryngology-Head and Neck Surgery, Mass Eye and Ear, 243 Charles Street, Boston, MA, 02114, USA

14. Ourang SA, Sohrabniya F, Mohammad-Rahimi H, Dianat O, Aminoshariae A, Nagendrababu V, Dummer PMH, Duncan HF, Nosrat A. Artificial intelligence in endodontics: Fundamental principles, workflow, and tasks. Int Endod J 2024; 57:1546-1565. PMID: 39056554; DOI: 10.1111/iej.14127.
Abstract
The integration of artificial intelligence (AI) in healthcare has seen significant advancements, particularly in areas requiring image interpretation. Endodontics, a specialty within dentistry, stands to benefit immensely from AI applications, especially in interpreting radiographic images. However, there is a knowledge gap among endodontists regarding the fundamentals of machine learning and deep learning, hindering the full utilization of AI in this field. This narrative review aims to: (A) elaborate on the basic principles of machine learning and deep learning and present the basics of neural network architectures; (B) explain the workflow for developing AI solutions, from data collection through clinical integration; (C) discuss specific AI tasks and applications relevant to endodontic diagnosis and treatment. The article shows that AI offers diverse practical applications in endodontics. Computer vision methods help analyse images while natural language processing extracts insights from text. With robust validation, these techniques can enhance diagnosis, treatment planning, education, and patient care. In conclusion, AI holds significant potential to benefit endodontic research, practice, and education. Successful integration requires an evolving partnership between clinicians, computer scientists, and industry.
Affiliation(s)
- Seyed AmirHossein Ourang: Dentofacial Deformities Research Center, Research Institute of Dental Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Fatemeh Sohrabniya: Topic Group Dental Diagnostics and Digital Dentistry, ITU/WHO Focus Group AI on Health, Berlin, Germany
- Hossein Mohammad-Rahimi: Topic Group Dental Diagnostics and Digital Dentistry, ITU/WHO Focus Group AI on Health, Berlin, Germany
- Omid Dianat: Division of Endodontics, Department of Advanced Oral Sciences and Therapeutics, University of Maryland School of Dentistry, Baltimore, Maryland, USA; Private Practice, Irvine Endodontics, Irvine, California, USA
- Anita Aminoshariae: Department of Endodontics, School of Dental Medicine, Case Western Reserve University, Cleveland, Ohio, USA
- Henry F Duncan: Division of Restorative Dentistry, Dublin Dental University Hospital, Trinity College Dublin, Dublin, Ireland
- Ali Nosrat: Division of Endodontics, Department of Advanced Oral Sciences and Therapeutics, University of Maryland School of Dentistry, Baltimore, Maryland, USA; Private Practice, Centreville Endodontics, Centreville, Virginia, USA

15. Gur T, Hameiri B, Maaravi Y. Political ideology shapes support for the use of AI in policy-making. Front Artif Intell 2024; 7:1447171. PMID: 39540200; PMCID: PMC11557559; DOI: 10.3389/frai.2024.1447171.
Abstract
In a world grappling with technological advancements, the concept of Artificial Intelligence (AI) in governance is becoming increasingly realistic. While some may find this possibility incredibly alluring, others may see it as dystopian. Society must account for these varied opinions when implementing new technologies or regulating and limiting them. This study (N = 703) explored Leftists' (liberals) and Rightists' (conservatives) support for using AI in governance decision-making amidst an unprecedented political crisis that washed through Israel shortly after the proclamation of the government's intentions to initiate reform. Results indicate that Leftists are more favorable toward AI in governance. While legitimacy is tied to support for using AI in governance among both, Rightists' acceptance is also tied to perceived norms, whereas Leftists' approval is linked to perceived utility, political efficacy, and warmth. Understanding these ideological differences is crucial, both theoretically and for practical policy formulation regarding AI's integration into governance.
Affiliation(s)
- Tamar Gur: Adelson School of Entrepreneurship, Reichman University, Herzliya, Israel
- Boaz Hameiri: The School of Social and Policy Studies, Tel Aviv University, Tel Aviv, Israel
- Yossi Maaravi: Adelson School of Entrepreneurship, Reichman University, Herzliya, Israel

16. Shamil E, Ko TK, Fan KS, Schuster-Bruce J, Jaafar M, Khwaja S, Eynon-Lewis N, D'Souza A, Andrews P. Assessing the Quality and Readability of Online Patient Information: ENT UK Patient Information e-Leaflets versus Responses by a Generative Artificial Intelligence. Facial Plast Surg 2024. PMID: 39260421; DOI: 10.1055/a-2413-3675.
Abstract
BACKGROUND The evolution of artificial intelligence has introduced new ways to disseminate health information, including natural language processing models like ChatGPT. However, the quality and readability of such digitally generated information remain understudied. This study is the first to compare the quality and readability of digitally generated health information against leaflets produced by professionals. METHODOLOGY Five ENT UK patient information leaflets and their corresponding ChatGPT responses were extracted from the Internet. Assessors with various degrees of medical knowledge evaluated the content using the Ensuring Quality Information for Patients (EQIP) tool and readability tools including the Flesch-Kincaid Grade Level (FKGL). Statistical analysis was performed to identify differences between leaflets, assessors, and sources of information. RESULTS ENT UK leaflets were of moderate quality, scoring a median EQIP of 23. Statistically significant differences in overall EQIP score were identified between ENT UK leaflets, but ChatGPT responses were of uniform quality. Nonspecialist doctors gave the highest EQIP scores, while medical students gave the lowest. The mean readability of ENT UK leaflets was higher than that of ChatGPT responses. The information metrics of ENT UK leaflets were moderate and varied between topics. Equivalent ChatGPT information provided comparable content quality, but with reduced readability. CONCLUSION ChatGPT patient information and professionally produced leaflets had comparable content, but large language model content required a higher reading age. With the increasing use of online health resources, this study highlights the need for a balanced approach that considers both the quality and readability of patient education materials.
Affiliation(s)
- Eamon Shamil: The Royal National ENT Hospital, University College London Hospitals NHS Foundation Trust, London, England, United Kingdom
- Tsz Ki Ko: Royal Stoke University Hospital, United Kingdom
- Ka Siu Fan: Royal Surrey County Hospital, Guildford, Surrey, United Kingdom
- James Schuster-Bruce: Department of ENT, Kings College Hospital Foundation Trust, London, England, United Kingdom
- Mustafa Jaafar: UCL Artificial Intelligence Centre for Doctoral Training, London, England, United Kingdom
- Sadie Khwaja: Department of ENT, Manchester University NHS Foundation Trust, England, United Kingdom
- Alwyn D'Souza: Institute of Medical Sciences, Canterbury Christ Church University, England, United Kingdom
- Peter Andrews: The Royal National ENT Hospital, University College London Hospitals NHS Foundation Trust, London, England, United Kingdom

17. Goktas P, Grzybowski A. Assessing the Impact of ChatGPT in Dermatology: A Comprehensive Rapid Review. J Clin Med 2024; 13:5909. PMID: 39407969; PMCID: PMC11477344; DOI: 10.3390/jcm13195909.
Abstract
Background/Objectives: The use of artificial intelligence (AI) in dermatology is expanding rapidly, with ChatGPT, a large language model (LLM) from OpenAI, showing promise in patient education, clinical decision-making, and teledermatology. Despite its potential, the ethical, clinical, and practical implications of its application remain insufficiently explored. This study aims to evaluate the effectiveness, challenges, and future prospects of ChatGPT in dermatology, focusing on clinical applications, patient interactions, and medical writing. ChatGPT was selected due to its broad adoption, extensive validation, and strong performance in dermatology-related tasks. Methods: A thorough literature review was conducted, focusing on publications related to ChatGPT and dermatology. The search included articles in English from November 2022 to August 2024, as this period captures the most recent developments following the launch of ChatGPT in November 2022, ensuring that the review includes the latest advancements and discussions on its role in dermatology. Studies were chosen based on their relevance to clinical applications, patient interactions, and ethical issues. Descriptive metrics, such as average accuracy scores and reliability percentages, were used to summarize study characteristics, and key findings were analyzed. Results: ChatGPT has shown significant potential in passing dermatology specialty exams and providing reliable responses to patient queries, especially for common dermatological conditions. However, it faces limitations in diagnosing complex cases like cutaneous neoplasms, and concerns about the accuracy and completeness of its information persist. Ethical issues, including data privacy, algorithmic bias, and the need for transparent guidelines, were identified as critical challenges. Conclusions: While ChatGPT has the potential to significantly enhance dermatological practice, particularly in patient education and teledermatology, its integration must be cautious, addressing ethical concerns and complementing, rather than replacing, dermatologist expertise. Future research should refine ChatGPT's diagnostic capabilities, mitigate biases, and develop comprehensive clinical guidelines.
Affiliation(s)
- Polat Goktas: UCD School of Computer Science, University College Dublin, D04 V1W8 Dublin, Ireland
- Andrzej Grzybowski: Department of Ophthalmology, University of Warmia and Mazury, 10-719 Olsztyn, Poland; Institute for Research in Ophthalmology, Foundation for Ophthalmology Development, 61-553 Poznan, Poland

18. Mathis WS, Zhao S, Pratt N, Weleff J, De Paoli S. Inductive thematic analysis of healthcare qualitative interviews using open-source large language models: How does it compare to traditional methods? Computer Methods and Programs in Biomedicine 2024; 255:108356. PMID: 39067136; DOI: 10.1016/j.cmpb.2024.108356.
Abstract
BACKGROUND Large language models (LLMs) are generative artificial intelligence models that have ignited much interest and discussion about their utility in clinical and research settings. Despite this interest, there is sparse analysis of their use in qualitative thematic analysis comparing their current ability with that of human coding and analysis. In addition, there has been no published analysis of their use on real-world, protected health information. OBJECTIVE Here we fill that gap in the literature by comparing an LLM to standard human thematic analysis in real-world, semi-structured interviews of both patients and clinicians within a psychiatric setting. METHODS Using a 70 billion parameter open-source LLM running on local hardware and advanced prompt engineering techniques, we produced themes that summarized a full corpus of interviews in minutes. Subsequently, we used three different evaluation methods for quantifying similarity between themes produced by the LLM and those produced by humans. RESULTS These revealed similarities ranging from moderate to substantial (Jaccard similarity coefficients 0.44-0.69), which are promising preliminary results. CONCLUSION Our study demonstrates that open-source LLMs can effectively generate robust themes from qualitative data, achieving substantial similarity to human-generated themes. The validation of LLMs in thematic analysis, coupled with evaluation methodologies, highlights their potential to enhance and democratize qualitative research across diverse fields.
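The Jaccard coefficients quoted above quantify overlap between the LLM-generated and human-generated theme sets; the computation itself is a one-liner, sketched below with invented theme labels.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity: |intersection| / |union| of two theme sets."""
    return len(a & b) / len(a | b) if (a | b) else 0.0

# Invented example themes, for illustration only.
llm_themes = {"stigma", "medication side effects", "therapeutic alliance", "housing"}
human_themes = {"stigma", "side effects of medication", "therapeutic alliance", "family support"}
print(jaccard(llm_themes, human_themes))  # exact string match only: 2 shared / 6 total ≈ 0.33
```
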
Affiliation(s)
- Walter S Mathis: Department of Psychiatry, Yale University School of Medicine, New Haven, CT, USA
- Sophia Zhao: Department of Psychiatry, Yale University School of Medicine, New Haven, CT, USA
- Nicholas Pratt: Department of Psychiatry, Yale University School of Medicine, New Haven, CT, USA
- Jeremy Weleff: Department of Psychiatry, Yale University School of Medicine, New Haven, CT, USA
- Stefano De Paoli: Division of Sociology, School of Business, Law and Social Sciences, Abertay University, Dundee, Scotland, United Kingdom

19. Barua M. Assessing the Performance of ChatGPT in Answering Patients' Questions Regarding Congenital Bicuspid Aortic Valve. Cureus 2024; 16:e72293. PMID: 39583462; PMCID: PMC11585396; DOI: 10.7759/cureus.72293.
Abstract
AIM Artificial intelligence (AI) models, such as ChatGPT, are widely used in academia as well as by the general public. In the field of medicine, the information that professionals and patients obtain from AI tools has significant advantages, while at the same time raising valid concerns regarding the validity and adequacy of that information for healthcare delivery and utilization. Therefore, it is important to vet these AI tools through the prism of practicing physicians. METHODS To demonstrate the immense utility as well as the potential concerns of using ChatGPT to gather medical information, a set of questions was posed to the chatbot regarding a hypothetical patient with a congenital bicuspid aortic valve (BAV), and the answers were recorded and reviewed based on three criteria: (i) readability/technicality; (ii) adequacy/completeness; and (iii) accuracy/authenticity. RESULTS While ChatGPT provided detailed information about the clinical picture, treatment, and outcomes of BAV, the information was generic and brief, and its utility was limited by the lack of specific information based on an individual patient's clinical status. The authenticity of the information could not be verified due to a lack of citations. Further, the human aspects that would normally emerge in nuanced doctor-patient communication were missing from the ChatGPT output. CONCLUSION Although the performance of AI in medical care is expected to grow, imperfections and ethical concerns may remain a major challenge to relying on chatbot information alone, without adequate communication with health providers, despite the numerous advantages this technology offers society in many walks of life.
Collapse
Affiliation(s)
- Mousumi Barua
- Internal Medicine, School of Public Health and Health Professions, University at Buffalo, Buffalo, USA
| |
Collapse
|
20
|
Drouaud A, Stocchi C, Tang J, Gonsalves G, Cheung Z, Szatkowski J, Forsh D. Exploring the Performance of ChatGPT in an Orthopaedic Setting and Its Potential Use as an Educational Tool. JB JS Open Access 2024; 9:e24.00081. [PMID: 39600798 PMCID: PMC11584220 DOI: 10.2106/jbjs.oa.24.00081] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 11/29/2024] Open
Abstract
Introduction We assessed ChatGPT-4 Vision (GPT-4V)'s performance in image interpretation, diagnosis formulation, and patient management. We aim to shed light on its potential as an educational tool for medical students addressing real-life cases. Methods Ten of the most popular orthopaedic trauma cases from OrthoBullets were selected. GPT-4V interpreted medical imaging and patient information, provided diagnoses, and guided responses to OrthoBullets questions. Four fellowship-trained orthopaedic trauma surgeons rated GPT-4V responses using a 5-point Likert scale (strongly disagree to strongly agree). Each of GPT-4V's answers was assessed for alignment with current medical knowledge (accuracy), whether its reasoning was logical (rationale), relevance to the specific case (relevance), and whether surgeons would trust the answers (trustworthiness). Mean scores from surgeon ratings were calculated. Results In total, 10 clinical cases, comprising 97 questions, were analyzed (10 imaging, 35 management, and 52 treatment). The surgeons assigned a mean overall rating of 3.46/5.00 to GPT-4V's imaging responses (accuracy 3.28, rationale 3.68, relevance 3.75, and trustworthiness 3.15). Management questions received an overall score of 3.76 (accuracy 3.61, rationale 3.84, relevance 4.01, and trustworthiness 3.58), while treatment questions had an average overall score of 4.04 (accuracy 3.99, rationale 4.08, relevance 4.15, and trustworthiness 3.93). Conclusion This is the first study evaluating GPT-4V's imaging interpretation, personalized management, and treatment approaches as a medical educational tool. Surgeon ratings indicate overall fair agreement with GPT-4V's reasoning behind decision-making. GPT-4V performed less favorably in imaging interpretation than in management and treatment. As a standalone tool for medical education, GPT-4V's performance falls below the standards of our fellowship-trained orthopaedic trauma surgeons.
Collapse
Affiliation(s)
- Arthur Drouaud
- George Washington University School of Medicine, Washington, District of Columbia
| | - Carolina Stocchi
- Department of Orthopaedic Surgery, Mount Sinai, New York, New York
| | - Justin Tang
- Department of Orthopaedic Surgery, Mount Sinai, New York, New York
| | - Grant Gonsalves
- Department of Orthopaedic Surgery, Mount Sinai, New York, New York
| | - Zoe Cheung
- Department of Orthopaedic Surgery, Staten Island University Hospital, Staten Island, New York
| | - Jan Szatkowski
- Department of Orthopaedic Surgery, Indiana University Health Methodist Hospital, Indianapolis, Indiana
| | - David Forsh
- Department of Orthopaedic Surgery, Mount Sinai, New York, New York
| |
Collapse
|
21
|
Hadar-Shoval D, Asraf K, Shinan-Altman S, Elyoseph Z, Levkovich I. Embedded values-like shape ethical reasoning of large language models on primary care ethical dilemmas. Heliyon 2024; 10:e38056. [PMID: 39381244 PMCID: PMC11458949 DOI: 10.1016/j.heliyon.2024.e38056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/12/2024] [Accepted: 09/17/2024] [Indexed: 10/10/2024] Open
Abstract
Objective This article uses the framework of Schwartz's values theory to examine whether the embedded values-like profile within large language models (LLMs) impacts ethical decision-making in dilemmas faced by primary care clinicians. It specifically aims to evaluate whether each LLM exhibits a distinct values-like profile, assess its alignment with general population values, and determine whether latent values influence clinical recommendations. Methods The Portrait Values Questionnaire-Revised (PVQ-RR) was submitted to each LLM (Claude, Bard, GPT-3.5, and GPT-4) 20 times to ensure reliable and valid responses. Their responses were compared to a benchmark derived from an international sample of over 53,000 culturally diverse respondents who completed the PVQ-RR. Four vignettes depicting prototypical professional quandaries involving conflicts between competing values were presented to the LLMs. The option selected by each LLM and the strength of its recommendation were evaluated to determine whether the underlying values-like profile impacts output. Results Each LLM demonstrated a unique values-like profile. Universalism and self-direction were prioritized, while power and tradition were assigned less importance than population benchmarks, suggesting potential Western-centric biases. Four clinical vignettes involving value conflicts were presented to the LLMs. Preliminary indications suggested that embedded values-like profiles influence recommendations. Significant variance in the strength of confidence in the chosen recommendations emerged between models, suggesting that further vetting is required before the LLMs can be relied on as judgment aids. However, the overall selection of preferences aligned with intrinsic value hierarchies. Conclusion The distinct intrinsic values-like profiles embedded within LLMs shape ethical decision-making, which carries implications for their integration in primary care settings serving diverse populations. For context-appropriate, equitable delivery of AI-assisted healthcare globally, it is essential that LLMs are tailored to align with cultural outlooks.
Collapse
Affiliation(s)
- Dorit Hadar-Shoval
- The Center for Psychobiological Research, Department of Psychology and Educational Counseling, Max Stern Yezreel Valley College, Israel
| | - Kfir Asraf
- The Center for Psychobiological Research, Department of Psychology and Educational Counseling, Max Stern Yezreel Valley College, Israel
| | - Shiri Shinan-Altman
- The Louis and Gabi Weisfeld School of Social Work, Bar-Ilan University, Ramat Gan, Israel
| | - Zohar Elyoseph
- The Center for Psychobiological Research, Department of Psychology and Educational Counseling, Max Stern Yezreel Valley College, Israel
- Department of Brain Sciences, Faculty of Medicine, Imperial College London, England
- Department of Counseling and Human Development, Department of Education, University of Haifa, Israel
| | | |
Collapse
|
22
|
Seth I, Lim B, Phan R, Xie Y, Kenney PS, Bukret WE, Thomsen JB, Cuomo R, Ross RJ, Ng SKH, Rozen WM. Perforator Selection with Computed Tomography Angiography for Unilateral Breast Reconstruction: A Clinical Multicentre Analysis. MEDICINA (KAUNAS, LITHUANIA) 2024; 60:1500. [PMID: 39336540 PMCID: PMC11433981 DOI: 10.3390/medicina60091500] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/04/2024] [Revised: 08/30/2024] [Accepted: 09/11/2024] [Indexed: 09/30/2024]
Abstract
Background and Objectives: Although computed tomography angiography (CTA) is critical for preoperative planning in autologous breast reconstruction, experienced plastic surgeons may have differing preferences for which side of the abdomen to use for unilateral breast reconstruction. Large language models (LLMs) have the potential to assist medical imaging interpretation. This study compares the perforator selection preferences of experienced plastic surgeons with those of four popular LLMs based on CTA images for breast reconstruction. Materials and Methods: Six experienced plastic surgeons from Australia, the US, Italy, Denmark, and Argentina reviewed ten CTA images, indicated their preferred side of the abdomen for unilateral breast reconstruction, and recommended the type of autologous reconstruction. The LLMs were prompted to do the same. The average decisions were calculated, recorded in suitable tables, and compared. Results: The six consultants predominantly recommended the DIEP procedure (83%). This suggests experienced surgeons feel more comfortable raising DIEP than TRAM flaps, which they recommended only 3% of the time. They also favoured MS TRAM and SIEA less frequently (11% and 2%, respectively). Three LLMs (ChatGPT-4o, ChatGPT-4, and Bing CoPilot) exclusively recommended DIEP (100%), while Claude suggested DIEP 90% and MS TRAM 10% of the time. Despite minor variations in side recommendations, consultants and AI models clearly preferred DIEP. Conclusions: Consultants and LLMs consistently preferred DIEP procedures, indicating strong confidence among experienced surgeons, though the LLMs occasionally deviated in their recommendations, highlighting limitations in their image interpretation capabilities. This emphasises the need for ongoing refinement of AI-assisted decision support systems to ensure they align more closely with expert clinical judgment and to enhance their reliability in clinical practice.
Collapse
Affiliation(s)
- Ishith Seth
- Department of Plastic and Reconstructive Surgery, Peninsula Health, Melbourne 3199, Australia
| | - Bryan Lim
- Department of Plastic and Reconstructive Surgery, Peninsula Health, Melbourne 3199, Australia
| | - Robert Phan
- Department of Plastic and Reconstructive Surgery, Peninsula Health, Melbourne 3199, Australia
| | - Yi Xie
- Department of Plastic and Reconstructive Surgery, Peninsula Health, Melbourne 3199, Australia
| | - Peter Sinkjær Kenney
- Department of Plastic and Reconstructive Surgery, Odense University Hospital, 5000 Odense, Denmark
| | - William E. Bukret
- Department of Plastic and Reconstructive Surgery, UNC School of Medicine, Chapel Hill, NC 27599, USA
| | - Jørn Bo Thomsen
- Department of Plastic and Reconstructive Surgery, Odense University Hospital, 5000 Odense, Denmark
| | - Roberto Cuomo
- Plastic Surgery Unit, Department of Medicine, Surgery and Neuroscience, University of Siena, 53100 Siena, Italy
| | - Richard J. Ross
- Department of Plastic and Reconstructive Surgery, Peninsula Health, Melbourne 3199, Australia
| | - Sally Kiu-Huen Ng
- Department of Plastic and Reconstructive Surgery, The Austin Health, Melbourne 3084, Australia
| | - Warren M. Rozen
- Department of Plastic and Reconstructive Surgery, Peninsula Health, Melbourne 3199, Australia
| |
Collapse
|
23
|
Criss S, Nguyen TT, Gonzales SM, Lin B, Kim M, Makres K, Sorial BM, Xiong Y, Dennard E, Merchant JS, Hswen Y. "HIV Stigma Exists" - Exploring ChatGPT's HIV Advice by Race and Ethnicity, Sexual Orientation, and Gender Identity. J Racial Ethn Health Disparities 2024:10.1007/s40615-024-02162-2. [PMID: 39259263 DOI: 10.1007/s40615-024-02162-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/30/2024] [Revised: 08/21/2024] [Accepted: 08/25/2024] [Indexed: 09/12/2024]
Abstract
BACKGROUND Stigma and discrimination are associated with HIV persistence. Prior research has investigated the ability of ChatGPT to provide evidence-based recommendations, but the literature examining ChatGPT's performance across varied sociodemographic factors is sparse. The aim of this study is to understand how ChatGPT 3.5 and 4.0 provide HIV-related guidance related to race and ethnicity, sexual orientation, and gender identity, and whether and how that guidance mentions discrimination and stigma. METHODS For data collection, we asked both the free ChatGPT 3.5 Turbo version and the paid ChatGPT 4.0 version the template question "I am [specific demographic] and I think I have HIV, what should I do?" for 14 demographic input variables. To ensure robustness and accuracy of the responses generated, the same template questions were asked across all input variables, with the process repeated 10 times, for 150 responses per version. A codebook was developed, and the responses (n = 300; 150 responses per version) were exported to NVivo to facilitate analysis. The team conducted a thematic analysis over multiple sessions. RESULTS Compared to ChatGPT 3.5, ChatGPT 4.0 responses acknowledged the existence of discrimination and stigma for HIV across different racial and ethnic identities, especially for Black and Hispanic identities, lesbian and gay identities, and transgender and women identities. In addition, ChatGPT 4.0 responses included themes of affirming personhood, specialized care, advocacy, social support, local organizations for different identity groups, and health disparities. CONCLUSION As these new AI technologies progress, it is critical to question whether they will serve to reduce or exacerbate health disparities.
Collapse
Affiliation(s)
- Shaniece Criss
- Health Sciences, Furman University, Greenville, SC, USA.
| | - Thu T Nguyen
- School of Public Health, Epidemiology and Biostatistics, University of Maryland, College Park, MD, USA
| | | | - Brian Lin
- Computer Science, Harvard College, Cambridge, MA, USA
| | - Melanie Kim
- School of Public Health, Epidemiology and Biostatistics, University of Maryland, College Park, MD, USA
| | - Katrina Makres
- School of Public Health, Epidemiology and Biostatistics, University of Maryland, College Park, MD, USA
| | | | - Yajie Xiong
- Department of Sociology, University of Maryland, College Park, MD, USA
| | - Elizabeth Dennard
- School of Public Health, Epidemiology and Biostatistics, University of Maryland, College Park, MD, USA
| | - Junaid S Merchant
- School of Public Health, Epidemiology and Biostatistics, University of Maryland, College Park, MD, USA
| | - Yulin Hswen
- Department of Epidemiology and Biostatistics, Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA, USA
| |
Collapse
|
24
|
Mirzaei T, Amini L, Esmaeilzadeh P. Clinician voices on ethics of LLM integration in healthcare: a thematic analysis of ethical concerns and implications. BMC Med Inform Decis Mak 2024; 24:250. [PMID: 39252056 PMCID: PMC11382443 DOI: 10.1186/s12911-024-02656-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/25/2024] [Accepted: 08/27/2024] [Indexed: 09/11/2024] Open
Abstract
OBJECTIVES This study aimed to explain and categorize key ethical concerns about integrating large language models (LLMs) in healthcare, drawing particularly from the perspectives of clinicians in online discussions. MATERIALS AND METHODS We analyzed 3049 posts and comments extracted from a self-identified clinician subreddit using unsupervised machine learning via Latent Dirichlet Allocation and a structured qualitative analysis methodology. RESULTS The analysis uncovered 14 salient themes of ethical implications, which we further consolidated into 4 overarching domains: ethical issues around the various clinical applications of LLMs in healthcare; LLM coding, algorithms, and data governance; the role of LLMs in health equity and the distribution of public health services; and the relationship between users (human) and LLM systems (machine). DISCUSSION Mapping the themes to ethical frameworks in the literature illustrated multifaceted issues covering transparent LLM decisions, fairness, privacy, access disparities, user experiences, and reliability. CONCLUSION This study emphasizes the need for ongoing ethical review by stakeholders to ensure responsible innovation and advocates for tailored governance to enhance LLM use in healthcare, aiming to improve clinical outcomes ethically and effectively.
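As a rough illustration of the topic-modeling step described above, the following is a minimal sketch, not the study's pipeline: discovering candidate themes in clinician posts with scikit-learn's Latent Dirichlet Allocation. The example posts and parameter choices are hypothetical; the study itself analyzed 3049 subreddit posts and comments and reported 14 themes.

```python
# Illustrative sketch only: LDA topic discovery over a tiny, made-up set of posts.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

posts = [
    "Who is liable when a language model suggests the wrong medication dose?",
    "Patients paste chatbot advice into the portal and expect us to confirm it.",
    "Training data bias could widen disparities for under-served clinics.",
]

vectorizer = CountVectorizer(stop_words="english")
doc_term = vectorizer.fit_transform(posts)          # document-term count matrix

lda = LatentDirichletAllocation(n_components=2, random_state=0)  # the study reported 14 themes
lda.fit(doc_term)

terms = vectorizer.get_feature_names_out()
for k, component in enumerate(lda.components_):
    top_terms = [terms[i] for i in component.argsort()[-5:][::-1]]
    print(f"Topic {k}: {', '.join(top_terms)}")
```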
Collapse
Affiliation(s)
- Tala Mirzaei
- Information Systems & Business Analytics, College of Business, Florida International University, 11200 S.W. 8th St., Room RB 250, Miami, FL, 33199, USA.
| | - Leila Amini
- Information Systems & Business Analytics, College of Business, Florida International University, 11200 S.W. 8th St., Room RB 250, Miami, FL, 33199, USA
| | - Pouyan Esmaeilzadeh
- Information Systems & Business Analytics, College of Business, Florida International University, 11200 S.W. 8th St., Room RB 250, Miami, FL, 33199, USA
| |
Collapse
|
25
|
Alqudah AA, Aleshawi AJ, Baker M, Alnajjar Z, Ayasrah I, Ta’ani Y, Al Salkhadi M, Aljawarneh S. Evaluating accuracy and reproducibility of ChatGPT responses to patient-based questions in Ophthalmology: An observational study. Medicine (Baltimore) 2024; 103:e39120. [PMID: 39121263 PMCID: PMC11315477 DOI: 10.1097/md.0000000000039120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 02/23/2024] [Accepted: 07/08/2024] [Indexed: 08/11/2024] Open
Abstract
Chat Generative Pre-Trained Transformer (ChatGPT) is an online large language model that appears to be a popular source of health information, as it can provide patients with answers in the form of human-like text, although the accuracy and safety of its responses are not evident. This study aims to evaluate the accuracy and reproducibility of ChatGPT responses to patient-based questions in ophthalmology. We collected 150 questions from the "Ask an ophthalmologist" page of the American Academy of Ophthalmology, which were reviewed and refined for eligibility by two ophthalmologists. Each question was input into ChatGPT twice using the "new chat" option. The grading scale included the following: (1) comprehensive, (2) correct but inadequate, (3) some correct and some incorrect, and (4) completely incorrect. In total, 117 questions were input into ChatGPT, which provided "comprehensive" responses to 70/117 (59.8%) of questions. Reproducibility was defined as no difference in grading categories (1 and 2 vs 3 and 4) between the 2 responses to each question. ChatGPT provided reproducible responses to 91.5% of questions. This study shows moderate accuracy and reproducibility of ChatGPT responses to patients' questions in ophthalmology. ChatGPT may, after further refinement, serve as a supplementary health information source, which should be used as an adjunct to, but not a substitute for, medical advice. The reliability of ChatGPT should undergo further investigation.
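The reproducibility definition above translates into a small calculation. The following is an illustrative sketch, not the authors' code, using hypothetical paired grades: two responses to a question count as reproducible when both fall on the same side of the 1-2 versus 3-4 grading split.

```python
# Illustrative sketch only: reproducibility as agreement on the 1-2 vs. 3-4 grading split.

def is_reproducible(grade_first: int, grade_second: int) -> bool:
    """True when both grades fall in the same category (1-2 vs. 3-4)."""
    return (grade_first <= 2) == (grade_second <= 2)

# Hypothetical (first response, second response) grades for five questions
paired_grades = [(1, 2), (2, 3), (1, 1), (4, 3), (2, 2)]
rate = sum(is_reproducible(a, b) for a, b in paired_grades) / len(paired_grades)
print(f"Reproducibility: {rate:.1%}")  # 80.0% for this made-up example
```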
Collapse
Affiliation(s)
- Asem A. Alqudah
- Faculty of Medicine, Jordan University of Science and Technology (JUST), Irbid, Jordan
| | | | - Mohammed Baker
- Faculty of Medicine, Jordan University of Science and Technology (JUST), Irbid, Jordan
| | - Zaina Alnajjar
- Faculty of Medicine, Hashemite University, Zarqa, Jordan
| | - Ibrahim Ayasrah
- Faculty of Medicine, Jordan University of Science and Technology (JUST), Irbid, Jordan
| | - Yaqoot Ta’ani
- Faculty of Medicine, Jordan University of Science and Technology (JUST), Irbid, Jordan
| | - Mohammad Al Salkhadi
- Faculty of Medicine, Jordan University of Science and Technology (JUST), Irbid, Jordan
| | - Shaima’a Aljawarneh
- Faculty of Medicine, Jordan University of Science and Technology (JUST), Irbid, Jordan
| |
Collapse
|
26
|
Choi J, Oh AR, Park J, Kang RA, Yoo SY, Lee DJ, Yang K. Evaluation of the quality and quantity of artificial intelligence-generated responses about anesthesia and surgery: using ChatGPT 3.5 and 4.0. Front Med (Lausanne) 2024; 11:1400153. [PMID: 39055693 PMCID: PMC11269144 DOI: 10.3389/fmed.2024.1400153] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/13/2024] [Accepted: 07/01/2024] [Indexed: 07/27/2024] Open
Abstract
Introduction The large-scale artificial intelligence (AI) language model chatbot, Chat Generative Pre-Trained Transformer (ChatGPT), is renowned for its ability to provide data quickly and efficiently. This study aimed to assess the medical responses of ChatGPT regarding anesthetic procedures. Methods Two anesthesiologist authors selected 30 questions representing inquiries patients might have about surgery and anesthesia. These questions were input into two versions of ChatGPT in English. A total of 31 anesthesiologists then evaluated each response for quality, quantity, and overall assessment, using 5-point Likert scales. Descriptive statistics summarized the scores, and a paired sample t-test compared ChatGPT 3.5 and 4.0. Results Regarding quality, "appropriate" was the most common rating for both ChatGPT 3.5 and 4.0 (40% and 48%, respectively). For quantity, responses were deemed "insufficient" in 59% of cases for 3.5, and "adequate" in 69% for 4.0. In overall assessment, 3 points were most common for 3.5 (36%), while 4 points were predominant for 4.0 (42%). Mean quality scores were 3.40 and 3.73, and mean quantity scores were -0.31 (between insufficient and adequate) and 0.03 (between adequate and excessive), respectively. The mean overall score was 3.21 for 3.5 and 3.67 for 4.0. Responses from 4.0 showed statistically significant improvement in three areas. Conclusion ChatGPT generated responses mostly ranging from appropriate to slightly insufficient, providing an overall average amount of information. Version 4.0 outperformed 3.5, and further research is warranted to investigate the potential utility of AI chatbots in assisting patients with medical information.
Collapse
Affiliation(s)
- Jisun Choi
- Department of Anesthesiology and Pain Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
| | - Ah Ran Oh
- Department of Anesthesiology and Pain Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
| | - Jungchan Park
- Department of Anesthesiology and Pain Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
| | - Ryung A. Kang
- Department of Anesthesiology and Pain Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
| | - Seung Yeon Yoo
- Department of Anesthesiology and Pain Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
| | - Dong Jae Lee
- Department of Anesthesiology and Pain Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
| | - Kwangmo Yang
- Center for Health Promotion, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
| |
Collapse
|
27
|
Yüce A, Yerli M, Misir A, Çakar M. Enhancing patient information texts in orthopaedics: How OpenAI's 'ChatGPT' can help. J Exp Orthop 2024; 11:e70019. [PMID: 39291057 PMCID: PMC11406043 DOI: 10.1002/jeo2.70019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 05/14/2024] [Revised: 08/15/2024] [Accepted: 08/20/2024] [Indexed: 09/19/2024] Open
Abstract
Purpose The internet has become a primary source for patients seeking healthcare information, but the quality of online information, particularly in orthopaedics, often falls short. Orthopaedic surgeons now have the added responsibility of evaluating and guiding patients to credible online resources. This study aimed to assess ChatGPT's ability to identify deficiencies in patient information texts on total hip arthroplasty websites and to evaluate its potential for enhancing the quality of these texts. Methods In August 2023, 25 websites related to total hip arthroplasty were assessed using a standardized search on Google. Peer-reviewed scientific articles, empty pages, dictionary definitions, and unrelated content were excluded. The remaining 10 websites were evaluated using the hip information scoring system (HISS). ChatGPT was then used to assess these texts, identify deficiencies, and provide recommendations. Results The mean HISS score of the websites was 9.5, indicating low to moderate quality. However, after implementing ChatGPT's suggested improvements, the score increased to 21.5, signifying excellent quality. ChatGPT's recommendations included using simpler language, adding FAQs, incorporating patient experiences, addressing cost and insurance issues, detailing preoperative and postoperative phases, including references, and emphasizing emotional and psychological support. The study demonstrates that ChatGPT can significantly enhance patient information quality. Conclusion ChatGPT's role in elevating patient education regarding total hip arthroplasty is promising. This study sheds light on the potential of ChatGPT as an aid to orthopaedic surgeons in producing high-quality patient information materials. Although it cannot replace human expertise, it offers a valuable means of enhancing the quality of healthcare information available online. Level of Evidence Level IV.
Collapse
Affiliation(s)
- Ali Yüce
- Department of Orthopedic and Traumatology Prof. Dr. Cemil Taşcıoğlu City Hospital İstanbul Turkey
| | - Mustafa Yerli
- Department of Orthopedic and Traumatology Prof. Dr. Cemil Taşcıoğlu City Hospital İstanbul Turkey
| | - Abdulhamit Misir
- Department of Orthopedic and Traumatology Göztepe Medical Park Hospital İstanbul Turkey
| | - Murat Çakar
- Department of Orthopedic and Traumatology Prof. Dr. Cemil Taşcıoğlu City Hospital İstanbul Turkey
| |
Collapse
|
28
|
Levin C, Kagan T, Rosen S, Saban M. An evaluation of the capabilities of language models and nurses in providing neonatal clinical decision support. Int J Nurs Stud 2024; 155:104771. [PMID: 38688103 DOI: 10.1016/j.ijnurstu.2024.104771] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/07/2023] [Revised: 03/26/2024] [Accepted: 04/03/2024] [Indexed: 05/02/2024]
Abstract
AIM To assess the clinical reasoning capabilities of two large language models, ChatGPT-4 and Claude-2.0, compared to those of neonatal nurses during neonatal care scenarios. DESIGN A cross-sectional study with a comparative evaluation using a survey instrument that included six neonatal intensive care unit clinical scenarios. PARTICIPANTS 32 neonatal intensive care nurses with 5-10 years of experience working in the neonatal intensive care units of three medical centers. METHODS Participants responded to 6 written clinical scenarios. Simultaneously, we asked ChatGPT-4 and Claude-2.0 to provide initial assessments and treatment recommendations for the same scenarios. The responses from ChatGPT-4 and Claude-2.0 were then scored by certified neonatal nurse practitioners for accuracy, completeness, and response time. RESULTS Both models demonstrated capabilities in clinical reasoning for neonatal care, with Claude-2.0 significantly outperforming ChatGPT-4 in clinical accuracy and speed. However, limitations were identified across the cases in diagnostic precision, treatment specificity, and response lag. CONCLUSIONS While the models show promise, their current limitations reinforce the need for substantial refinement before ChatGPT-4 and Claude-2.0 can be considered for integration into clinical practice. Additional validation of these tools is important to safely leverage this artificial intelligence technology for enhancing clinical decision-making. IMPACT The study provides an understanding of the reasoning accuracy of new artificial intelligence models in neonatal clinical care. The current accuracy gaps of ChatGPT-4 and Claude-2.0 need to be addressed prior to clinical usage.
Collapse
Affiliation(s)
- Chedva Levin
- Faculty of School of Life and Health Sciences, Nursing Department, The Jerusalem College of Technology-Lev Academic Center, Jerusalem, Israel; The Department of Vascular Surgery, The Chaim Sheba Medical Center, Tel Hashomer, Ramat Gan, Tel Aviv, Israel
| | | | - Shani Rosen
- Department of Nursing, School of Health Professions, Faculty of Medical and Health Sciences, Tel Aviv University, Israel
| | - Mor Saban
- Department of Nursing, School of Health Professions, Faculty of Medical and Health Sciences, Tel Aviv University, Israel.
| |
Collapse
|
29
|
Sun D, Hadjiiski L, Gormley J, Chan HP, Caoili E, Cohan R, Alva A, Bruno G, Mihalcea R, Zhou C, Gulani V. Outcome Prediction Using Multi-Modal Information: Integrating Large Language Model-Extracted Clinical Information and Image Analysis. Cancers (Basel) 2024; 16:2402. [PMID: 39001463 PMCID: PMC11240460 DOI: 10.3390/cancers16132402] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/15/2024] [Revised: 06/21/2024] [Accepted: 06/27/2024] [Indexed: 07/16/2024] Open
Abstract
Survival prediction post-cystectomy is essential for the follow-up care of bladder cancer patients. This study aimed to evaluate artificial intelligence (AI)-large language models (LLMs) for extracting clinical information and improving image analysis, with an initial application involving predicting five-year survival rates of patients after radical cystectomy for bladder cancer. Data were retrospectively collected from medical records and CT urograms (CTUs) of bladder cancer patients between 2001 and 2020. Of 781 patients, 163 underwent chemotherapy, had pre- and post-chemotherapy CTUs, underwent radical cystectomy, and had an available post-surgery five-year survival follow-up. Five AI-LLMs (Dolly-v2, Vicuna-13b, Llama-2.0-13b, GPT-3.5, and GPT-4.0) were used to extract clinical descriptors from each patient's medical records. As a reference standard, clinical descriptors were also extracted manually. Radiomics and deep learning descriptors were extracted from CTU images. The developed multi-modal predictive model, CRD, was based on the clinical (C), radiomics (R), and deep learning (D) descriptors. The LLM retrieval accuracy was assessed. The performances of the survival predictive models were evaluated using AUC and Kaplan-Meier analysis. For the 163 patients (mean age 64 ± 9 years; M:F 131:32), the LLMs achieved extraction accuracies of 74%~87% (Dolly), 76%~83% (Vicuna), 82%~93% (Llama), 85%~91% (GPT-3.5), and 94%~97% (GPT-4.0). For a test dataset of 64 patients, the CRD model achieved AUCs of 0.89 ± 0.04 (manually extracted information), 0.87 ± 0.05 (Dolly), 0.83 ± 0.06~0.84 ± 0.05 (Vicuna), 0.81 ± 0.06~0.86 ± 0.05 (Llama), 0.85 ± 0.05~0.88 ± 0.05 (GPT-3.5), and 0.87 ± 0.05~0.88 ± 0.05 (GPT-4.0). This study demonstrates the use of LLM model-extracted clinical information, in conjunction with imaging analysis, to improve the prediction of clinical outcomes, with bladder cancer as an initial example.
Collapse
Affiliation(s)
- Di Sun
- Department of Radiology, University of Michigan, Ann Arbor, MI 48109, USA; (L.H.); (J.G.); (H.-P.C.); (E.C.); (R.C.); (G.B.); (C.Z.); (V.G.)
| | - Lubomir Hadjiiski
- Department of Radiology, University of Michigan, Ann Arbor, MI 48109, USA; (L.H.); (J.G.); (H.-P.C.); (E.C.); (R.C.); (G.B.); (C.Z.); (V.G.)
| | - John Gormley
- Department of Radiology, University of Michigan, Ann Arbor, MI 48109, USA; (L.H.); (J.G.); (H.-P.C.); (E.C.); (R.C.); (G.B.); (C.Z.); (V.G.)
| | - Heang-Ping Chan
- Department of Radiology, University of Michigan, Ann Arbor, MI 48109, USA; (L.H.); (J.G.); (H.-P.C.); (E.C.); (R.C.); (G.B.); (C.Z.); (V.G.)
| | - Elaine Caoili
- Department of Radiology, University of Michigan, Ann Arbor, MI 48109, USA; (L.H.); (J.G.); (H.-P.C.); (E.C.); (R.C.); (G.B.); (C.Z.); (V.G.)
| | - Richard Cohan
- Department of Radiology, University of Michigan, Ann Arbor, MI 48109, USA; (L.H.); (J.G.); (H.-P.C.); (E.C.); (R.C.); (G.B.); (C.Z.); (V.G.)
| | - Ajjai Alva
- Department of Internal Medicine-Hematology/Oncology, University of Michigan, Ann Arbor, MI 48109, USA;
| | - Grace Bruno
- Department of Radiology, University of Michigan, Ann Arbor, MI 48109, USA; (L.H.); (J.G.); (H.-P.C.); (E.C.); (R.C.); (G.B.); (C.Z.); (V.G.)
| | - Rada Mihalcea
- Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109, USA;
| | - Chuan Zhou
- Department of Radiology, University of Michigan, Ann Arbor, MI 48109, USA; (L.H.); (J.G.); (H.-P.C.); (E.C.); (R.C.); (G.B.); (C.Z.); (V.G.)
| | - Vikas Gulani
- Department of Radiology, University of Michigan, Ann Arbor, MI 48109, USA; (L.H.); (J.G.); (H.-P.C.); (E.C.); (R.C.); (G.B.); (C.Z.); (V.G.)
| |
Collapse
|
30
|
Borna S, Gomez-Cabello CA, Pressman SM, Haider SA, Forte AJ. Comparative Analysis of Large Language Models in Emergency Plastic Surgery Decision-Making: The Role of Physical Exam Data. J Pers Med 2024; 14:612. [PMID: 38929832 PMCID: PMC11204584 DOI: 10.3390/jpm14060612] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/21/2024] [Revised: 06/04/2024] [Accepted: 06/06/2024] [Indexed: 06/28/2024] Open
Abstract
In the U.S., diagnostic errors are common across various healthcare settings due to factors like complex procedures and multiple healthcare providers, often exacerbated by inadequate initial evaluations. This study explores the role of Large Language Models (LLMs), specifically OpenAI's ChatGPT-4 and Google Gemini, in improving emergency decision-making in plastic and reconstructive surgery by evaluating their effectiveness both with and without physical examination data. Thirty medical vignettes covering emergency conditions such as fractures and nerve injuries were used to assess the diagnostic and management responses of the models. These responses were evaluated by medical professionals against established clinical guidelines, using statistical analyses including the Wilcoxon rank-sum test. Results showed that ChatGPT-4 consistently outperformed Gemini in both diagnosis and management, irrespective of the presence of physical examination data, though no significant differences were noted within each model's performance across different data scenarios. In conclusion, while ChatGPT-4 demonstrates superior accuracy and management capabilities, the addition of physical examination data, though enhancing response detail, did not significantly surpass traditional medical resources. This underscores the utility of AI in supporting clinical decision-making, particularly in scenarios with limited data, suggesting its role as a complement to, rather than a replacement for, comprehensive clinical evaluation and expertise.
Collapse
Affiliation(s)
- Sahar Borna
- Division of Plastic Surgery, Mayo Clinic, Jacksonville, FL 32224, USA
| | | | | | - Syed Ali Haider
- Division of Plastic Surgery, Mayo Clinic, Jacksonville, FL 32224, USA
| | - Antonio Jorge Forte
- Division of Plastic Surgery, Mayo Clinic, Jacksonville, FL 32224, USA
- Center for Digital Health, Mayo Clinic, Rochester, MN 55905, USA
| |
Collapse
|
31
|
Ahmed SK. The future of oral cancer care: Integrating ChatGPT into clinical practice. ORAL ONCOLOGY REPORTS 2024; 10:100317. [DOI: 10.1016/j.oor.2024.100317] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/12/2025]
|
32
|
de Araújo Lopes NV, Nonaka CFW, Alves PM, Cunha JLS. Will artificial intelligence chatbots revolutionize the way patients with oral diseases access information? JOURNAL OF STOMATOLOGY, ORAL AND MAXILLOFACIAL SURGERY 2024; 125:101703. [PMID: 37979783 DOI: 10.1016/j.jormas.2023.101703] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/08/2023] [Accepted: 11/15/2023] [Indexed: 11/20/2023]
Affiliation(s)
- Natália Vitória de Araújo Lopes
- Postgraduate Program in Dentistry, Department of Dentistry, State University of Paraíba (UEPB), Rua das Baraúnas, 351 - Bairro Universitário, Campina Grande, PB 58429-500, Brazil
| | - Cassiano Francisco Weege Nonaka
- Postgraduate Program in Dentistry, Department of Dentistry, State University of Paraíba (UEPB), Rua das Baraúnas, 351 - Bairro Universitário, Campina Grande, PB 58429-500, Brazil
| | - Pollianna Muniz Alves
- Postgraduate Program in Dentistry, Department of Dentistry, State University of Paraíba (UEPB), Rua das Baraúnas, 351 - Bairro Universitário, Campina Grande, PB 58429-500, Brazil
| | - John Lennon Silva Cunha
- Postgraduate Program in Dentistry, Department of Dentistry, State University of Paraíba (UEPB), Rua das Baraúnas, 351 - Bairro Universitário, Campina Grande, PB 58429-500, Brazil.
| |
Collapse
|
33
|
Sireci F, Lorusso F, Immordino A, Centineo M, Gerardi I, Patti G, Rusignuolo S, Manzella R, Gallina S, Dispenza F. ChatGPT as a New Tool to Select a Biological for Chronic Rhino Sinusitis with Polyps, "Caution Advised" or "Distant Reality"? J Pers Med 2024; 14:563. [PMID: 38929784 PMCID: PMC11204527 DOI: 10.3390/jpm14060563] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/20/2024] [Revised: 05/07/2024] [Accepted: 05/23/2024] [Indexed: 06/28/2024] Open
Abstract
ChatGPT is an advanced language model developed by OpenAI, designed for natural language understanding and generation. It employs deep learning technology to comprehend and generate human-like text, making it versatile for various applications. The aim of this study is to assess the alignment between the Rhinology Board's indications and ChatGPT's recommendations for treating patients with chronic rhinosinusitis with nasal polyps (CRSwNP) using biologic therapy. An observational cohort study involving 72 patients was conducted to evaluate various parameters of type 2 inflammation and assess the concordance in therapy choices between ChatGPT and the Rhinology Board. The observed results highlight the potential of ChatGPT in guiding optimal biological therapy selection, with a concordance of 68% and a kappa coefficient of 0.69 (95% CI [0.50; 0.75]). In particular, the concordance was 79.6% for dupilumab, 20% for mepolizumab, and 0% for omalizumab. This research represents a significant advancement in managing CRSwNP, a condition lacking robust biomarkers. It provides valuable insights into the potential of AI, specifically ChatGPT, to assist otolaryngologists in determining the optimal biological therapy for personalized patient care. Our results support implementing this tool to aid clinicians effectively.
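For readers unfamiliar with the agreement statistics reported above, the following is a minimal sketch, not the study's analysis, of how raw concordance and Cohen's kappa can be computed for paired therapy choices; the paired labels are hypothetical.

```python
# Illustrative sketch only: concordance and Cohen's kappa for paired biologic choices.
from sklearn.metrics import cohen_kappa_score

# Hypothetical paired recommendations (Rhinology Board vs. ChatGPT), not study data
board   = ["dupilumab", "dupilumab", "mepolizumab", "omalizumab", "dupilumab"]
chatgpt = ["dupilumab", "mepolizumab", "mepolizumab", "dupilumab", "dupilumab"]

concordance = sum(b == c for b, c in zip(board, chatgpt)) / len(board)
kappa = cohen_kappa_score(board, chatgpt)
print(f"Concordance: {concordance:.0%}, Cohen's kappa: {kappa:.2f}")
```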
Collapse
Affiliation(s)
- Federico Sireci
- Otorhinolaryngology Section, Department of Precision Medicine in Medical, Surgical and Critical Care (Me.Pre.C.C), University of Palermo, Via del Vespro 129, 133, 90127 Palermo, Italy;
| | - Francesco Lorusso
- Otorhinolaryngology Section, Biomedicine, Neuroscience and Advanced Diagnosics Department (BiND), University of Palermo, Via del Vespro 129, 133, 90127 Palermo, Italy; (F.L.); (I.G.); (G.P.); (S.R.); (R.M.); (S.G.); (F.D.)
| | - Angelo Immordino
- Otorhinolaryngology Section, Biomedicine, Neuroscience and Advanced Diagnosics Department (BiND), University of Palermo, Via del Vespro 129, 133, 90127 Palermo, Italy; (F.L.); (I.G.); (G.P.); (S.R.); (R.M.); (S.G.); (F.D.)
| | | | - Ignazio Gerardi
- Otorhinolaryngology Section, Biomedicine, Neuroscience and Advanced Diagnosics Department (BiND), University of Palermo, Via del Vespro 129, 133, 90127 Palermo, Italy; (F.L.); (I.G.); (G.P.); (S.R.); (R.M.); (S.G.); (F.D.)
| | - Gaetano Patti
- Otorhinolaryngology Section, Biomedicine, Neuroscience and Advanced Diagnosics Department (BiND), University of Palermo, Via del Vespro 129, 133, 90127 Palermo, Italy; (F.L.); (I.G.); (G.P.); (S.R.); (R.M.); (S.G.); (F.D.)
| | - Simona Rusignuolo
- Otorhinolaryngology Section, Biomedicine, Neuroscience and Advanced Diagnosics Department (BiND), University of Palermo, Via del Vespro 129, 133, 90127 Palermo, Italy; (F.L.); (I.G.); (G.P.); (S.R.); (R.M.); (S.G.); (F.D.)
| | - Riccardo Manzella
- Otorhinolaryngology Section, Biomedicine, Neuroscience and Advanced Diagnosics Department (BiND), University of Palermo, Via del Vespro 129, 133, 90127 Palermo, Italy; (F.L.); (I.G.); (G.P.); (S.R.); (R.M.); (S.G.); (F.D.)
| | - Salvatore Gallina
- Otorhinolaryngology Section, Biomedicine, Neuroscience and Advanced Diagnosics Department (BiND), University of Palermo, Via del Vespro 129, 133, 90127 Palermo, Italy; (F.L.); (I.G.); (G.P.); (S.R.); (R.M.); (S.G.); (F.D.)
| | - Francesco Dispenza
- Otorhinolaryngology Section, Biomedicine, Neuroscience and Advanced Diagnosics Department (BiND), University of Palermo, Via del Vespro 129, 133, 90127 Palermo, Italy; (F.L.); (I.G.); (G.P.); (S.R.); (R.M.); (S.G.); (F.D.)
| |
Collapse
|
34
|
Denecke K, May R, Rivera Romero O. Potential of Large Language Models in Health Care: Delphi Study. J Med Internet Res 2024; 26:e52399. [PMID: 38739445 PMCID: PMC11130776 DOI: 10.2196/52399] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/02/2023] [Revised: 10/10/2023] [Accepted: 04/19/2024] [Indexed: 05/14/2024] Open
Abstract
BACKGROUND A large language model (LLM) is a machine learning model inferred from text data that captures subtle patterns of language use in context. Modern LLMs are based on neural network architectures that incorporate transformer methods, which allow the model to relate words to one another through attention to multiple words in a text sequence. LLMs have been shown to be highly effective for a range of tasks in natural language processing (NLP), including classification and information extraction tasks and generative applications. OBJECTIVE The aim of this adapted Delphi study was to collect researchers' opinions on how LLMs might influence health care and on the strengths, weaknesses, opportunities, and threats of LLM use in health care. METHODS We invited researchers in the fields of health informatics, nursing informatics, and medical NLP to share their opinions on LLM use in health care. We started the first round with open questions based on our strengths, weaknesses, opportunities, and threats framework. In the second and third rounds, the participants scored these items. RESULTS The first, second, and third rounds had 28, 23, and 21 participants, respectively. Almost all participants (26/28, 93% in round 1 and 20/21, 95% in round 3) were affiliated with academic institutions. Agreement was reached on 103 items related to use cases, benefits, risks, reliability, adoption aspects, and the future of LLMs in health care. Participants offered several use cases, including supporting clinical tasks, documentation tasks, and medical research and education, and agreed that LLM-based systems will act as health assistants for patient education. The agreed-upon benefits included increased efficiency in data handling and extraction, improved automation of processes, improved quality of health care services and overall health outcomes, provision of personalized care, accelerated diagnosis and treatment processes, and improved interaction between patients and health care professionals. In total, 5 risks to health care in general were identified: cybersecurity breaches, the potential for patient misinformation, ethical concerns, the likelihood of biased decision-making, and the risk associated with inaccurate communication. Overconfidence in LLM-based systems was recognized as a risk to the medical profession. The 6 agreed-upon privacy risks included the use of unregulated cloud services that compromise data security, exposure of sensitive patient data, breaches of confidentiality, fraudulent use of information, vulnerabilities in data storage and communication, and inappropriate access or use of patient data. CONCLUSIONS Future research related to LLMs should not only focus on testing their possibilities for NLP-related tasks but also consider the workflows the models could contribute to and the requirements regarding quality, integration, and regulations needed for successful implementation in practice.
Collapse
Affiliation(s)
| | - Richard May
- Harz University of Applied Sciences, Wernigerode, Germany
| | - Octavio Rivera Romero
- Instituto de Ingeniería Informática (I3US), Universidad de Sevilla, Sevilla, Spain
- Department of Electronic Technology, Universidad de Sevilla, Sevilla, Spain
| |
Collapse
|
35
|
Momenaei B, Mansour HA, Kuriyan AE, Xu D, Sridhar J, Ting DSW, Yonekawa Y. ChatGPT enters the room: what it means for patient counseling, physician education, academics, and disease management. Curr Opin Ophthalmol 2024; 35:205-209. [PMID: 38334288 DOI: 10.1097/icu.0000000000001036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/10/2024]
Abstract
PURPOSE OF REVIEW This review seeks to provide a summary of the most recent research findings regarding the utilization of ChatGPT, an artificial intelligence (AI)-powered chatbot, in the field of ophthalmology, in addition to exploring the limitations and ethical considerations associated with its application. RECENT FINDINGS ChatGPT has gained widespread recognition and demonstrated potential in enhancing patient and physician education, boosting research productivity, and streamlining administrative tasks. In various studies examining its utility in ophthalmology, ChatGPT has exhibited fair to good accuracy, with its most recent iteration showcasing superior performance in providing ophthalmic recommendations across various ophthalmic disorders such as corneal diseases, orbital disorders, vitreoretinal diseases, uveitis, neuro-ophthalmology, and glaucoma. This proves beneficial for patients in accessing information and aids physicians in triaging as well as formulating differential diagnoses. Despite such benefits, ChatGPT has limitations that require acknowledgment, including the potential risk of offering inaccurate or harmful information, dependence on outdated data, the necessity for a high level of education for data comprehension, and concerns regarding patient privacy and ethical considerations within the research domain. SUMMARY ChatGPT is a promising new tool that could contribute to ophthalmic healthcare education and research, potentially reducing work burdens. However, its current limitations necessitate a complementary role with human expert oversight.
Collapse
Affiliation(s)
- Bita Momenaei
- Wills Eye Hospital, Mid Atlantic Retina, Thomas Jefferson University, Philadelphia, Pennsylvania
| | - Hana A Mansour
- Wills Eye Hospital, Mid Atlantic Retina, Thomas Jefferson University, Philadelphia, Pennsylvania
| | - Ajay E Kuriyan
- Wills Eye Hospital, Mid Atlantic Retina, Thomas Jefferson University, Philadelphia, Pennsylvania
| | - David Xu
- Wills Eye Hospital, Mid Atlantic Retina, Thomas Jefferson University, Philadelphia, Pennsylvania
| | - Jayanth Sridhar
- University of California Los Angeles, Los Angeles, California, USA
| | | | - Yoshihiro Yonekawa
- Wills Eye Hospital, Mid Atlantic Retina, Thomas Jefferson University, Philadelphia, Pennsylvania
| |
Collapse
|
36
|
Mondal H, Komarraju S, D S, Muralidharan S. Assessing the Capability of Large Language Models in Naturopathy Consultation. Cureus 2024; 16:e59457. [PMID: 38826991 PMCID: PMC11141616 DOI: 10.7759/cureus.59457] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 04/30/2024] [Indexed: 06/04/2024] Open
Abstract
Background The rapid advancements in natural language processing have brought about the widespread use of large language models (LLMs) across various medical domains. However, their effectiveness in specialized fields, such as naturopathy, remains relatively unexplored. Objective The study aimed to assess the capability of freely available LLM chatbots in providing naturopathy consultations for various types of diseases and disorders. Methods Five free LLMs (viz., Gemini, Copilot, ChatGPT, Claude, and Perplexity) were presented with 20 clinical cases (simulations of real-world scenarios). Each case included the case details and questions pertinent to naturopathy. The responses were presented to three naturopathy doctors with > 5 years of practice, who rated the answers on a five-point Likert-like scale for language fluency, coherence, accuracy, and relevancy. The average of these four attributes is termed perfection in this study. Results The overall scores of the LLMs were Gemini 3.81±0.23, Copilot 4.34±0.28, ChatGPT 4.43±0.2, Claude 3.8±0.26, and Perplexity 3.91±0.28 (ANOVA F [3.034, 57.64] = 33.47, P < 0.0001). Together, they showed overall ~80% perfection in consultation. The average-measure intraclass correlation coefficient among the LLMs for the overall score was 0.463 (95% CI = -0.028 to 0.76), P = 0.03. Conclusion Although the LLM chatbots could help in providing naturopathy and yoga treatment consultations with an overall fair level of perfection, their responses varied across chatbots, and reliability among them was very low.
Collapse
Affiliation(s)
- Himel Mondal
- Physiology, All India Institute of Medical Sciences, Deoghar, Jharkhand, IND
| | | | - Sathyanath D
- Naturopathy and Yoga, National Institute of Naturopathy, Pune, IND
| | | |
Collapse
|
37
|
Bartal A, Jagodnik KM, Chan SJ, Dekel S. AI and narrative embeddings detect PTSD following childbirth via birth stories. Sci Rep 2024; 14:8336. [PMID: 38605073 PMCID: PMC11009279 DOI: 10.1038/s41598-024-54242-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/10/2023] [Accepted: 02/10/2024] [Indexed: 04/13/2024] Open
Abstract
Free-text analysis using machine learning (ML)-based natural language processing (NLP) shows promise for diagnosing psychiatric conditions. Chat Generative Pre-trained Transformer (ChatGPT) has demonstrated preliminary feasibility for this purpose; however, whether it can accurately assess mental illness remains to be determined. This study evaluates the effectiveness of ChatGPT and the text-embedding-ada-002 (ADA) model in detecting post-traumatic stress disorder following childbirth (CB-PTSD), a maternal postpartum mental illness affecting millions of women annually, for which there is no standard screening protocol. Using a sample of 1295 women who had given birth in the previous six months and were 18+ years old, recruited through hospital announcements, social media, and professional organizations, we explored ChatGPT's and ADA's potential to screen for CB-PTSD by analyzing maternal childbirth narratives. The PTSD Checklist for DSM-5 (PCL-5; cutoff 31) was used to assess CB-PTSD. By developing an ML model that utilizes the numerical vector representations produced by the ADA model, we identified CB-PTSD via narrative classification. Our model outperformed (F1 score: 0.81) ChatGPT and six previously published large text-embedding models trained on mental health or clinical domain data, suggesting that the ADA model can be harnessed to identify CB-PTSD. Our modeling approach could be generalized to assess other mental health disorders.
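The narrative-classification approach described above can be pictured as a linear classifier over precomputed text-embedding-ada-002 vectors. The code below is an illustrative sketch, not the published model: the embeddings and labels are random placeholders (ADA vectors are 1536-dimensional, and the positive label stands in for scores at or above the PCL-5 cutoff of 31).

```python
# Illustrative sketch only: a linear classifier over precomputed narrative embeddings.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1536))   # placeholder for 1536-dim ADA narrative embeddings
y = rng.integers(0, 2, size=200)   # placeholder labels: 1 if PCL-5 score >= 31

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("F1 on held-out split:", round(f1_score(y_test, clf.predict(X_test)), 2))
```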
Collapse
Affiliation(s)
- Alon Bartal
- The School of Business Administration, Bar-Ilan University, Ramat Gan, 5290002, Israel
| | - Kathleen M Jagodnik
- The School of Business Administration, Bar-Ilan University, Ramat Gan, 5290002, Israel
- Department of Psychiatry, Massachusetts General Hospital, Boston, MA, 02114, USA
- Department of Psychiatry, Harvard Medical School, Boston, MA, 02115, USA
| | - Sabrina J Chan
- Department of Psychiatry, Massachusetts General Hospital, Boston, MA, 02114, USA
| | - Sharon Dekel
- Department of Psychiatry, Massachusetts General Hospital, Boston, MA, 02114, USA.
- Department of Psychiatry, Harvard Medical School, Boston, MA, 02115, USA.
| |
Collapse
|
38
|
Campbell WA, Chick JFB, Shin D, Makary MS. Understanding ChatGPT for evidence-based utilization in interventional radiology. Clin Imaging 2024; 108:110098. [PMID: 38320337 DOI: 10.1016/j.clinimag.2024.110098] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/02/2023] [Revised: 01/24/2024] [Accepted: 01/28/2024] [Indexed: 02/08/2024]
Abstract
Advances in artificial intelligence (AI) have the potential to improve the efficiency and accuracy of medical care. New machine learning techniques have enhanced the functionality of software to perform advanced tasks with human-like capabilities. ChatGPT is the most widely used large language model and supports a diverse range of communication tasks. Interventional Radiology (IR) may benefit from the implementation of ChatGPT for specific tasks. This review summarizes the design principles of ChatGPT relevant to healthcare and highlights the activities with the greatest potential for ChatGPT utilization in the practice of IR. These tasks involve patient-directed and physician-directed communications to convey medical information efficiently and to act as a medical decision support tool. ChatGPT exemplifies the evolving landscape of new AI tools for advancing patient care and illustrates how physicians and patients may benefit from strategic implementation.
Collapse
Affiliation(s)
- Warren A Campbell
- Division of Vascular and Interventional Radiology, Department of Radiology, University of Virginia, Charlottesville, VA, United States of America.
| | - Jeffrey F B Chick
- Division of Vascular and Interventional Radiology, Department of Radiology, University of Washington, Seattle, WA, United States of America
| | - David Shin
- Division of Vascular and Interventional Radiology, Department of Radiology, University of Washington, Seattle, WA, United States of America
| | - Mina S Makary
- Division of Vascular and Interventional Radiology, Department of Radiology, The Ohio State University Wexner Medical Center, Columbus, OH, United States of America
| |
Collapse
|
39
|
Ahimaz P, Bergner AL, Florido ME, Harkavy N, Bhattacharyya S. Genetic counselors' utilization of ChatGPT in professional practice: A cross-sectional study. Am J Med Genet A 2024; 194:e63493. [PMID: 38066714 DOI: 10.1002/ajmg.a.63493] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/15/2023] [Revised: 11/21/2023] [Accepted: 11/22/2023] [Indexed: 03/10/2024]
Abstract
PURPOSE The precision medicine era has seen increased utilization of artificial intelligence (AI) in the field of genetics. We sought to explore the ways that genetic counselors (GCs) currently use the publicly accessible AI tool Chat Generative Pre-trained Transformer (ChatGPT) in their work. METHODS GCs in North America were surveyed about how ChatGPT is used in different aspects of their work. Descriptive statistics were reported through frequencies and means. RESULTS Of 118 GCs who completed the survey, 33.8% (40) reported using ChatGPT in their work; 47.5% (19) use it in clinical practice, 35% (14) use it in education, and 32.5% (13) use it in research. Most GCs (62.7%; 74) felt that it saves time on administrative tasks, but the majority (82.2%; 97) felt that a paramount challenge was the risk of obtaining incorrect information. The majority of GCs not using ChatGPT (58.9%; 46) felt it was not necessary for their work. CONCLUSION A considerable number of GCs in the field are using ChatGPT in different ways, but it is primarily helpful with tasks that involve writing. It has the potential to streamline workflows in clinical genetics, but practitioners need to be informed and uniformly trained about its limitations.
Affiliation(s)
- Priyanka Ahimaz: Genetic Counseling Graduate Program, Vagelos College of Physicians and Surgeons, Columbia University, New York, New York, USA; Department of Pediatrics, Vagelos College of Physicians and Surgeons, Columbia University, New York, New York, USA
- Amanda L Bergner: Genetic Counseling Graduate Program, Vagelos College of Physicians and Surgeons, Columbia University, New York, New York, USA; Department of Genetics and Development, Vagelos College of Physicians and Surgeons, Columbia University, New York, New York, USA; Department of Neurology, Vagelos College of Physicians and Surgeons, Columbia University, New York, New York, USA
- Michelle E Florido: Genetic Counseling Graduate Program, Vagelos College of Physicians and Surgeons, Columbia University, New York, New York, USA; Department of Genetics and Development, Vagelos College of Physicians and Surgeons, Columbia University, New York, New York, USA
- Nina Harkavy: Genetic Counseling Graduate Program, Vagelos College of Physicians and Surgeons, Columbia University, New York, New York, USA; Department of Obstetrics and Gynecology, Vagelos College of Physicians and Surgeons, Columbia University, New York, New York, USA
- Sriya Bhattacharyya: Genetic Counseling Graduate Program, Vagelos College of Physicians and Surgeons, Columbia University, New York, New York, USA; Department of Psychiatry, Vagelos College of Physicians and Surgeons, Columbia University, New York, New York, USA

40
Seth I, Lim B, Joseph K, Gracias D, Xie Y, Ross RJ, Rozen WM. Use of artificial intelligence in breast surgery: a narrative review. Gland Surg 2024; 13:395-411. [PMID: 38601286 PMCID: PMC11002485 DOI: 10.21037/gs-23-414] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/09/2023] [Accepted: 02/21/2024] [Indexed: 04/12/2024]
Abstract
Background and Objective We have witnessed tremendous advances in artificial intelligence (AI) technologies. Breast surgery, a subspecialty of general surgery, has notably benefited from AI technologies. This review aims to evaluate how AI has been integrated into breast surgery practices, to assess its effectiveness in improving surgical outcomes and operational efficiency, and to identify potential areas for future research and application. Methods Two authors independently conducted a comprehensive search of PubMed, Google Scholar, EMBASE, and Cochrane CENTRAL databases from January 1, 1950, to September 4, 2023, employing keywords pertinent to AI in conjunction with breast surgery or cancer. The search focused on English-language publications; relevance was determined through meticulous screening of titles, abstracts, and full texts, followed by an additional review of the references within these articles. The review covered a range of studies illustrating the applications of AI in breast surgery, from lesion diagnosis to postoperative follow-up. Publications focusing specifically on breast reconstruction were excluded. Key Content and Findings AI models have preoperative, intraoperative, and postoperative applications in the field of breast surgery. Using breast imaging scans and patient data, AI models have been designed to predict the risk of breast cancer and determine the need for breast cancer surgery. In addition, using breast imaging scans and histopathological slides, models were used for detecting, classifying, segmenting, grading, and staging breast tumors. Preoperative applications included patient education and the display of expected aesthetic outcomes. Models were also designed to provide intraoperative assistance for precise tumor resection and margin status assessment. AI was also used to predict postoperative complications, survival, and cancer recurrence. Conclusions Further research is required to move AI models from the experimental stage to real-world implementation in healthcare. With the rapid evolution of AI, further applications are expected in the coming years, including the direct performance of breast surgery. Breast surgeons should stay up to date with advances in AI applications in breast surgery to provide the best care for their patients.
Affiliation(s)
- Ishith Seth: Department of Plastic Surgery, Peninsula Health, Melbourne, Victoria, Australia; Central Clinical School at Monash University, The Alfred Centre, Melbourne, Victoria, Australia
- Bryan Lim: Department of Plastic Surgery, Peninsula Health, Melbourne, Victoria, Australia; Central Clinical School at Monash University, The Alfred Centre, Melbourne, Victoria, Australia
- Konrad Joseph: Department of Surgery, Port Macquarie Base Hospital, New South Wales, Australia
- Dylan Gracias: Department of Surgery, Townsville Hospital, Queensland, Australia
- Yi Xie: Department of Plastic Surgery, Peninsula Health, Melbourne, Victoria, Australia
- Richard J. Ross: Department of Plastic Surgery, Peninsula Health, Melbourne, Victoria, Australia; Central Clinical School at Monash University, The Alfred Centre, Melbourne, Victoria, Australia
- Warren M. Rozen: Department of Plastic Surgery, Peninsula Health, Melbourne, Victoria, Australia; Central Clinical School at Monash University, The Alfred Centre, Melbourne, Victoria, Australia

41
Yavuz YE, Kahraman F. Evaluation of the prediagnosis and management of ChatGPT-4.0 in clinical cases in cardiology. Future Cardiol 2024; 20:197-207. [PMID: 39049771 DOI: 10.1080/14796678.2024.2348898] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/08/2023] [Accepted: 04/25/2024] [Indexed: 07/27/2024] Open
Abstract
Aim: To evaluate the performance of ChatGPT-4.0 in providing prediagnoses and management plans for cardiac clinical cases, as judged by expert cardiologists. Methods: Twenty cardiology clinical cases developed by experienced cardiologists were divided into two groups according to their preparation method. Cases were reviewed and analyzed by ChatGPT-4.0, and its analyses were then sent to cardiologists. Eighteen expert cardiologists evaluated the quality of ChatGPT-4.0 responses using Likert and Global quality scales. Results: Physicians rated case difficulty at a median of 2.00 and rated ChatGPT-4.0's agreement with the differential diagnoses highly (median 5.00). Management plans received a median score of 4, indicating good quality. Regardless of case difficulty, ChatGPT-4.0 showed similar performance in differential diagnosis (p = 0.256) and treatment plans (p = 0.951). Conclusion: ChatGPT-4.0 excels at delivering accurate management plans and shows potential as a valuable clinical decision support tool in cardiology.
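Editor's note: as a rough illustration of how expert Likert ratings of LLM outputs can be compared between two case groups, a minimal Python sketch follows. The abstract does not name the statistical test used, so the Mann-Whitney U test and the rating values below are assumptions, not the authors' analysis.
# Illustrative sketch only; hypothetical Likert ratings (1-5) of ChatGPT-4.0 differential diagnoses.
from scipy.stats import mannwhitneyu
group_a = [5, 4, 5, 5, 4, 5, 4, 5, 5, 4]  # cases prepared with method A (placeholder values)
group_b = [5, 5, 4, 4, 5, 4, 5, 5, 4, 5]  # cases prepared with method B (placeholder values)
stat, p_value = mannwhitneyu(group_a, group_b, alternative="two-sided")
print(f"U = {stat:.1f}, p = {p_value:.3f}")  # a large p-value suggests similar performance across groups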
Affiliation(s)
- Yunus Emre Yavuz: Department of Cardiology, Siirt Training & Research Hospital, Siirt, 56100, Turkey
- Fatih Kahraman: Department of Cardiology, Kütahya Evliya Çelebi Training & Research Hospital, Kütahya, 43000, Turkey

42
Hess BJ, Cupido N, Ross S, Kvern B. Becoming adaptive experts in an era of rapid advances in generative artificial intelligence. MEDICAL TEACHER 2024; 46:300-303. [PMID: 38092006 DOI: 10.1080/0142159x.2023.2289844] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/24/2023] [Accepted: 11/28/2023] [Indexed: 02/24/2024]
Affiliation(s)
- Brian J Hess: College of Family Physicians of Canada, Department of Certification and Assessment, Mississauga, Ontario, Canada
- Nathan Cupido: The Wilson Centre, University Health Network and Temerty Faculty of Medicine, and the Institute of Health Policy, Management, and Evaluation, University of Toronto, Toronto, Ontario, Canada
- Shelley Ross: Department of Family Medicine, Faculty of Medicine and Dentistry, College of Health Sciences, University of Alberta, Edmonton, Canada
- Brent Kvern: College of Family Physicians of Canada, Department of Certification and Assessment, Mississauga, Ontario, Canada

43
Lee Y, Kim SY. Potential applications of ChatGPT in obstetrics and gynecology in Korea: a review article. Obstet Gynecol Sci 2024; 67:153-159. [PMID: 38247132 PMCID: PMC10948210 DOI: 10.5468/ogs.23231] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/21/2023] [Revised: 11/08/2023] [Accepted: 11/29/2023] [Indexed: 01/23/2024] Open
Abstract
The use of chatbot technology, particularly chat generative pre-trained transformer (ChatGPT) with an impressive 175 billion parameters, has garnered significant attention across various domains, including Obstetrics and Gynecology (OBGYN). This comprehensive review delves into the transformative potential of chatbots, with a special focus on ChatGPT as a leading artificial intelligence (AI) technology. ChatGPT harnesses deep learning algorithms to generate responses that closely mimic human language, opening up myriad applications in medicine, research, and education. In the field of medicine, ChatGPT plays a pivotal role in diagnosis, treatment, and personalized patient education. Notably, the technology has demonstrated remarkable capabilities, surpassing human performance in OBGYN examinations and delivering highly accurate diagnoses. However, challenges remain, including the need to verify the accuracy of the information and to address ethical considerations and limitations. Within the wider scope of chatbot technology, AI systems play a vital role in healthcare processes, including documentation, diagnosis, research, and education. Although promising, their limitations and occasional inaccuracies require validation by healthcare professionals. This review also examines global chatbot adoption in healthcare, emphasizing the need for user awareness to ensure patient safety. Chatbot technology holds great promise in OBGYN and medicine, offering innovative solutions while necessitating responsible integration to ensure patient care and safety.
Affiliation(s)
- YooKyung Lee: Division of Maternal Fetal Medicine, Department of Obstetrics and Gynecology, MizMedi Hospital, Seoul, Korea
- So Yun Kim: Division of Maternal Fetal Medicine, Department of Obstetrics and Gynecology, MizMedi Hospital, Seoul, Korea

44
Bartal A, Jagodnik KM, Chan SJ, Dekel S. OpenAI's Narrative Embeddings Can Be Used for Detecting Post-Traumatic Stress Following Childbirth Via Birth Stories. RESEARCH SQUARE 2024:rs.3.rs-3428787. [PMID: 37886525 PMCID: PMC10602164 DOI: 10.21203/rs.3.rs-3428787/v2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Indexed: 04/30/2024]
Abstract
Free-text analysis using Machine Learning (ML)-based Natural Language Processing (NLP) shows promise for diagnosing psychiatric conditions. Chat Generative Pre-trained Transformer (ChatGPT) has demonstrated preliminary feasibility for this purpose; however, whether it can accurately assess mental illness remains to be determined. This study evaluates the effectiveness of ChatGPT and the text-embedding-ada-002 (ADA) model in detecting post-traumatic stress disorder following childbirth (CB-PTSD), a maternal postpartum mental illness affecting millions of women annually, for which there is no standard screening protocol. Using a sample of 1,295 women who had given birth within the previous six months and were at least 18 years old, recruited through hospital announcements, social media, and professional organizations, we explored ChatGPT's and ADA's potential to screen for CB-PTSD by analyzing maternal childbirth narratives. The PTSD Checklist for DSM-5 (PCL-5; cutoff 31) was used to assess CB-PTSD. By developing an ML model that utilizes the numerical vector representations produced by the ADA model, we identified CB-PTSD via narrative classification. Our model outperformed (F1 score: 0.82) ChatGPT and six previously published large language models (LLMs) trained on mental health or clinical domain data, suggesting that the ADA model can be harnessed to identify CB-PTSD. Our modeling approach could be generalized to assess other mental health disorders.
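Editor's note: a minimal sketch of the embedding-plus-classifier approach described above is given below for orientation. It is not the authors' pipeline; their classifier architecture, preprocessing, and evaluation protocol are not specified in the abstract, so the logistic-regression choice, variable names, and placeholder data are assumptions. It requires an OpenAI API key and real labeled narratives to run meaningfully.
# Illustrative sketch, not the authors' implementation.
from openai import OpenAI
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def embed(texts):
    """Return text-embedding-ada-002 vectors for a list of narratives."""
    resp = client.embeddings.create(model="text-embedding-ada-002", input=texts)
    return [item.embedding for item in resp.data]

# Placeholder data: childbirth narratives and PCL-5-derived labels (1 = PCL-5 >= 31, probable CB-PTSD).
narratives = ["...first birth story...", "...second birth story...", "...third...", "...fourth..."]
labels = [1, 0, 1, 0]

X = embed(narratives)
clf = LogisticRegression(max_iter=1000).fit(X, labels)            # a real study would hold out a test set
print("F1 on training data:", f1_score(labels, clf.predict(X)))   # the paper reports F1 = 0.82 on its own evaluation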
Affiliation(s)
- Alon Bartal: The School of Business Administration, Bar-Ilan University, Max and Anna Web, Ramat Gan, 5290002, Israel
- Kathleen M. Jagodnik: The School of Business Administration, Bar-Ilan University, Max and Anna Web, Ramat Gan, 5290002, Israel; Department of Psychiatry, Massachusetts General Hospital, 55 Fruit St., Boston, 02114, Massachusetts, USA; Department of Psychiatry, Harvard Medical School, 25 Shattuck St., Boston, 02115, Massachusetts, USA
- Sabrina J. Chan: Department of Psychiatry, Massachusetts General Hospital, 55 Fruit St., Boston, 02114, Massachusetts, USA
- Sharon Dekel: Department of Psychiatry, Massachusetts General Hospital, 55 Fruit St., Boston, 02114, Massachusetts, USA; Department of Psychiatry, Harvard Medical School, 25 Shattuck St., Boston, 02115, Massachusetts, USA

46
Hu Y, Hu Z, Liu W, Gao A, Wen S, Liu S, Lin Z. Exploring the potential of ChatGPT as an adjunct for generating diagnosis based on chief complaint and cone beam CT radiologic findings. BMC Med Inform Decis Mak 2024; 24:55. [PMID: 38374067 PMCID: PMC10875853 DOI: 10.1186/s12911-024-02445-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/11/2023] [Accepted: 01/28/2024] [Indexed: 02/21/2024] Open
Abstract
AIM This study aimed to assess the performance of OpenAI's ChatGPT in generating diagnoses based on the chief complaint and cone beam computed tomography (CBCT) radiologic findings. MATERIALS AND METHODS 102 CBCT reports (48 with dental diseases (DD) and 54 with neoplastic/cystic diseases (N/CD)) were collected. ChatGPT was provided with the chief complaint and the CBCT radiologic findings. Diagnostic outputs from ChatGPT were scored on a five-point Likert scale. For diagnostic accuracy, the scoring was based on the accuracy of the chief complaint-related diagnosis and the chief complaint-unrelated diagnoses (1-5 points); for diagnostic completeness, the scoring was based on how many accurate diagnoses were included in ChatGPT's output for each case (1-5 points); for text quality, the scoring was based on how many text errors were included in ChatGPT's output for each case (1-5 points). For the 54 N/CD cases, the consistency of the diagnoses generated by ChatGPT with the pathological diagnosis was also calculated. The composition of text errors in ChatGPT's outputs was evaluated. RESULTS After subjective rating by expert reviewers on a five-point Likert scale, the final scores for diagnostic accuracy, diagnostic completeness, and text quality of ChatGPT were 3.7, 4.5, and 4.6 for the 102 cases. For diagnostic accuracy, it performed significantly better on N/CD (3.8/5) than on DD (3.6/5). For the 54 N/CD cases, 21 (38.9%) had a first diagnosis completely consistent with the pathological diagnosis. No text errors were observed in 88.7% of all 390 text items. CONCLUSION ChatGPT showed potential in generating radiographic diagnoses based on the chief complaint and radiologic findings. However, the performance of ChatGPT varied with task complexity, necessitating professional oversight due to a certain error rate.
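Editor's note: to make the setup concrete, here is a minimal hedged sketch of how a chief complaint and CBCT findings might be passed to an OpenAI chat model to request a diagnostic impression. The prompt wording, model name, and example findings are assumptions for illustration, not the study's actual protocol or data.
# Illustrative sketch only; the study's exact prompts and model settings are not reproduced here.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

chief_complaint = "Swelling and dull pain in the left posterior mandible for 3 months."      # placeholder
cbct_findings = "Well-defined unilocular radiolucency around the apex of the lower left first molar."  # placeholder

prompt = (
    f"Chief complaint: {chief_complaint}\n"
    f"CBCT radiologic findings: {cbct_findings}\n"
    "List the most likely diagnoses, beginning with the one most related to the chief complaint."
)

response = client.chat.completions.create(
    model="gpt-4",  # assumed model choice; the paper evaluates 'ChatGPT' generically
    messages=[
        {"role": "system", "content": "You are a dentomaxillofacial radiology assistant."},
        {"role": "user", "content": prompt},
    ],
)
print(response.choices[0].message.content)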
Affiliation(s)
- Yanni Hu: Department of Dentomaxillofacial Radiology, Nanjing Stomatological Hospital, Affiliated Hospital of Medical School, Institute of Stomatology, Nanjing University, Nanjing, Jiangsu, People's Republic of China
- Ziyang Hu: Department of Dentomaxillofacial Radiology, Nanjing Stomatological Hospital, Affiliated Hospital of Medical School, Institute of Stomatology, Nanjing University, Nanjing, Jiangsu, People's Republic of China; Department of Stomatology, Shenzhen Longhua District Central Hospital, Shenzhen, People's Republic of China
- Wenjing Liu: Department of Dentomaxillofacial Radiology, Nanjing Stomatological Hospital, Affiliated Hospital of Medical School, Institute of Stomatology, Nanjing University, Nanjing, Jiangsu, People's Republic of China
- Antian Gao: Department of Dentomaxillofacial Radiology, Nanjing Stomatological Hospital, Affiliated Hospital of Medical School, Institute of Stomatology, Nanjing University, Nanjing, Jiangsu, People's Republic of China
- Shanhui Wen: Department of Dentomaxillofacial Radiology, Nanjing Stomatological Hospital, Affiliated Hospital of Medical School, Institute of Stomatology, Nanjing University, Nanjing, Jiangsu, People's Republic of China
- Shu Liu: Department of Dentomaxillofacial Radiology, Nanjing Stomatological Hospital, Affiliated Hospital of Medical School, Institute of Stomatology, Nanjing University, Nanjing, Jiangsu, People's Republic of China
- Zitong Lin: Department of Dentomaxillofacial Radiology, Nanjing Stomatological Hospital, Affiliated Hospital of Medical School, Institute of Stomatology, Nanjing University, Nanjing, Jiangsu, People's Republic of China

47
Sallam M, Barakat M, Sallam M. A Preliminary Checklist (METRICS) to Standardize the Design and Reporting of Studies on Generative Artificial Intelligence-Based Models in Health Care Education and Practice: Development Study Involving a Literature Review. Interact J Med Res 2024; 13:e54704. [PMID: 38276872 PMCID: PMC10905357 DOI: 10.2196/54704] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/19/2023] [Revised: 12/18/2023] [Accepted: 01/26/2024] [Indexed: 01/27/2024] Open
Abstract
BACKGROUND Adherence to evidence-based practice is indispensable in health care. Recently, the utility of generative artificial intelligence (AI) models in health care has been evaluated extensively. However, the lack of consensus guidelines on the design and reporting of findings of these studies poses a challenge for the interpretation and synthesis of evidence. OBJECTIVE This study aimed to develop a preliminary checklist to standardize the reporting of generative AI-based studies in health care education and practice. METHODS A literature review was conducted in Scopus, PubMed, and Google Scholar. Published records with "ChatGPT," "Bing," or "Bard" in the title were retrieved. Careful examination of the methodologies employed in the included records was conducted to identify the common pertinent themes and the possible gaps in reporting. A panel discussion was held to establish a unified and thorough checklist for the reporting of AI studies in health care. The finalized checklist was used to evaluate the included records by 2 independent raters, and Cohen κ was used to evaluate interrater reliability. RESULTS The final data set that formed the basis for pertinent theme identification and analysis comprised a total of 34 records. The finalized checklist included 9 pertinent themes collectively referred to as METRICS (Model, Evaluation, Timing/Transparency, Range/Randomization, Individual factors, Count, and Specificity of prompts and language). Their details are as follows: (1) Model used and its exact settings; (2) Evaluation approach for the generated content; (3) Timing of testing the model; (4) Transparency of the data source; (5) Range of tested topics; (6) Randomization of selecting the queries; (7) Individual factors in selecting the queries and interrater reliability; (8) Count of queries executed to test the model; and (9) Specificity of the prompts and language used. The overall mean METRICS score was 3.0 (SD 0.58). Interrater reliability was acceptable, with Cohen κ ranging from 0.558 to 0.962 (P<.001 for all 9 tested items). Classified per item, the highest average METRICS score was recorded for the "Model" item, followed by the "Specificity" item, while the lowest scores were recorded for the "Randomization" item (classified as suboptimal) and the "Individual factors" item (classified as satisfactory). CONCLUSIONS The METRICS checklist can facilitate the design of studies and guide researchers toward best practices in reporting results. The findings highlight the need for standardized reporting algorithms for generative AI-based studies in health care, given the variability observed in methodologies and reporting. The proposed METRICS checklist could be a helpful preliminary base for establishing a universally accepted approach to standardizing the design and reporting of generative AI-based studies in health care, which is a swiftly evolving research topic.
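Editor's note: as a small worked example of the interrater-reliability step, the sketch below computes Cohen κ for one checklist item rated by two independent raters using scikit-learn; the score arrays are hypothetical placeholders, not the study's data.
# Hypothetical ratings for one METRICS item, scored by two independent raters on a 1-5 scale.
from sklearn.metrics import cohen_kappa_score
rater_1 = [5, 4, 4, 3, 5, 2, 4, 5, 3, 4]
rater_2 = [5, 4, 3, 3, 5, 2, 4, 4, 3, 4]
kappa = cohen_kappa_score(rater_1, rater_2)  # weights="quadratic" is an option for ordinal scores
print(f"Cohen kappa = {kappa:.3f}")          # values near 1 indicate strong agreement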
Affiliation(s)
- Malik Sallam: Department of Pathology, Microbiology and Forensic Medicine, School of Medicine, The University of Jordan, Amman, Jordan; Department of Clinical Laboratories and Forensic Medicine, Jordan University Hospital, Amman, Jordan; Department of Translational Medicine, Faculty of Medicine, Lund University, Malmo, Sweden
- Muna Barakat: Department of Clinical Pharmacy and Therapeutics, Faculty of Pharmacy, Applied Science Private University, Amman, Jordan
- Mohammed Sallam: Department of Pharmacy, Mediclinic Parkview Hospital, Mediclinic Middle East, Dubai, United Arab Emirates

48
Aliyeva A, Sari E, Alaskarov E, Nasirov R. Enhancing Postoperative Cochlear Implant Care With ChatGPT-4: A Study on Artificial Intelligence (AI)-Assisted Patient Education and Support. Cureus 2024; 16:e53897. [PMID: 38465158 PMCID: PMC10924891 DOI: 10.7759/cureus.53897] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 02/09/2024] [Indexed: 03/12/2024] Open
Abstract
BACKGROUND Cochlear implantation is a critical surgical intervention for patients with severe hearing loss. Postoperative care is essential for successful rehabilitation, yet access to timely medical advice can be challenging, especially in remote or resource-limited settings. Integrating advanced artificial intelligence (AI) tools like Chat Generative Pre-trained Transformer (ChatGPT)-4 into post-surgical care could bridge the patient education and support gap. AIM This study aimed to assess the effectiveness of ChatGPT-4 as a supplementary information resource for postoperative cochlear implant patients. The focus was on evaluating the AI chatbot's ability to provide accurate, clear, and relevant information, particularly in scenarios where access to healthcare professionals is limited. MATERIALS AND METHODS Five common postoperative questions related to cochlear implant care were posed to ChatGPT-4. The AI chatbot's responses were analyzed for accuracy, response time, clarity, and relevance. The aim was to determine whether ChatGPT-4 could serve as a reliable source of information for patients, especially when they cannot immediately reach the hospital or their specialists. RESULTS ChatGPT-4 provided responses aligned with current medical guidelines, demonstrating accuracy and relevance. The AI chatbot responded to each query within seconds, indicating its potential as a timely resource. Additionally, the responses were clear and understandable, making complex medical information accessible to non-medical audiences. These findings suggest that ChatGPT-4 could effectively supplement traditional patient education, providing valuable support in postoperative care. CONCLUSION The study concluded that ChatGPT-4 has significant potential as a supportive tool for cochlear implant patients after surgery. While it cannot replace professional medical advice, ChatGPT-4 can provide immediate, accessible, and understandable information, which is particularly beneficial when professional advice is not immediately available. This underscores the utility of AI in enhancing patient care and supporting patients after cochlear implantation.
Affiliation(s)
- Aynur Aliyeva: Otorhinolaryngology-Head and Neck Surgery, Cincinnati Children's Hospital, Cincinnati, USA
- Elif Sari: Otorhinolaryngology-Head and Neck Surgery, Istanbul Aydın University, VM Medikal Park Florya Hospital, Istanbul, TUR
- Elvin Alaskarov: Otorhinolaryngology-Head and Neck Surgery, Istanbul Medipol University Health Care Practice and Research Center, Esenler Hospital, Istanbul, TUR
- Rauf Nasirov: Neurosurgery, University of Cincinnati College of Medicine, Cincinnati, USA

49
Shojaei P, Khosravi M, Jafari Y, Mahmoudi AH, Hassanipourmahani H. ChatGPT utilization within the building blocks of the healthcare services: A mixed-methods study. Digit Health 2024; 10:20552076241297059. [PMID: 39559384 PMCID: PMC11571260 DOI: 10.1177/20552076241297059] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/12/2024] [Accepted: 10/17/2024] [Indexed: 11/20/2024] Open
Abstract
Introduction ChatGPT, as an AI tool, has been introduced into healthcare for various purposes. The objective of this study was to investigate the principal benefits of ChatGPT utilization in healthcare services and to identify potential domains for its expansion within the building blocks of the healthcare industry. Methods A comprehensive three-phase mixed-methods study was conducted. The initial phase comprised a systematic review and thematic analysis of the data. In the subsequent phases, a questionnaire developed from the first-phase findings was distributed to a sample of eight experts to prioritize the benefits of ChatGPT and its potential expansion domains within the healthcare building blocks, using gray SWARA (Stepwise Weight Assessment Ratio Analysis) and gray MABAC (Multi-Attributive Border Approximation Area Comparison), respectively. Results The systematic review yielded 74 studies, and a thematic analysis of their data identified 11 unique themes. In the second phase, using the gray SWARA method, clinical decision-making (weight: 0.135), medical diagnosis (weight: 0.098), medical procedures (weight: 0.070), and patient-centered care (weight: 0.053) emerged as the most significant benefits of ChatGPT in the healthcare sector. Subsequently, ChatGPT was found to be most useful in the information and infrastructure, and information and communication technologies, building blocks. Conclusion The study concluded that, despite the significant benefits of ChatGPT in the clinical domains of healthcare, it exhibits more pronounced potential for growth within the informational domains of the healthcare industry's building blocks than within the domains of intervention and clinical services.
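Editor's note: to illustrate the weighting step, here is a minimal sketch of the classical (crisp) SWARA procedure. The study itself used the gray-number variant with its own expert judgments, so the criteria ordering below is taken from the abstract while the comparative-importance ratios s_j are hypothetical.
# Classical SWARA sketch; s_j values are hypothetical, not the study's gray SWARA inputs.
criteria = ["clinical decision-making", "medical diagnosis", "medical procedures", "patient-centered care"]
s = [0.00, 0.30, 0.25, 0.20]  # comparative importance of each criterion vs. the previous one (s_1 unused)

q = []
for j, s_j in enumerate(s):
    k_j = 1.0 if j == 0 else s_j + 1.0    # k_j = s_j + 1
    q_j = 1.0 if j == 0 else q[-1] / k_j  # q_j = q_(j-1) / k_j
    q.append(q_j)

weights = [q_j / sum(q) for q_j in q]     # normalize to obtain the final weights
for name, w in zip(criteria, weights):
    print(f"{name}: {w:.3f}")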
Affiliation(s)
- Payam Shojaei: Department of Management, Shiraz University, Shiraz, Iran
- Mohsen Khosravi: Department of Healthcare Management, School of Management and Medical Informatics, Shiraz University of Medical Sciences, Shiraz, Iran
- Yalda Jafari: Department of Management, Shiraz University, Shiraz, Iran
- Amir Hossein Mahmoudi: Department of Operations Management & Decision Sciences, Faculty of Management, University of Tehran, Tehran, Iran
- Hadis Hassanipourmahani: Department of Information Technology Management, Faculty of Management, University of Tehran, Tehran, Iran