201. Dhane AS, Sarode S. AI-chatbots and oral oncology: A transformative frontier in patient support. Oral Oncol 2024; 150:106701. [PMID: 38295621] [DOI: 10.1016/j.oraloncology.2024.106701]
Affiliation(s)
- Amol S Dhane
- Research & Development Cell, Dr. D.Y. Patil Vidyapeeth, Sant-Tukaram Nagar, Pimpri, Pune 411018, MH, India; Dr. D.Y. Patil Unitech Society, Dr. D.Y. Patil Institute of Pharmaceutical Sciences and Research, Pimpri, Pune, Maharashtra, India.
- Sachin Sarode
- Department of Oral Pathology and Microbiology, Dr. D.Y. Patil Dental College and Hospital, Dr. D.Y. Patil Vidyapeeth, Sant-Tukaram Nagar, Pimpri, Pune 411018, MH, India; Dr. D.Y. Patil Unitech Society, Dr. D.Y. Patil Institute of Pharmaceutical Sciences and Research, Pimpri, Pune, Maharashtra, India
202. Nikdel M, Ghadimi H, Tavakoli M, Suh DW. Assessment of the Responses of the Artificial Intelligence-based Chatbot ChatGPT-4 to Frequently Asked Questions About Amblyopia and Childhood Myopia. J Pediatr Ophthalmol Strabismus 2024; 61:86-89. [PMID: 37882183] [DOI: 10.3928/01913913-20231005-02]
Abstract
PURPOSE To assess the responses of ChatGPT-4, the forerunner artificial intelligence-based chatbot, to frequently asked questions regarding two common pediatric ophthalmologic disorders, amblyopia and childhood myopia. METHODS Twenty-seven questions about amblyopia and 28 questions about childhood myopia were asked of ChatGPT twice (110 questions in total). The responses were evaluated by two pediatric ophthalmologists as acceptable, incomplete, or unacceptable. RESULTS There was remarkable agreement (96.4%) between the two pediatric ophthalmologists on their assessment of the responses. Acceptable responses were provided by ChatGPT to 93 of 110 (84.6%) questions in total (44 of 54 [81.5%] questions for amblyopia and 49 of 56 [87.5%] questions for childhood myopia). Seven of 54 (12.9%) responses to questions on amblyopia were graded as incomplete, compared to 4 of 56 (7.1%) on childhood myopia. ChatGPT gave inappropriate responses to three questions each about amblyopia (5.6%) and childhood myopia (5.4%). The most noticeable inappropriate responses were related to the definition of reverse amblyopia and the threshold of refractive error for prescription of spectacles to children with myopia. CONCLUSIONS ChatGPT has the potential to serve as an adjunct informational tool for pediatric ophthalmology patients and their caregivers, demonstrating relatively good performance in answering 84.6% of the most frequently asked questions about amblyopia and childhood myopia. [J Pediatr Ophthalmol Strabismus. 2024;61(2):86-89.].
203. Pradhan F, Fiedler A, Samson K, Olivera-Martinez M, Manatsathit W, Peeraphatdit T. Artificial intelligence compared with human-derived patient educational materials on cirrhosis. Hepatol Commun 2024; 8:e0367. [PMID: 38358382] [PMCID: PMC10871753] [DOI: 10.1097/hc9.0000000000000367]
Abstract
BACKGROUND The study compared the readability, grade level, understandability, actionability, and accuracy of standard patient educational material against artificial intelligence chatbot-derived patient educational material regarding cirrhosis. METHODS An identical standardized phrase was used to generate patient educational materials on cirrhosis from 4 large language model-derived chatbots (ChatGPT, DocsGPT, Google Bard, and Bing Chat), and the outputs were compared against a pre-existing human-derived educational material (Epic). Objective scores for readability and grade level were determined using Flesch-Kincaid and Simple Measure of Gobbledygook scoring systems. 14 patients/caregivers and 8 transplant hepatologists were blinded and independently scored the materials on understandability and actionability and indicated whether they believed the material was human or artificial intelligence-generated. Understandability and actionability were determined using the Patient Education Materials Assessment Tool for Printable Materials. Transplant hepatologists also provided medical accuracy scores. RESULTS Most educational materials scored similarly in readability and grade level but were above the desired sixth-grade reading level. All educational materials were deemed understandable by both groups, while only the human-derived educational material (Epic) was considered actionable by both groups. No significant difference in perceived actionability or understandability among the educational materials was identified. Both groups poorly identified which materials were human-derived versus artificial intelligence-derived. CONCLUSIONS Chatbot-derived patient educational materials have comparable readability, grade level, understandability, and accuracy to human-derived materials. Readability, grade level, and actionability may be appropriate targets for improvement across educational materials on cirrhosis. Chatbot-derived patient educational materials show promise, and further studies should assess their usefulness in clinical practice.
Affiliation(s)
- Faruq Pradhan
- Department of Gastroenterology and Hepatology, University of Nebraska Medicine, Omaha, Nebraska
- Alexandra Fiedler
- Department of Internal Medicine, University of Nebraska Medicine, Omaha, Nebraska
- Kaeli Samson
- Department of Biostatistics, College of Public Health, University of Nebraska Medical Center, Omaha, Nebraska
- Marco Olivera-Martinez
- Department of Gastroenterology and Hepatology, University of Nebraska Medicine, Omaha, Nebraska
- Wuttiporn Manatsathit
- Department of Gastroenterology and Hepatology, University of Nebraska Medicine, Omaha, Nebraska
- Thoetchai Peeraphatdit
- Department of Gastroenterology and Hepatology, University of Nebraska Medicine, Omaha, Nebraska
204. Posner KM, Bakus C, Basralian G, Chester G, Zeiman M, O'Malley GR, Klein GR. Evaluating ChatGPT's Capabilities on Orthopedic Training Examinations: An Analysis of New Image Processing Features. Cureus 2024; 16:e55945. [PMID: 38601421] [PMCID: PMC11005479] [DOI: 10.7759/cureus.55945]
Abstract
Introduction The efficacy of integrating artificial intelligence (AI) models like ChatGPT into the medical field, specifically orthopedic surgery, has yet to be fully determined. The most recent adaptation of ChatGPT that has yet to be explored is its image analysis capabilities. This study assesses ChatGPT's performance in answering Orthopedic In-Training Examination (OITE) questions, including those that require image analysis. Methods Questions from the 2014, 2015, 2021, and 2022 AAOS OITE were screened for inclusion. All questions without images were entered into ChatGPT 3.5 and 4.0 twice. Questions that necessitated the use of images were only entered into ChatGPT 4.0 twice, as this is the only version of the system that can analyze images. The responses were recorded and compared to AAOS's correct answers, evaluating the AI's accuracy and precision. Results A total of 940 questions were included in the final analysis (457 questions with images and 483 questions without images). ChatGPT 4.0 performed significantly better on questions that did not require image analysis (67.81% vs 47.59%, p<0.001). Discussion While the use of AI in orthopedics is an intriguing possibility, this evaluation demonstrates how, even with the addition of image processing capabilities, ChatGPT still falls short in terms of its accuracy. As AI technology evolves, ongoing research is vital to harness AI's potential effectively, ensuring it complements rather than attempts to replace the nuanced skills of orthopedic surgeons.
Affiliation(s)
- Kevin M Posner
- Department of Orthopedic Surgery, Hackensack Meridian School of Medicine, Nutley, USA
- Cassandra Bakus
- Department of Orthopedic Surgery, Hackensack Meridian School of Medicine, Nutley, USA
- Grace Basralian
- Department of Orthopedic Surgery, Hackensack Meridian School of Medicine, Nutley, USA
- Grace Chester
- Department of Orthopedic Surgery, Hackensack Meridian School of Medicine, Nutley, USA
- Mallery Zeiman
- Department of Orthopedic Surgery, Hackensack Meridian School of Medicine, Nutley, USA
- Geoffrey R O'Malley
- Department of Orthopedic Surgery, Hackensack University Medical Center, Hackensack, USA
- Gregg R Klein
- Department of Orthopedic Surgery, Hackensack University Medical Center, Hackensack, USA
205. Shiraishi M, Lee H, Kanayama K, Moriwaki Y, Okazaki M. Appropriateness of Artificial Intelligence Chatbots in Diabetic Foot Ulcer Management. Int J Low Extrem Wounds 2024:15347346241236811. [PMID: 38419470] [DOI: 10.1177/15347346241236811]
Abstract
Type 2 diabetes is a significant global health concern. It often causes diabetic foot ulcers (DFUs), which affect millions of people and increase amputation and mortality rates. Despite existing guidelines, the complexity of DFU treatment makes clinical decisions challenging. Large language models such as chat generative pretrained transformer (ChatGPT), which are adept at natural language processing, have emerged as valuable resources in the medical field. However, concerns about the accuracy and reliability of the information they provide remain. We aimed to assess the accuracy of various artificial intelligence (AI) chatbots, including ChatGPT, in providing information on DFUs based on established guidelines. Seven AI chatbots were asked clinical questions (CQs) based on the DFU guidelines. Their responses were analyzed for accuracy in terms of answers to CQs, grade of recommendation, level of evidence, and agreement with the reference, including verification of the authenticity of the references provided by the chatbots. The AI chatbots showed a mean accuracy of 91.2% in answers to CQs, with discrepancies noted in grade of recommendation and level of evidence. Claude-2 outperformed other chatbots in the number of verified references (99.6%), whereas ChatGPT had the lowest rate of reference authenticity (66.3%). This study highlights the potential of AI chatbots as tools for disseminating medical information and demonstrates their high degree of accuracy in answering CQs related to DFUs. However, the variability in the accuracy of these chatbots and problems like AI hallucinations necessitate cautious use and further optimization for medical applications. This study underscores the evolving role of AI in healthcare and the importance of refining these technologies for effective use in clinical decision-making and patient education.
Affiliation(s)
- Makoto Shiraishi
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, Tokyo, Japan
- Haesu Lee
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, Tokyo, Japan
- Koji Kanayama
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, Tokyo, Japan
- Yuta Moriwaki
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, Tokyo, Japan
- Mutsumi Okazaki
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, Tokyo, Japan
206. Chandra A, Chakraborty A. Exploring the role of large language models in radiation emergency response. J Radiol Prot 2024; 44:011510. [PMID: 38324900] [DOI: 10.1088/1361-6498/ad270c]
Abstract
In recent times, the field of artificial intelligence (AI) has been transformed by the introduction of large language models (LLMs). These models, popularized by OpenAI's GPT-3, have demonstrated the emergent capabilities of AI in comprehending and producing text resembling human language, which has helped them transform several industries. However, their role has yet to be explored in the nuclear industry, specifically in managing radiation emergencies. The present work explores LLMs' contextual awareness, natural language interaction, and their capacity to comprehend diverse queries in a radiation emergency response setting. In this study, we identify different user types and their specific LLM use-cases in radiation emergencies. Their possible interactions with ChatGPT, a popular LLM, have also been simulated, and preliminary results are presented. Drawing on the insights gained from this exercise, and to address concerns of reliability and misinformation, this study advocates for expert-guided and domain-specific LLMs trained on radiation safety protocols and historical data. This study aims to guide radiation emergency management practitioners and decision-makers in effectively incorporating LLMs into their decision support framework.
Affiliation(s)
- Anirudh Chandra
- Radiation Safety Systems Division, Bhabha Atomic Research Centre, Mumbai 400085, India
- Abinash Chakraborty
- Health Physics Division, Bhabha Atomic Research Centre, Mumbai 400085, India
207. Ata AM, Aras B, Yılmaz Taşdelen Ö, Çelik C, Çulha C. Evaluation of Informative Content on Cerebral Palsy in the Era of Artificial Intelligence: The Value of ChatGPT. Phys Occup Ther Pediatr 2024; 44:605-614. [PMID: 38361368] [DOI: 10.1080/01942638.2024.2316178]
Abstract
AIMS In addition to the popular search engines on the Internet, ChatGPT may provide accurate and reliable health information. The aim of this study was to examine whether ChatGPT's responses to frequently asked questions concerning cerebral palsy (CP) by families were reliable and useful. METHODS Google Trends was used to find the most frequently searched keywords for CP. Five independent physiatrists assessed ChatGPT responses to 10 questions. Seven-point Likert-type scales were used to rate information reliability and usefulness based on whether the answer could be validated and was understandable. RESULTS The median ratings for reliability of information for each question varied from 2 (very unsafe) to 5 (relatively very reliable). The median rating was 4 (reliable) for four questions. The median ratings for usefulness of information varied from 2 (very little useful) to 5 (moderately useful). The median rating was 4 (partly useful) for seven questions. CONCLUSION Although ChatGPT appears promising as an additional tool for informing family members of individuals with CP about medical information, it should be emphasized that both consumers and health care providers should be aware of the limitations of artificial intelligence-generated information.
Affiliation(s)
- Ayşe Merve Ata
- Department of Physical Medicine and Rehabilitation, Ankara Bilkent City Hospital, Physical Therapy and Rehabilitation Hospital, Ankara, Turkey
- Berke Aras
- Department of Physical Medicine and Rehabilitation, Ankara Bilkent City Hospital, Physical Therapy and Rehabilitation Hospital, Ankara, Turkey
- Özlem Yılmaz Taşdelen
- Department of Physical Medicine and Rehabilitation, Ankara Bilkent City Hospital, Physical Therapy and Rehabilitation Hospital, Ankara, Turkey
- Canan Çelik
- Department of Physical Medicine and Rehabilitation, Ankara Bilkent City Hospital, Physical Therapy and Rehabilitation Hospital, Ankara, Turkey
- Canan Çulha
- Department of Physical Medicine and Rehabilitation, Ankara Bilkent City Hospital, Physical Therapy and Rehabilitation Hospital, Ankara, Turkey
208. Wright BM, Bodnar MS, Moore AD, Maseda MC, Kucharik MP, Diaz CC, Schmidt CM, Mir HR. Is ChatGPT a trusted source of information for total hip and knee arthroplasty patients? Bone Jt Open 2024; 5:139-146. [PMID: 38354748] [PMCID: PMC10867788] [DOI: 10.1302/2633-1462.52.bjo-2023-0113.r1]
Abstract
Aims While internet search engines have been the primary information source for patients' questions, artificial intelligence large language models like ChatGPT are trending towards becoming the new primary source. The purpose of this study was to determine if ChatGPT can answer patient questions about total hip (THA) and knee arthroplasty (TKA) with consistent accuracy, comprehensiveness, and easy readability. Methods We posed the 20 most Google-searched questions about THA and TKA, plus ten additional postoperative questions, to ChatGPT. Each question was asked twice to evaluate for consistency in quality. Following each response, we responded with, "Please explain so it is easier to understand," to evaluate ChatGPT's ability to reduce response reading grade level, measured as Flesch-Kincaid Grade Level (FKGL). Five resident physicians rated the 120 responses on 1 to 5 accuracy and comprehensiveness scales. Additionally, they answered a "yes" or "no" question regarding acceptability. Mean scores were calculated for each question, and responses were deemed acceptable if ≥ four raters answered "yes." Results The mean accuracy and comprehensiveness scores were 4.26 (95% confidence interval (CI) 4.19 to 4.33) and 3.79 (95% CI 3.69 to 3.89), respectively. Out of all the responses, 59.2% (71/120; 95% CI 50.0% to 67.7%) were acceptable. ChatGPT was consistent when asked the same question twice, giving no significant difference in accuracy (t = 0.821; p = 0.415), comprehensiveness (t = 1.387; p = 0.171), acceptability (χ2 = 1.832; p = 0.176), and FKGL (t = 0.264; p = 0.793). There was a significantly lower FKGL (t = 2.204; p = 0.029) for easier responses (11.14; 95% CI 10.57 to 11.71) than original responses (12.15; 95% CI 11.45 to 12.85). Conclusion ChatGPT answered THA and TKA patient questions with accuracy comparable to previous reports of websites, with adequate comprehensiveness, but with limited acceptability as the sole information source. ChatGPT has potential for answering patient questions about THA and TKA, but needs improvement.
Affiliation(s)
- Benjamin M. Wright
- Morsani College of Medicine, University of South Florida, Tampa, Florida, USA
- Michael S. Bodnar
- Morsani College of Medicine, University of South Florida, Tampa, Florida, USA
- Andrew D. Moore
- Department of Orthopaedic Surgery, University of South Florida, Tampa, Florida, USA
- Meghan C. Maseda
- Department of Orthopaedic Surgery, University of South Florida, Tampa, Florida, USA
- Michael P. Kucharik
- Department of Orthopaedic Surgery, University of South Florida, Tampa, Florida, USA
- Connor C. Diaz
- Department of Orthopaedic Surgery, University of South Florida, Tampa, Florida, USA
- Christian M. Schmidt
- Department of Orthopaedic Surgery, University of South Florida, Tampa, Florida, USA
- Hassan R. Mir
- Orthopaedic Trauma Service, Florida Orthopedic Institute, Tampa, Florida, USA
209. Abi-Rafeh J, Xu HH, Kazan R, Tevlin R, Furnas H. Large Language Models and Artificial Intelligence: A Primer for Plastic Surgeons on the Demonstrated and Potential Applications, Promises, and Limitations of ChatGPT. Aesthet Surg J 2024; 44:329-343. [PMID: 37562022] [DOI: 10.1093/asj/sjad260]
Abstract
BACKGROUND The rapidly evolving field of artificial intelligence (AI) holds great potential for plastic surgeons. ChatGPT, a recently released AI large language model (LLM), promises applications across many disciplines, including healthcare. OBJECTIVES The aim of this article was to provide a primer for plastic surgeons on AI, LLM, and ChatGPT, including an analysis of current demonstrated and proposed clinical applications. METHODS A systematic review was performed identifying medical and surgical literature on ChatGPT's proposed clinical applications. Variables assessed included applications investigated, command tasks provided, user input information, AI-emulated human skills, output validation, and reported limitations. RESULTS The analysis included 175 articles reporting on 13 plastic surgery applications and 116 additional clinical applications, categorized by field and purpose. Thirty-four applications within plastic surgery are thus proposed, with relevance to different target audiences, including attending plastic surgeons (n = 17, 50%), trainees/educators (n = 8, 24.0%), researchers/scholars (n = 7, 21%), and patients (n = 2, 6%). The 15 identified limitations of ChatGPT were categorized by training data, algorithm, and ethical considerations. CONCLUSIONS Widespread use of ChatGPT in plastic surgery will depend on rigorous research of proposed applications to validate performance and address limitations. This systematic review aims to guide research, development, and regulation to safely adopt AI in plastic surgery.
210. Barlas T, Altinova AE, Akturk M, Toruner FB. Credibility of ChatGPT in the assessment of obesity in type 2 diabetes according to the guidelines. Int J Obes (Lond) 2024; 48:271-275. [PMID: 37951982] [DOI: 10.1038/s41366-023-01410-5]
Abstract
BACKGROUND The Chat Generative Pre-trained Transformer (ChatGPT) allows students, researchers, and patients in the medical field to access information easily and has gained attention nowadays. We aimed to evaluate the credibility of ChatGPT according to the guidelines for the assessment of obesity in type 2 diabetes (T2D), which is one of the major concerns of this century. MATERIALS AND METHOD In this cross-sectional non-human subject study, experienced endocrinologists posed 20 questions to ChatGPT in subsections, which were assessments and different treatment options for obesity according to the American Diabetes Association and American Association of Clinical Endocrinology guidelines. The responses of ChatGPT were classified into four categories: compatible, compatible but insufficient, partially incompatible and incompatible with the guidelines. RESULTS ChatGPT demonstrated a systematic approach to answering questions and recommended consulting a healthcare provider to receive personalized advice based on the specific health needs and circumstances of patients. The compatibility of ChatGPT with the guidelines was 100% in the assessment of obesity in type 2 diabetes; however, it was lower in the therapy sections, which included nutritional, medical, and surgical approaches to weight loss. Furthermore, ChatGPT required additional prompts for responses that were evaluated as "compatible but insufficient" to provide all the information in the guidelines. CONCLUSION The assessment and management of obesity in T2D are highly individualized. Despite ChatGPT's comprehensive and understandable responses, it should not be used as a substitute for healthcare professionals' patient-centered approach.
Affiliation(s)
- Tugba Barlas
- Department of Endocrinology and Metabolism, Gazi University Faculty of Medicine, Ankara, Turkey.
- Alev Eroglu Altinova
- Department of Endocrinology and Metabolism, Gazi University Faculty of Medicine, Ankara, Turkey
- Mujde Akturk
- Department of Endocrinology and Metabolism, Gazi University Faculty of Medicine, Ankara, Turkey
- Fusun Balos Toruner
- Department of Endocrinology and Metabolism, Gazi University Faculty of Medicine, Ankara, Turkey
211. Christy M, Morris MT, Goldfarb CA, Dy CJ. Appropriateness and Reliability of an Online Artificial Intelligence Platform's Responses to Common Questions Regarding Distal Radius Fractures. J Hand Surg Am 2024; 49:91-98. [PMID: 38069953] [DOI: 10.1016/j.jhsa.2023.10.019]
Abstract
PURPOSE Chat Generative Pre-Trained Transformer (ChatGPT) is a novel artificial intelligence chatbot that is changing the way humans gather information online. The purpose of this study was to investigate ChatGPT's ability to appropriately and reliably answer common questions regarding distal radius fractures. METHODS Thirty common questions regarding distal radius fractures were presented in an identical manner to the online ChatGPT-3.5 interface three separate times, yielding 90 unique responses because ChatGPT produces an original answer with each query. All responses were graded as "appropriate," "appropriate but incomplete," or "inappropriate" by a consensus discussion among three hand surgeon reviewers. The questions were additionally subcategorized into one of four domains based on Bloom's cognitive learning taxonomy, and descriptive statistics were reported. RESULTS Seventy of the 90 total responses (78%) produced by ChatGPT were "appropriate," and 29 of the 30 questions (97%) had at least one response considered appropriate (of the three possible). However, only 17 of the 30 questions (57%) were answered appropriately on all three iterations. The test-retest reliability of ChatGPT was poor with an intraclass correlation coefficient of 0.12. Finally, ChatGPT performed best answering questions requiring lower-order thinking skills (Bloom's levels 1-3) and less well on level 4 questions. CONCLUSIONS This study found that although ChatGPT has the capability to answer common questions regarding distal radius fractures, caution should be taken before implementing its use, given ChatGPT's inconsistency in providing a complete and accurate response to the same question every time. CLINICAL RELEVANCE As the popularity and technology of ChatGPT continue to grow, it is important to understand the potential and limitations of this platform to determine how it may be best implemented to improve patient care.
Affiliation(s)
- Michele Christy
- Department of Orthopaedic Surgery, Washington University in St. Louis, St. Louis, MO
- Marie T Morris
- Department of Orthopaedic Surgery, Washington University in St. Louis, St. Louis, MO
- Charles A Goldfarb
- Department of Orthopaedic Surgery, Washington University in St. Louis, St. Louis, MO
- Christopher J Dy
- Department of Orthopaedic Surgery, Washington University in St. Louis, St. Louis, MO.
212. Chen J, Cadiente A, Kasselman LJ, Pilkington B. Assessing the performance of ChatGPT in bioethics: a large language model's moral compass in medicine. J Med Ethics 2024; 50:97-101. [PMID: 37973369] [DOI: 10.1136/jme-2023-109366]
Abstract
Chat Generative Pre-Trained Transformer (ChatGPT) has been a growing point of interest in medical education yet has not been assessed in the field of bioethics. This study evaluated the accuracy of ChatGPT-3.5 (April 2023 version) in answering text-based, multiple choice bioethics questions at the level of US third-year and fourth-year medical students. A total of 114 bioethical questions were identified from the widely utilised question banks UWorld and AMBOSS. Accuracy, bioethical categories, difficulty levels, specialty data, error analysis and character count were analysed. We found that ChatGPT had an accuracy of 59.6%, with greater accuracy in topics surrounding death and patient-physician relationships and performed poorly on questions pertaining to informed consent. Of all the specialties, it performed best in paediatrics. Yet, certain specialties and bioethical categories were under-represented. Among the errors made, it tended towards content errors and application errors. There were no significant associations between character count and accuracy. Nevertheless, this investigation contributes to the ongoing dialogue on artificial intelligence's (AI) role in healthcare and medical education, advocating for further research to fully understand AI systems' capabilities and constraints in the nuanced field of medical bioethics.
Affiliation(s)
- Jamie Chen
- Hackensack Meridian School of Medicine, Nutley, New Jersey, USA
- Angelo Cadiente
- Hackensack Meridian School of Medicine, Nutley, New Jersey, USA
- Lora J Kasselman
- Research Institute, Hackensack Meridian Health, Edison, New Jersey, USA
213. Gravina AG, Pellegrino R, Cipullo M, Palladino G, Imperio G, Ventura A, Auletta S, Ciamarra P, Federico A. May ChatGPT be a tool producing medical information for common inflammatory bowel disease patients' questions? An evidence-controlled analysis. World J Gastroenterol 2024; 30:17-33. [PMID: 38293321] [PMCID: PMC10823903] [DOI: 10.3748/wjg.v30.i1.17]
Abstract
Artificial intelligence is increasingly entering everyday healthcare. Large language model (LLM) systems such as Chat Generative Pre-trained Transformer (ChatGPT) have become potentially accessible to everyone, including patients with inflammatory bowel diseases (IBD). However, significant ethical issues and pitfalls exist in innovative LLM tools. The hype generated by such systems may lead to unweighted patient trust in these systems. Therefore, it is necessary to understand whether LLMs (trendy ones, such as ChatGPT) can produce plausible medical information (MI) for patients. This review examined ChatGPT's potential to provide MI regarding questions commonly addressed by patients with IBD to their gastroenterologists. From the review of the outputs provided by ChatGPT, this tool showed some attractive potential while having significant limitations in updating and detailing information and providing inaccurate information in some cases. Further studies and refinement of the ChatGPT, possibly aligning the outputs with the leading medical evidence provided by reliable databases, are needed.
Affiliation(s)
- Antonietta Gerarda Gravina
- Division of Hepatogastroenterology, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Naples 80138, Italy
- Raffaele Pellegrino
- Division of Hepatogastroenterology, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Naples 80138, Italy
- Marina Cipullo
- Division of Hepatogastroenterology, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Naples 80138, Italy
- Giovanna Palladino
- Division of Hepatogastroenterology, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Naples 80138, Italy
- Giuseppe Imperio
- Division of Hepatogastroenterology, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Naples 80138, Italy
- Andrea Ventura
- Division of Hepatogastroenterology, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Naples 80138, Italy
- Salvatore Auletta
- Division of Hepatogastroenterology, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Naples 80138, Italy
- Paola Ciamarra
- Division of Hepatogastroenterology, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Naples 80138, Italy
- Alessandro Federico
- Division of Hepatogastroenterology, Department of Precision Medicine, University of Campania Luigi Vanvitelli, Naples 80138, Italy
214. Cakir H, Caglar U, Yildiz O, Meric A, Ayranci A, Ozgor F. Evaluating the performance of ChatGPT in answering questions related to urolithiasis. Int Urol Nephrol 2024; 56:17-21. [PMID: 37658948] [DOI: 10.1007/s11255-023-03773-0]
Abstract
PURPOSE ChatGPT is an artificial intelligence (AI) program with natural language processing. We analyzed ChatGPT's knowledge about urolithiasis and whether it can be used to inform patients about urolithiasis. METHODS Frequently asked questions (FAQs) about urolithiasis on the websites of urological associations and hospitals were analyzed. Also, strong recommendation-level information was gathered from the urolithiasis section of the European Association of Urology (EAU) 2022 Guidelines. All questions were asked in order in the ChatGPT August 3rd version. All answers were evaluated separately by two specialist urologists and scored between 1 and 4, where 1: completely correct, 2: correct but inadequate, 3: a mix of correct and misleading information, and 4: completely incorrect. RESULTS Of the FAQs, 94.6% were answered completely correctly. No question was answered completely incorrectly. All questions about general, diagnosis, and ureteral stones were graded as 1. Of the 60 questions prepared according to the EAU guideline recommendations, 50 (83.3%) were evaluated as grade 1, and 8 (13.3%) and 2 (3.3%) as grades 2 and 3, respectively. All questions related to general, diagnostic, renal calculi, ureteral calculi, and metabolic evaluation received the same answer the second time they were asked. CONCLUSION Our findings demonstrated that ChatGPT accurately and satisfactorily answered more than 95% of the questions about urolithiasis. We conclude that applying ChatGPT in urology clinics under the supervision of urologists can help patients and their families to have a better understanding of urolithiasis diagnosis and treatment.
Affiliation(s)
- Hakan Cakir
- Department of Urology, Fulya Acibadem Hospital, Sisli, Istanbul, Turkey.
- Ufuk Caglar
- Department of Urology, Haseki Training and Research Hospital, Istanbul, Turkey
- Oguzhan Yildiz
- Department of Urology, Haseki Training and Research Hospital, Istanbul, Turkey
- Arda Meric
- Department of Urology, Haseki Training and Research Hospital, Istanbul, Turkey
- Ali Ayranci
- Department of Urology, Haseki Training and Research Hospital, Istanbul, Turkey
- Faruk Ozgor
- Department of Urology, Haseki Training and Research Hospital, Istanbul, Turkey
215. Daungsupawong H, Wiwanitkit V. Letter 1 regarding "Assessing the performance of ChatGPT in answering questions regarding cirrhosis and hepatocellular carcinoma". Clin Mol Hepatol 2024; 30:111-112. [PMID: 37828840] [PMCID: PMC10776299] [DOI: 10.3350/cmh.2023.0394]
Affiliation(s)
- Viroj Wiwanitkit
- Research Center, Chandigarh University, Mohali, India
- Department of Biological Science, Joseph Ayobabalola University, Ikeji-Arakeji, Nigeria
216. Piao Y, Chen H, Wu S, Li X, Li Z, Yang D. Assessing the performance of large language models (LLMs) in answering medical questions regarding breast cancer in the Chinese context. Digit Health 2024; 10:20552076241284771. [PMID: 39386109] [PMCID: PMC11462564] [DOI: 10.1177/20552076241284771]
Abstract
Purpose Large language models (LLMs) are deep learning models designed to comprehend and generate meaningful responses, which have gained public attention in recent years. The purpose of this study is to evaluate and compare the performance of LLMs in answering questions regarding breast cancer in the Chinese context. Material and Methods ChatGPT, ERNIE Bot, and ChatGLM were chosen to answer 60 questions related to breast cancer posed by two oncologists. Responses were scored as comprehensive, correct but inadequate, mixed with correct and incorrect data, completely incorrect, or unanswered. The accuracy, length, and readability among answers from different models were evaluated using statistical software. Results ChatGPT answered 60 questions, with 40 (66.7%) comprehensive answers and six (10.0%) correct but inadequate answers. ERNIE Bot answered 60 questions, with 34 (56.7%) comprehensive answers and seven (11.7%) correct but inadequate answers. ChatGLM generated 60 answers, with 35 (58.3%) comprehensive answers and six (10.0%) correct but inadequate answers. The differences for chosen accuracy metrics among the three LLMs did not reach statistical significance, but only ChatGPT demonstrated a sense of human compassion. The accuracy of the three models in answering questions regarding breast cancer treatment was the lowest, with an average of 44.4%. ERNIE Bot's responses were significantly shorter compared to ChatGPT and ChatGLM (p < .001 for both). The readability scores of the three models showed no statistical significance. Conclusions In the Chinese context, the capabilities of ChatGPT, ERNIE Bot, and ChatGLM are similar in answering breast cancer-related questions at present. These three LLMs may serve as adjunct informational tools for breast cancer patients in the Chinese context, offering guidance for general inquiries. However, for highly specialized issues, particularly in the realm of breast cancer treatment, LLMs cannot deliver reliable performance. It is necessary to utilize them under the supervision of healthcare professionals.
Affiliation(s)
- Ying Piao
- Department of Radiation Oncology, Shenzhen People’s Hospital (The Second Clinical Medical College, Jinan University; The First Affiliated Hospital, Southern University of Science and Technology), Shenzhen, Guangdong, People’s Republic of China
- Hongtao Chen
- Department of Radiation Oncology, Shenzhen People’s Hospital (The Second Clinical Medical College, Jinan University; The First Affiliated Hospital, Southern University of Science and Technology), Shenzhen, Guangdong, People’s Republic of China
- Shihai Wu
- Department of Radiation Oncology, Shenzhen People’s Hospital (The Second Clinical Medical College, Jinan University; The First Affiliated Hospital, Southern University of Science and Technology), Shenzhen, Guangdong, People’s Republic of China
- Xianming Li
- Department of Radiation Oncology, Shenzhen People’s Hospital (The Second Clinical Medical College, Jinan University; The First Affiliated Hospital, Southern University of Science and Technology), Shenzhen, Guangdong, People’s Republic of China
- Zihuang Li
- Department of Radiation Oncology, Shenzhen People’s Hospital (The Second Clinical Medical College, Jinan University; The First Affiliated Hospital, Southern University of Science and Technology), Shenzhen, Guangdong, People’s Republic of China
- Dong Yang
- Department of Radiation Oncology, Shenzhen People’s Hospital (The Second Clinical Medical College, Jinan University; The First Affiliated Hospital, Southern University of Science and Technology), Shenzhen, Guangdong, People’s Republic of China
217. Hwang G, Lee DY, Seol S, Jung J, Choi Y, Her ES, An MH, Park RW. Assessing the potential of ChatGPT for psychodynamic formulations in psychiatry: An exploratory study. Psychiatry Res 2024; 331:115655. [PMID: 38056130] [DOI: 10.1016/j.psychres.2023.115655]
Abstract
Although there were several attempts to apply ChatGPT (Generative Pre-Trained Transformer) to medicine, little is known about therapeutic applications in psychiatry. In this exploratory study, we aimed to evaluate the characteristics and appropriateness of the psychodynamic formulations created by ChatGPT. Along with a case selected from the psychoanalytic literature, input prompts were designed to include different levels of background knowledge. These included naïve prompts, keywords created by ChatGPT, keywords created by psychiatrists, and psychodynamic concepts from the literature. The psychodynamic formulations generated from the different prompts were evaluated by five psychiatrists from different institutions. We next conducted further tests in which instructions on the use of different psychodynamic models were added to the input prompts. The models used were ego psychology, self-psychology, and object relations. The results from naïve prompts and psychodynamic concepts were rated as appropriate by most raters. The psychodynamic concept prompt output was rated the highest. Interrater agreement was statistically significant. The results from the tests using instructions in different psychoanalytic theories were also rated as appropriate by most raters. They included key elements of the psychodynamic formulation and suggested interpretations similar to the literature. These findings suggest potential of ChatGPT for use in psychiatry.
Affiliation(s)
- Gyubeom Hwang
- Department of Biomedical Informatics, Ajou University School of Medicine, Suwon, Republic of Korea; Department of Medical Sciences, Graduate School of Ajou University, Suwon, Republic of Korea
- Dong Yun Lee
- Department of Biomedical Informatics, Ajou University School of Medicine, Suwon, Republic of Korea; Department of Medical Sciences, Graduate School of Ajou University, Suwon, Republic of Korea
- Soobeen Seol
- Department of Biomedical Sciences, Ajou University Graduate School of Medicine, Suwon, Republic of Korea
- Jaeoh Jung
- Department of Child and Adolescent Psychiatry, Seoul Metropolitan Eunpyeong Hospital, Seoul, Republic of Korea
- Yeonkyu Choi
- Armed Forces Yangju Hospital, Yang-ju, Republic of Korea
- Eun Sil Her
- Ajou Big Tree Psychiatric Clinic, Suwon, Republic of Korea
- Min Ho An
- Department of Biomedical Informatics, Ajou University School of Medicine, Suwon, Republic of Korea; Department of Medical Sciences, Graduate School of Ajou University, Suwon, Republic of Korea
- Rae Woong Park
- Department of Biomedical Informatics, Ajou University School of Medicine, Suwon, Republic of Korea; Department of Medical Sciences, Graduate School of Ajou University, Suwon, Republic of Korea; Department of Biomedical Sciences, Ajou University Graduate School of Medicine, Suwon, Republic of Korea.
218. Faisal S, Kamran TE, Khalid R, Haider Z, Siddiqui Y, Saeed N, Imran S, Faisal R, Jabeen M. Evaluating the comprehension and accuracy of ChatGPT's responses to diabetes-related questions in Urdu compared to English. Digit Health 2024; 10:20552076241289730. [PMID: 39430700] [PMCID: PMC11490976] [DOI: 10.1177/20552076241289730]
Abstract
Introduction Patients with diabetes require healthcare and information that are accurate and extensive. Large language models (LLMs) like ChatGPT herald the capacity to provide such exhaustive data. This study aimed to determine (a) the comprehensiveness of ChatGPT's responses in Urdu to diabetes-related questions and (b) the accuracy of ChatGPT's Urdu responses when compared to its English responses. Methods A cross-sectional observational study was conducted. Two reviewers experienced in internal medicine and endocrinology graded 53 Urdu and English responses on diabetes knowledge, lifestyle, and prevention. A senior reviewer resolved discrepancies. Responses were assessed for comprehension and accuracy, then compared to English. Results Among the Urdu responses generated, only two of 53 (3.8%) questions were graded as comprehensive, and five of 53 (9.4%) were graded as correct but inadequate. The largest proportion, 25 of 53 (47.2%) questions, was graded as mixed with correct and incorrect/outdated data. When the accuracy of Urdu and English responses was compared on the grading scale, no Urdu response (0.0%) was considered more accurate than its English counterpart, and an overwhelming majority, 49 of 53 (92.5%) responses, were less accurate than the English versions. Conclusion We found that although ChatGPT's ability to retrieve such information about diabetes in Urdu is impressive, it should be used merely as an adjunct rather than a solitary source of information. Further work must be done to optimize Urdu responses in medical contexts to approximate the boundless potential this technology heralds.
Affiliation(s)
- Seyreen Faisal
- Shifa College of Medicine, Shifa Tameer-e-Millat University, Islamabad, Pakistan
- Tafiya Erum Kamran
- Shifa College of Medicine, Shifa Tameer-e-Millat University, Islamabad, Pakistan
- Rimsha Khalid
- Shifa College of Medicine, Shifa Tameer-e-Millat University, Islamabad, Pakistan
- Zaira Haider
- Shifa College of Medicine, Shifa Tameer-e-Millat University, Islamabad, Pakistan
- Yusra Siddiqui
- Shifa College of Medicine, Shifa Tameer-e-Millat University, Islamabad, Pakistan
- Nadia Saeed
- Department of Internal Medicine, Shifa College of Medicine, Shifa Tameer-e-Millat University, Islamabad, Pakistan
- Sunaina Imran
- Shifa College of Medicine, Shifa Tameer-e-Millat University, Islamabad, Pakistan
- Romaan Faisal
- Islamabad Medical and Dental College, Shaheed Zulfiqar Ali Bhutto Medical University, Islamabad, Pakistan
- Misbah Jabeen
- Department of Endocrinology, Shifa International Hospital, Islamabad, Pakistan
219. Dosso JA, Kailley JN, Robillard JM. What Does ChatGPT Know About Dementia? A Comparative Analysis of Information Quality. J Alzheimers Dis 2024; 97:559-565. [PMID: 38143345] [PMCID: PMC10836539] [DOI: 10.3233/jad-230573]
Abstract
The quality of information about dementia retrieved using ChatGPT is unknown. Content was evaluated for length, readability, and quality using the QUEST, a validated tool, and compared against online material from three North American organizations. Both sources of information avoided conflicts of interest, supported the patient-physician relationship, and used a balanced tone. Official bodies but not ChatGPT referenced identifiable research and pointed to local resources. Users of ChatGPT are likely to encounter accurate but shallow information about dementia. Recommendations are made for information creators and providers who counsel patients around digital health practices.
Affiliation(s)
- Jill A Dosso
- Department of Medicine, Division of Neurology, The University of British Columbia, Vancouver, British Columbia, Canada
- BC Children's and Women's Hospitals, Vancouver, British Columbia, Canada
- Jaya N Kailley
- Department of Medicine, Division of Neurology, The University of British Columbia, Vancouver, British Columbia, Canada
- BC Children's and Women's Hospitals, Vancouver, British Columbia, Canada
- Julie M Robillard
- Department of Medicine, Division of Neurology, The University of British Columbia, Vancouver, British Columbia, Canada
- BC Children's and Women's Hospitals, Vancouver, British Columbia, Canada
220. Suárez A, Díaz-Flores García V, Algar J, Gómez Sánchez M, Llorente de Pedro M, Freire Y. Unveiling the ChatGPT phenomenon: Evaluating the consistency and accuracy of endodontic question answers. Int Endod J 2024; 57:108-113. [PMID: 37814369] [DOI: 10.1111/iej.13985]
Abstract
AIM Chatbot Generative Pre-trained Transformer (ChatGPT) is a generative artificial intelligence (AI) software based on large language models (LLMs), designed to simulate human conversations and generate novel content based on the training data it has been exposed to. The aim of this study was to evaluate the consistency and accuracy of ChatGPT-generated answers to clinical questions in endodontics, compared to answers provided by human experts. METHODOLOGY Ninety-one dichotomous (yes/no) questions were designed and categorized into three levels of difficulty. Twenty questions were randomly selected from each difficulty level. Sixty answers were generated by ChatGPT for each question. Two endodontic experts independently answered the 60 questions. Statistical analysis was performed using the SPSS program to calculate the consistency and accuracy of the answers generated by ChatGPT compared to the experts. Confidence intervals (95%) and standard deviations were used to estimate variability. RESULTS The answers generated by ChatGPT showed high consistency (85.44%). No significant differences in consistency were found based on question difficulty. In terms of answer accuracy, ChatGPT achieved an average accuracy of 57.33%. However, significant differences in accuracy were observed based on question difficulty, with lower accuracy for easier questions. CONCLUSIONS Currently, ChatGPT is not capable of replacing dentists in clinical decision-making. As ChatGPT's performance improves through deep learning, it is expected to become more useful and effective in the field of endodontics. However, careful attention and ongoing evaluation are needed to ensure its accuracy, reliability and safety in endodontics.
Affiliation(s)
- Ana Suárez
- Department of Pre-Clinic Dentistry, School of Biomedical Sciences, Universidad Europea de Madrid, Madrid, Spain
- Víctor Díaz-Flores García
- Department of Pre-Clinic Dentistry, School of Biomedical Sciences, Universidad Europea de Madrid, Madrid, Spain
- Juan Algar
- Department of Clinical Dentistry, School of Biomedical Sciences, Universidad Europea de Madrid, Madrid, Spain
- Margarita Gómez Sánchez
- Department of Pre-Clinic Dentistry, School of Biomedical Sciences, Universidad Europea de Madrid, Madrid, Spain
- María Llorente de Pedro
- Department of Pre-Clinic Dentistry, School of Biomedical Sciences, Universidad Europea de Madrid, Madrid, Spain
- Yolanda Freire
- Department of Pre-Clinic Dentistry, School of Biomedical Sciences, Universidad Europea de Madrid, Madrid, Spain
221. Morales-Ramirez P, Mishek H, Dasgupta A. The Genie Is Out of the Bottle: What ChatGPT Can and Cannot Do for Medical Professionals. Obstet Gynecol 2024; 143:e1-e6. [PMID: 37944140] [DOI: 10.1097/aog.0000000000005446]
Abstract
ChatGPT is a cutting-edge artificial intelligence technology that was released for public use in November 2022. Its rapid adoption has raised questions about capabilities, limitations, and risks. This article presents an overview of ChatGPT, and it highlights the current state of this technology for the medical field. The article seeks to provide a balanced perspective on what the model can and cannot do in three specific domains: clinical practice, research, and medical education. It also provides suggestions on how to optimize the use of this tool.
222. Di Ieva A, Stewart C, Suero Molina E. Large Language Models in Neurosurgery. Adv Exp Med Biol 2024; 1462:177-198. [PMID: 39523266] [DOI: 10.1007/978-3-031-64892-2_11]
Abstract
A large language model (LLM), in the context of natural language processing and artificial intelligence, refers to a sophisticated neural network that has been trained on a massive amount of text data to understand and generate human-like language. These models are typically built on architectures like transformers. The term "large" indicates that the neural network has a significant number of parameters, making it more powerful and capable of capturing complex patterns in language. One notable example of a large language model is ChatGPT. ChatGPT is a large language model developed by OpenAI that uses deep learning techniques to generate human-like text. It can be trained on a variety of tasks, such as language translation, question answering, and text completion. One of the key features of ChatGPT is its ability to understand and respond to natural language inputs. This makes it a powerful tool for generating a wide range of text, including medical reports, surgical notes, and even poetry. Additionally, the model has been trained on a large corpus of text, which allows it to generate text that is both grammatically correct and semantically meaningful. In terms of applications in neurosurgery, ChatGPT can be used to generate detailed and accurate surgical reports, which can be very useful for sharing information about a patient's case with other members of the medical team. Additionally, the model can be used to generate detailed surgical notes, which can be very useful for training and educating residents and medical students. Overall, LLMs have the potential to be a valuable tool in the field of neurosurgery. Indeed, this abstract has been generated by ChatGPT within few seconds. Potential applications and pitfalls of the applications of LLMs are discussed in this paper.
Affiliation(s)
- Antonio Di Ieva
- Computational NeuroSurgery (CNS) Lab, Macquarie Medical School, Faculty of Medicine, Human and Health Sciences, Macquarie University, Sydney, NSW, Australia.
- Macquarie Neurosurgery & Spine, MQ Health, Macquarie University Hospital, Sydney, NSW, Australia.
- Department of Neurosurgery, Nepean Blue Mountains Local Health District, Penrith, NSW, Australia.
- Centre for Applied Artificial Intelligence, School of Computing, Macquarie University, Sydney, NSW, Australia.
| | - Caleb Stewart
- Department of Neurosurgery, Louisiana State University Health Sciences Shreveport, Shreveport, LA, USA
| | - Eric Suero Molina
- Computational NeuroSurgery (CNS) Lab, Macquarie Medical School, Faculty of Medicine, Human and Health Sciences, Macquarie University, Sydney, NSW, Australia
- Department of Neurosurgery, University Hospital of Münster, Münster, Germany
223
Huang X, Estau D, Liu X, Yu Y, Qin J, Li Z. Evaluating the performance of ChatGPT in clinical pharmacy: A comparative study of ChatGPT and clinical pharmacists. Br J Clin Pharmacol 2024; 90:232-238. [PMID: 37626010 DOI: 10.1111/bcp.15896] [Citation(s) in RCA: 28] [Impact Index Per Article: 28.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/26/2023] [Revised: 08/01/2023] [Accepted: 08/14/2023] [Indexed: 08/27/2023] Open
Abstract
AIMS To evaluate the performance of chat generative pretrained transformer (ChatGPT) in key domains of clinical pharmacy practice, including prescription review, patient medication education, adverse drug reaction (ADR) recognition, ADR causality assessment and drug counselling. METHODS Questions and clinical pharmacists' answers were collected from real clinical cases and a clinical pharmacist competency assessment. ChatGPT's responses were generated by inputting the same question into the 'New Chat' box of ChatGPT Mar 23 Version. Five licensed clinical pharmacists independently rated these answers on a scale of 0 (completely incorrect) to 10 (completely correct). The mean scores of ChatGPT and clinical pharmacists were compared using a paired 2-tailed Student's t-test. The text content of the answers was also summarized descriptively. RESULTS The quantitative results indicated that ChatGPT was excellent in drug counselling (ChatGPT: 8.77 vs. clinical pharmacist: 9.50, P = .0791) and weak in prescription review (5.23 vs. 9.90, P = .0089), patient medication education (6.20 vs. 9.07, P = .0032), ADR recognition (5.07 vs. 9.70, P = .0483) and ADR causality assessment (4.03 vs. 9.73, P = .023). The capabilities and limitations of ChatGPT in clinical pharmacy practice were summarized based on the completeness and accuracy of the answers. ChatGPT revealed robust retrieval, information integration and dialogue capabilities. It lacked medicine-specific datasets as well as the ability to handle advanced reasoning and complex instructions. CONCLUSIONS While ChatGPT holds promise in clinical pharmacy practice as a supplementary tool, its ability to handle complex problems needs further improvement and refinement.
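To make the comparison step above concrete, the following is a minimal sketch of a paired two-tailed t-test on expert ratings; the rating arrays are invented placeholders, not data from the cited study.

```python
# Minimal sketch: paired two-tailed t-test comparing expert ratings (0-10 scale)
# of ChatGPT answers vs. clinical pharmacist answers on the same items.
# The score arrays are hypothetical placeholders, not data from the study.
import numpy as np
from scipy import stats

chatgpt_scores = np.array([5.0, 6.5, 4.0, 8.8, 5.2])       # one rating per matched item
pharmacist_scores = np.array([9.9, 9.1, 9.7, 9.5, 9.8])    # ratings for the same items

res = stats.ttest_rel(chatgpt_scores, pharmacist_scores)   # paired, two-tailed by default
print(f"mean ChatGPT = {chatgpt_scores.mean():.2f}, "
      f"mean pharmacist = {pharmacist_scores.mean():.2f}, "
      f"t = {res.statistic:.2f}, p = {res.pvalue:.4f}")
```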
Affiliation(s)
- Xiaoru Huang
- Department of Pharmacy, Peking University Third Hospital, Beijing, China
- Department of Pharmaceutical Management and Clinical Pharmacy, College of Pharmacy, Peking University, Beijing, China
| | - Dannya Estau
- Department of Pharmacy, Peking University Third Hospital, Beijing, China
- Department of Pharmaceutical Management and Clinical Pharmacy, College of Pharmacy, Peking University, Beijing, China
| | - Xuening Liu
- Department of Pharmacy, Peking University Third Hospital, Beijing, China
- Department of Pharmaceutical Management and Clinical Pharmacy, College of Pharmacy, Peking University, Beijing, China
| | - Yang Yu
- Department of Pharmacy, Peking University Third Hospital, Beijing, China
- Department of Pharmaceutical Management and Clinical Pharmacy, College of Pharmacy, Peking University, Beijing, China
| | - Jiguang Qin
- Department of Pharmacy, Peking University Third Hospital, Beijing, China
- Department of Pharmaceutical Management and Clinical Pharmacy, College of Pharmacy, Peking University, Beijing, China
| | - Zijian Li
- Department of Pharmacy, Peking University Third Hospital, Beijing, China
- Department of Pharmaceutical Management and Clinical Pharmacy, College of Pharmacy, Peking University, Beijing, China
- Department of Cardiology and Institute of Vascular Medicine, Peking University Third Hospital, Beijing Key Laboratory of Cardiovascular Receptors Research, Key Laboratory of Cardiovascular Molecular Biology and Regulatory Peptides, Ministry of Health, State Key Laboratory of Vascular Homeostasis and Remodeling, Peking University, Beijing, China
224
Shojaei P, Khosravi M, Jafari Y, Mahmoudi AH, Hassanipourmahani H. ChatGPT utilization within the building blocks of the healthcare services: A mixed-methods study. Digit Health 2024; 10:20552076241297059. [PMID: 39559384 PMCID: PMC11571260 DOI: 10.1177/20552076241297059] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/12/2024] [Accepted: 10/17/2024] [Indexed: 11/20/2024] Open
Abstract
Introduction ChatGPT, as an AI tool, has been introduced in healthcare for various purposes. The objective of the study was to investigate the principal benefits of ChatGPT utilization in healthcare services and to identify potential domains for its expansion within the building blocks of the healthcare industry. Methods A comprehensive three-phase study was conducted employing mixed methods. The initial phase comprised a systematic review and thematic analysis of the data. In the subsequent phases, a questionnaire, developed based on the findings from the first phase, was distributed to a sample of eight experts. The objective was to prioritize the benefits and potential expansion domains of ChatGPT in healthcare building blocks, utilizing gray SWARA (Stepwise Weight Assessment Ratio Analysis) and gray MABAC (Multi-Attributive Border Approximation Area Comparison), respectively. Results The systematic review yielded 74 studies. A thematic analysis of the data from these studies identified 11 unique themes. In the second phase, employing the gray SWARA method, clinical decision-making (weight: 0.135), medical diagnosis (weight: 0.098), medical procedures (weight: 0.070), and patient-centered care (weight: 0.053) emerged as the most significant benefits of ChatGPT in the healthcare sector. Subsequently, it was determined that ChatGPT demonstrated the highest level of usefulness in the information and infrastructure block and the information and communication technologies block. Conclusion The study concluded that, despite the significant benefits of ChatGPT in the clinical domains of healthcare, it exhibits a more pronounced potential for growth within the informational domains of the healthcare industry's building blocks than within the domains of intervention and clinical services.
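For intuition about the weighting step described above, here is a minimal sketch of the ordinary (crisp, non-gray) SWARA procedure; the criterion names and comparative-importance values are illustrative assumptions, not the experts' judgments from the study.

```python
# Minimal sketch of (crisp, non-gray) SWARA criterion weighting.
# Criteria are listed from most to least important; s[j] is the expert's
# "comparative importance of the average value" of criterion j relative to
# criterion j-1. Names and s-values are illustrative, not study data.
criteria = ["clinical decision-making", "medical diagnosis",
            "medical procedures", "patient-centered care"]
s = [0.0, 0.20, 0.35, 0.30]   # s[0] is unused by convention

q = []
for j, s_j in enumerate(s):
    k_j = 1.0 if j == 0 else 1.0 + s_j        # coefficient k_j = s_j + 1
    q_j = 1.0 if j == 0 else q[j - 1] / k_j   # recalculated (recursive) weight
    q.append(q_j)

total = sum(q)
weights = [q_j / total for q_j in q]          # final relative weights, sum to 1
for name, w in zip(criteria, weights):
    print(f"{name}: {w:.3f}")
```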
Affiliation(s)
- Payam Shojaei
- Department of Management, Shiraz University, Shiraz, Iran
| | - Mohsen Khosravi
- Department of Healthcare Management, School of Management and Medical Informatics, Shiraz University of Medical Sciences, Shiraz, Iran
| | - Yalda Jafari
- Department of Management, Shiraz University, Shiraz, Iran
| | - Amir Hossein Mahmoudi
- Department of Operations Management & Decision Sciences, Faculty of Management, University of Tehran, Tehran, Iran
| | - Hadis Hassanipourmahani
- Department of Information Technology Management, Faculty of Management, University of Tehran, Tehran, Iran
225
Madrid-García A, Rosales-Rosado Z, Freites-Nuñez D, Pérez-Sancristóbal I, Pato-Cour E, Plasencia-Rodríguez C, Cabeza-Osorio L, Abasolo-Alcázar L, León-Mateos L, Fernández-Gutiérrez B, Rodríguez-Rodríguez L. Harnessing ChatGPT and GPT-4 for evaluating the rheumatology questions of the Spanish access exam to specialized medical training. Sci Rep 2023; 13:22129. [PMID: 38092821 PMCID: PMC10719375 DOI: 10.1038/s41598-023-49483-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/16/2023] [Accepted: 12/08/2023] [Indexed: 12/17/2023] Open
Abstract
The emergence of large language models (LLM) with remarkable performance such as ChatGPT and GPT-4, has led to an unprecedented uptake in the population. One of their most promising and studied applications concerns education due to their ability to understand and generate human-like text, creating a multitude of opportunities for enhancing educational practices and outcomes. The objective of this study is twofold: to assess the accuracy of ChatGPT/GPT-4 in answering rheumatology questions from the access exam to specialized medical training in Spain (MIR), and to evaluate the medical reasoning followed by these LLM to answer those questions. A dataset, RheumaMIR, of 145 rheumatology-related questions, extracted from the exams held between 2010 and 2023, was created for that purpose, used as a prompt for the LLM, and was publicly distributed. Six rheumatologists with clinical and teaching experience evaluated the clinical reasoning of the chatbots using a 5-point Likert scale and their degree of agreement was analyzed. The association between variables that could influence the models' accuracy (i.e., year of the exam question, disease addressed, type of question and genre) was studied. ChatGPT demonstrated a high level of performance in both accuracy, 66.43%, and clinical reasoning, median (Q1-Q3), 4.5 (2.33-4.67). However, GPT-4 showed better performance with an accuracy score of 93.71% and a median clinical reasoning value of 4.67 (4.5-4.83). These findings suggest that LLM may serve as valuable tools in rheumatology education, aiding in exam preparation and supplementing traditional teaching methods.
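As a small illustration of the two summary measures reported above, answer accuracy and the median with interquartile range of reviewer ratings, here is a minimal sketch; the arrays are invented placeholders, not the RheumaMIR data.

```python
# Minimal sketch of the summary metrics: percentage of correct answers and the
# median (Q1-Q3) of clinical-reasoning ratings on a 1-5 Likert scale.
# The arrays below are hypothetical placeholders, not the study data.
import numpy as np

is_correct = np.array([1, 1, 0, 1, 1, 1, 0, 1, 1, 1])   # 1 = answer correct
reasoning = np.array([4.5, 4.7, 2.3, 4.8, 4.6, 4.9, 3.0, 4.7, 4.5, 4.8])  # mean rating per question

accuracy = 100 * is_correct.mean()
q1, med, q3 = np.percentile(reasoning, [25, 50, 75])
print(f"accuracy = {accuracy:.2f}%, clinical reasoning median (Q1-Q3) = {med:.2f} ({q1:.2f}-{q3:.2f})")
```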
Affiliation(s)
- Alfredo Madrid-García
- Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria del Hospital Clínico San Carlos (IdISSC), Prof. Martin Lagos S/N, 28040, Madrid, Spain.
| | - Zulema Rosales-Rosado
- Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria del Hospital Clínico San Carlos (IdISSC), Prof. Martin Lagos S/N, 28040, Madrid, Spain
| | - Dalifer Freites-Nuñez
- Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria del Hospital Clínico San Carlos (IdISSC), Prof. Martin Lagos S/N, 28040, Madrid, Spain
| | - Inés Pérez-Sancristóbal
- Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria del Hospital Clínico San Carlos (IdISSC), Prof. Martin Lagos S/N, 28040, Madrid, Spain
| | - Esperanza Pato-Cour
- Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria del Hospital Clínico San Carlos (IdISSC), Prof. Martin Lagos S/N, 28040, Madrid, Spain
| | | | - Luis Cabeza-Osorio
- Medicina Interna, Hospital Universitario del Henares, Avenida de Marie Curie, 0, 28822, Madrid, Spain
- Facultad de Medicina, Universidad Francisco de Vitoria, Carretera Pozuelo, Km 1800, 28223, Madrid, Spain
| | - Lydia Abasolo-Alcázar
- Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria del Hospital Clínico San Carlos (IdISSC), Prof. Martin Lagos S/N, 28040, Madrid, Spain
| | - Leticia León-Mateos
- Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria del Hospital Clínico San Carlos (IdISSC), Prof. Martin Lagos S/N, 28040, Madrid, Spain
| | - Benjamín Fernández-Gutiérrez
- Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria del Hospital Clínico San Carlos (IdISSC), Prof. Martin Lagos S/N, 28040, Madrid, Spain
- Facultad de Medicina, Universidad Complutense de Madrid, Madrid, Spain
| | - Luis Rodríguez-Rodríguez
- Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria del Hospital Clínico San Carlos (IdISSC), Prof. Martin Lagos S/N, 28040, Madrid, Spain
226
Miao J, Thongprayoon C, Suppadungsuk S, Garcia Valencia OA, Qureshi F, Cheungpasitporn W. Innovating Personalized Nephrology Care: Exploring the Potential Utilization of ChatGPT. J Pers Med 2023; 13:1681. [PMID: 38138908 PMCID: PMC10744377 DOI: 10.3390/jpm13121681] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/01/2023] [Revised: 12/02/2023] [Accepted: 12/02/2023] [Indexed: 12/24/2023] Open
Abstract
The rapid advancement of artificial intelligence (AI) technologies, particularly machine learning, has brought substantial progress to the field of nephrology, enabling significant improvements in the management of kidney diseases. ChatGPT, a revolutionary language model developed by OpenAI, is a versatile AI model designed to engage in meaningful and informative conversations. Its applications in healthcare have been notable, with demonstrated proficiency in various medical knowledge assessments. However, ChatGPT's performance varies across different medical subfields, posing challenges in nephrology-related queries. At present, comprehensive reviews regarding ChatGPT's potential applications in nephrology remain lacking despite the surge of interest in its role in various domains. This article seeks to fill this gap by presenting an overview of the integration of ChatGPT in nephrology. It discusses the potential benefits of ChatGPT in nephrology, encompassing dataset management, diagnostics, treatment planning, and patient communication and education, as well as medical research and education. It also explores ethical and legal concerns regarding the utilization of AI in medical practice. The continuous development of AI models like ChatGPT holds promise for the healthcare realm but also underscores the necessity of thorough evaluation and validation before implementing AI in real-world medical scenarios. This review serves as a valuable resource for nephrologists and healthcare professionals interested in fully utilizing the potential of AI in innovating personalized nephrology care.
Affiliation(s)
- Jing Miao
- Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA; (J.M.); (C.T.); (S.S.); (O.A.G.V.); (F.Q.)
| | - Charat Thongprayoon
- Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA; (J.M.); (C.T.); (S.S.); (O.A.G.V.); (F.Q.)
| | - Supawadee Suppadungsuk
- Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA; (J.M.); (C.T.); (S.S.); (O.A.G.V.); (F.Q.)
- Chakri Naruebodindra Medical Institute, Faculty of Medicine Ramathibodi Hospital, Mahidol University, Samut Prakan 10540, Thailand
| | - Oscar A. Garcia Valencia
- Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA; (J.M.); (C.T.); (S.S.); (O.A.G.V.); (F.Q.)
| | - Fawad Qureshi
- Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA; (J.M.); (C.T.); (S.S.); (O.A.G.V.); (F.Q.)
| | - Wisit Cheungpasitporn
- Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA; (J.M.); (C.T.); (S.S.); (O.A.G.V.); (F.Q.)
227
Hermann CE, Patel JM, Boyd L, Growdon WB, Aviki E, Stasenko M. Let's chat about cervical cancer: Assessing the accuracy of ChatGPT responses to cervical cancer questions. Gynecol Oncol 2023; 179:164-168. [PMID: 37988948 DOI: 10.1016/j.ygyno.2023.11.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/12/2023] [Revised: 10/31/2023] [Accepted: 11/08/2023] [Indexed: 11/23/2023]
Abstract
OBJECTIVE To quantify the accuracy of ChatGPT in answering commonly asked questions pertaining to cervical cancer prevention, diagnosis, treatment, and survivorship/quality-of-life (QOL). METHODS ChatGPT was queried with 64 questions adapted from professional society websites and the authors' clinical experiences. The answers were scored by two attending gynecologic oncologists according to the following scale: 1) correct and comprehensive, 2) correct but not comprehensive, 3) some correct, some incorrect, and 4) completely incorrect. Scoring discrepancies were resolved by additional reviewers as needed. The proportions of responses earning each score were calculated overall and within each question category. RESULTS ChatGPT provided correct and comprehensive answers to 34 (53.1%) questions, correct but not comprehensive answers to 19 (29.7%) questions, partially incorrect answers to 10 (15.6%) questions, and completely incorrect answers to 1 (1.6%) question. Prevention and survivorship/QOL had the highest proportion of "correct" scores (scores of 1 or 2) at 22/24 (91.7%) and 15/16 (93.8%), respectively. ChatGPT performed less well in the treatment category, with 15/21 (71.4%) correct scores. It performed the worst in the diagnosis category, with only 1/3 (33.3%) correct scores. CONCLUSION ChatGPT accurately answers questions about cervical cancer prevention, survivorship, and QOL. It performs less accurately for cervical cancer diagnosis and treatment. Further development of this immensely popular large language model should include physician input before it can be utilized as a tool for gynecologists or recommended as a patient resource for information on cervical cancer diagnosis and treatment.
Affiliation(s)
- Catherine E Hermann
- New York University Langone Health, Department of Obstetrics and Gynecology, Division of Gynecologic Oncology, New York, NY, United States of America.
| | - Jharna M Patel
- New York University Langone Health, Department of Obstetrics and Gynecology, Division of Gynecologic Oncology, New York, NY, United States of America
| | - Leslie Boyd
- New York University Langone Health, Department of Obstetrics and Gynecology, Division of Gynecologic Oncology, New York, NY, United States of America
| | - Whitfield B Growdon
- New York University Langone Health, Department of Obstetrics and Gynecology, Division of Gynecologic Oncology, New York, NY, United States of America
| | - Emeline Aviki
- New York University Langone Health Long Island, Department of Obstetrics and Gynecology, Division of Gynecologic Oncology, Mineola, NY, United States of America
| | - Marina Stasenko
- New York University Langone Health, Department of Obstetrics and Gynecology, Division of Gynecologic Oncology, New York, NY, United States of America
228
Au K, Yang W. Auxiliary use of ChatGPT in surgical diagnosis and treatment. Int J Surg 2023; 109:3940-3943. [PMID: 37678271 PMCID: PMC10720849 DOI: 10.1097/js9.0000000000000686] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/24/2023] [Accepted: 08/09/2023] [Indexed: 09/09/2023]
Abstract
ChatGPT can be used as an auxiliary tool in surgical diagnosis and treatment in several ways. One of its most valuable features is its ability to quickly process large amounts of data and provide relatively accurate information to healthcare workers. Owing to this accuracy and capacity for handling big data, ChatGPT has been widely used in the healthcare industry for tasks such as assisting medical diagnosis, predicting certain diseases, and analyzing medical cases. In surgical diagnosis and treatment, it can serve as an auxiliary tool that helps healthcare professionals process large amounts of medical data, provides real-time guidance and feedback, and increases the overall speed and quality of healthcare. Despite this broad acceptance, it still faces issues concerning ethics, patient privacy, data security, law, trustworthiness, and accuracy. This study aimed to explore the auxiliary use of ChatGPT in surgical diagnosis and treatment.
Affiliation(s)
- Kahei Au
- School of Medicine, Jinan University
| | - Wah Yang
- Department of Metabolic and Bariatric Surgery, The First Affiliated Hospital of Jinan University, Guangzhou, Guangdong Province, People’s Republic of China
229
Bagde H, Dhopte A, Alam MK, Basri R. A systematic review and meta-analysis on ChatGPT and its utilization in medical and dental research. Heliyon 2023; 9:e23050. [PMID: 38144348 PMCID: PMC10746423 DOI: 10.1016/j.heliyon.2023.e23050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/07/2023] [Revised: 10/24/2023] [Accepted: 11/24/2023] [Indexed: 12/26/2023] Open
Abstract
Since its release, ChatGPT has taken the world by storm with its utilization in various fields of life. This review's main goal was to offer a thorough and fact-based evaluation of ChatGPT's potential as a tool for medical and dental research, which could direct subsequent research and influence clinical practices. METHODS Different online databases were searched for relevant articles that were in accordance with the study objectives. A team of reviewers was assembled to devise a proper methodological framework for inclusion of articles and meta-analysis. RESULTS Eleven descriptive studies were considered for this review that evaluated the accuracy of ChatGPT in answering medical queries related to different domains such as systematic reviews, cancer, liver diseases, diagnostic imaging, education, and COVID-19 vaccination. The studies reported different accuracy ranges, from 18.3% to 100%, across various datasets and specialties. The meta-analysis showed an odds ratio (OR) of 2.25 and a relative risk (RR) of 1.47 with 95% confidence intervals (CI), indicating that the accuracy of ChatGPT in providing correct responses was significantly higher compared with the total responses for the queries. However, significant heterogeneity was present among the studies, suggesting considerable variability in the effect sizes across the included studies. CONCLUSION The observations indicate that ChatGPT has the ability to provide appropriate answers to questions in the medical and dental fields, but researchers and doctors should cautiously assess its responses because they might not always be dependable. Overall, the importance of this study rests in shedding light on ChatGPT's accuracy in the medical and dental fields and emphasizing the need for additional investigation to enhance its performance.
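To show what the pooling step in a meta-analysis like this involves, here is a minimal sketch of an inverse-variance fixed-effect pooled odds ratio with Cochran's Q and I² for heterogeneity; the per-study 2x2 counts are invented placeholders, not data from the cited review.

```python
# Minimal sketch of an inverse-variance fixed-effect meta-analysis of odds
# ratios, with Cochran's Q and I^2 for heterogeneity. The per-study 2x2 counts
# (correct/incorrect responses in two groups) are invented, not review data.
import numpy as np

# each row: (a, b, c, d) = events/non-events in group 1, events/non-events in group 2
studies = np.array([
    [45, 15, 30, 30],
    [80, 20, 60, 40],
    [25, 25, 18, 32],
], dtype=float)

a, b, c, d = studies.T
log_or = np.log((a * d) / (b * c))            # per-study log odds ratio
var = 1 / a + 1 / b + 1 / c + 1 / d           # approximate variance of each log OR
w = 1 / var                                   # inverse-variance weights

pooled = np.sum(w * log_or) / np.sum(w)       # fixed-effect pooled log OR
se = np.sqrt(1 / np.sum(w))
ci_low, ci_high = pooled - 1.96 * se, pooled + 1.96 * se

q_stat = np.sum(w * (log_or - pooled) ** 2)   # Cochran's Q
df = len(studies) - 1
i2 = max(0.0, (q_stat - df) / q_stat) * 100   # I^2 heterogeneity (%)

print(f"pooled OR = {np.exp(pooled):.2f} "
      f"(95% CI {np.exp(ci_low):.2f}-{np.exp(ci_high):.2f}), I^2 = {i2:.1f}%")
```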
Affiliation(s)
- Hiroj Bagde
- Department of Periodontology, Chhattisgarh Dental College and Research Institute, Rajnandgaon, Chhattisgarh, India
| | - Ashwini Dhopte
- Department of Oral Medicine and Radiology, Chhattisgarh Dental College and Research Institute, Rajnandgaon, Chhattisgarh, India
| | - Mohammad Khursheed Alam
- Preventive Dentistry Department, College of Dentistry, Jouf University, Sakaka, 72345, Saudi Arabia
- Department of Dental Research Cell, Saveetha Dental College and Hospitals, Saveetha Institute of Medical and Technical Sciences, Chennai, India
- Department of Public Health, Faculty of Allied Health Sciences, Daffodil International University, Dhaka, Bangladesh
| | - Rehana Basri
- Department of Internal Medicine, College of Medicine, Jouf University, Sakaka, 72345, Saudi Arabia
230
Dang F, Samarasena JB. Generative Artificial Intelligence for Gastroenterology: Neither Friend nor Foe. Am J Gastroenterol 2023; 118:2146-2147. [PMID: 38033225 DOI: 10.14309/ajg.0000000000002573] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 08/16/2023] [Accepted: 11/02/2023] [Indexed: 12/02/2023]
Affiliation(s)
- Frances Dang
- Division of Gastroenterology/Hepatology, University of California Irvine School of Medicine, Orange, California, USA
231
Caglar U, Yildiz O, Meric A, Ayranci A, Yusuf R, Sarilar O, Ozgor F. Evaluating the performance of ChatGPT in answering questions related to benign prostate hyperplasia and prostate cancer. Minerva Urol Nephrol 2023; 75:729-733. [PMID: 38126285 DOI: 10.23736/s2724-6051.23.05450-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/23/2023]
Abstract
BACKGROUND The aim of this study was to evaluate the accuracy and reproducibility of ChatGPT's answers to frequently asked questions about benign prostate hyperplasia (BPH) and prostate cancer. METHODS Frequently asked questions about prostate cancer and BPH on the websites of urology associations, hospitals, and social media were evaluated. In addition, strong recommendation-level statements were noted from the recommendations tables of the European Association of Urology (EAU) 2022 Guidelines on Prostate Cancer and on Management of Non-neurogenic Male Lower Urinary Tract Symptoms. All questions were asked in order in ChatGPT Mar 23 Version. All answers were evaluated separately by two specialist urologists and scored from 1 to 4. RESULTS Forty questions about BPH and 86 questions about prostate cancer were included in the study. Of the answers to BPH-related questions, 90.0% were completely correct; for prostate cancer questions, this rate was 94.2%. For questions prepared from the strong recommendations of the EAU guidelines, the completely correct rate was 77.8% for BPH and 76.2% for prostate cancer. The similarity rates of the answers to repeated questions were 90.0% and 93.0% for BPH- and prostate cancer-related questions, respectively. CONCLUSIONS ChatGPT gave satisfactory answers to questions about BPH and prostate cancer. Although it is not perfect and has limitations, it is a constantly evolving platform and can be expected to become an important resource in the healthcare field in the future.
Affiliation(s)
- Ufuk Caglar
- Department of Urology, Haseki Training and Research Hospital, Istanbul, Türkiye -
| | - Oguzhan Yildiz
- Department of Urology, Haseki Training and Research Hospital, Istanbul, Türkiye
| | - Arda Meric
- Department of Urology, Haseki Training and Research Hospital, Istanbul, Türkiye
| | - Ali Ayranci
- Department of Urology, Haseki Training and Research Hospital, Istanbul, Türkiye
| | - Resit Yusuf
- Department of Urology, Haseki Training and Research Hospital, Istanbul, Türkiye
| | - Omer Sarilar
- Department of Urology, Haseki Training and Research Hospital, Istanbul, Türkiye
| | - Faruk Ozgor
- Department of Urology, Haseki Training and Research Hospital, Istanbul, Türkiye
232
Kunze KN. Editorial Commentary: Recognizing and Avoiding Medical Misinformation Across Digital Platforms: Smoke, Mirrors (and Streaming). Arthroscopy 2023; 39:2454-2455. [PMID: 37981387 DOI: 10.1016/j.arthro.2023.06.054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 06/18/2023] [Revised: 06/27/2023] [Accepted: 06/30/2023] [Indexed: 11/21/2023]
Abstract
The evolution of social media and related online sources has substantially increased the ability of patients to query and access publicly available information that may have relevance to a potential musculoskeletal condition of interest. Although increased accessibility to information has several purported benefits, including encouragement of patients to become more invested in their care through self-teaching, a downside to the existence of a vast number of unregulated resources remains the risk of misinformation. As health care providers, we have a moral and ethical obligation to mitigate this risk by directing patients to high-quality resources for medical information and to be aware of resources that are unreliable. To this end, a growing body of evidence has suggested that YouTube lacks reliability and quality in terms of medical information concerning a variety of musculoskeletal conditions.
233
Wang X, Sanders HM, Liu Y, Seang K, Tran BX, Atanasov AG, Qiu Y, Tang S, Car J, Wang YX, Wong TY, Tham YC, Chung KC. ChatGPT: promise and challenges for deployment in low- and middle-income countries. THE LANCET REGIONAL HEALTH. WESTERN PACIFIC 2023; 41:100905. [PMID: 37731897 PMCID: PMC10507635 DOI: 10.1016/j.lanwpc.2023.100905] [Citation(s) in RCA: 31] [Impact Index Per Article: 15.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/09/2023] [Revised: 08/14/2023] [Accepted: 09/03/2023] [Indexed: 09/22/2023]
Abstract
In low- and middle-income countries (LMICs), the fields of medicine and public health grapple with numerous challenges that continue to hinder patients' access to healthcare services. ChatGPT, a publicly accessible chatbot, has emerged as a potential tool in aiding public health efforts in LMICs. This viewpoint details the potential benefits of employing ChatGPT in LMICs to improve medicine and public health across a broad spectrum of domains, including health literacy, screening, triage, remote healthcare support, mental health support, multilingual capabilities, healthcare communication and documentation, medical training and education, and support for healthcare professionals. Additionally, we share potential concerns and limitations associated with the use of ChatGPT and provide a balanced discussion of the opportunities and challenges of using ChatGPT in LMICs.
Affiliation(s)
- Xiaofei Wang
- Key Laboratory for Biomechanics and Mechanobiology of Ministry of Education, Beijing Advanced Innovation Center for Biomedical Engineering, School of Biological Science and Medical Engineering, Beihang University, Beijing, China
| | - Hayley M. Sanders
- Section of Plastic Surgery, Department of Surgery, University of Michigan Medical School, Ann Arbor, MI, USA
| | - Yuchen Liu
- Key Laboratory for Biomechanics and Mechanobiology of Ministry of Education, Beijing Advanced Innovation Center for Biomedical Engineering, School of Biological Science and Medical Engineering, Beihang University, Beijing, China
| | - Kennarey Seang
- Grant Management Office, University of Health Sciences, Phnom Penh, Cambodia
| | - Bach Xuan Tran
- Department of Health Economics, Institute for Preventive Medicine and Public Health, Hanoi Medical University, Hanoi, Vietnam
- Institute of Health Economics and Technology, Hanoi, Vietnam
| | - Atanas G. Atanasov
- Ludwig Boltzmann Institute Digital Health and Patient Safety, Medical University of Vienna, Spitalgasse 23, 1090, Vienna, Austria
- Institute of Genetics and Animal Biotechnology of the Polish Academy of Sciences, Jastrzebiec, 05-552, Magdalenka, Poland
| | - Yue Qiu
- Institute for Hospital Management, Tsinghua University, Beijing, China
| | - Shenglan Tang
- Duke Global Health Institute, Duke University, Durham, NC, USA
| | - Josip Car
- Centre for Population Health Sciences, Lee Kong Chian School of Medicine, Nanyang Technological University Singapore, Singapore
- Department of Primary Care and Public Health, School of Public Health, Imperial College London, London, United Kingdom
| | - Ya Xing Wang
- Beijing Institute of Ophthalmology, Beijing Ophthalmology and Visual Science Key Lab, Beijing Tongren Eye Center, Beijing Tongren Hospital, Capital Medical University, Beijing, China
| | - Tien Yin Wong
- Singapore Eye Research Institute, Singapore National Eye Centre, Singapore
- Tsinghua Medicine, Tsinghua University, Beijing, China
- School of Clinical Medicine, Beijing Tsinghua Changgung Hospital, Beijing, China
| | - Yih-Chung Tham
- Singapore Eye Research Institute, Singapore National Eye Centre, Singapore
- Centre for Innovation and Precision Eye Health, Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Ophthalmology and Visual Science Academic Clinical Program, Duke-NUS Medical School, Singapore
| | - Kevin C. Chung
- Section of Plastic Surgery, Department of Surgery, University of Michigan Medical School, Ann Arbor, MI, USA
234
Zhang C, Xu J, Tang R, Yang J, Wang W, Yu X, Shi S. Novel research and future prospects of artificial intelligence in cancer diagnosis and treatment. J Hematol Oncol 2023; 16:114. [PMID: 38012673 PMCID: PMC10680201 DOI: 10.1186/s13045-023-01514-5] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/07/2023] [Accepted: 11/20/2023] [Indexed: 11/29/2023] Open
Abstract
Research into the potential benefits of artificial intelligence for comprehending the intricate biology of cancer has grown as a result of the widespread use of deep learning and machine learning in the healthcare sector and the availability of highly specialized cancer datasets. Here, we review new artificial intelligence approaches and how they are being used in oncology. We describe how artificial intelligence might be used in the detection, prognosis, and administration of cancer treatments and introduce the use of the latest large language models such as ChatGPT in oncology clinics. We highlight artificial intelligence applications for omics data types, and we offer perspectives on how the various data types might be combined to create decision-support tools. We also evaluate the present constraints and challenges to applying artificial intelligence in precision oncology. Finally, we discuss how current challenges may be surmounted to make artificial intelligence useful in clinical settings in the future.
Affiliation(s)
- Chaoyi Zhang
- Department of Pancreatic Surgery, Fudan University Shanghai Cancer Center, No. 270 Dong'An Road, Shanghai, 200032, People's Republic of China
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, 200032, People's Republic of China
- Shanghai Pancreatic Cancer Institute, No. 399 Lingling Road, Shanghai, 200032, People's Republic of China
- Pancreatic Cancer Institute, Fudan University, Shanghai, 200032, People's Republic of China
| | - Jin Xu
- Department of Pancreatic Surgery, Fudan University Shanghai Cancer Center, No. 270 Dong'An Road, Shanghai, 200032, People's Republic of China
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, 200032, People's Republic of China
- Shanghai Pancreatic Cancer Institute, No. 399 Lingling Road, Shanghai, 200032, People's Republic of China
- Pancreatic Cancer Institute, Fudan University, Shanghai, 200032, People's Republic of China
| | - Rong Tang
- Department of Pancreatic Surgery, Fudan University Shanghai Cancer Center, No. 270 Dong'An Road, Shanghai, 200032, People's Republic of China
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, 200032, People's Republic of China
- Shanghai Pancreatic Cancer Institute, No. 399 Lingling Road, Shanghai, 200032, People's Republic of China
- Pancreatic Cancer Institute, Fudan University, Shanghai, 200032, People's Republic of China
| | - Jianhui Yang
- Department of Pancreatic Surgery, Fudan University Shanghai Cancer Center, No. 270 Dong'An Road, Shanghai, 200032, People's Republic of China
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, 200032, People's Republic of China
- Shanghai Pancreatic Cancer Institute, No. 399 Lingling Road, Shanghai, 200032, People's Republic of China
- Pancreatic Cancer Institute, Fudan University, Shanghai, 200032, People's Republic of China
| | - Wei Wang
- Department of Pancreatic Surgery, Fudan University Shanghai Cancer Center, No. 270 Dong'An Road, Shanghai, 200032, People's Republic of China
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, 200032, People's Republic of China
- Shanghai Pancreatic Cancer Institute, No. 399 Lingling Road, Shanghai, 200032, People's Republic of China
- Pancreatic Cancer Institute, Fudan University, Shanghai, 200032, People's Republic of China
| | - Xianjun Yu
- Department of Pancreatic Surgery, Fudan University Shanghai Cancer Center, No. 270 Dong'An Road, Shanghai, 200032, People's Republic of China.
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, 200032, People's Republic of China.
- Shanghai Pancreatic Cancer Institute, No. 399 Lingling Road, Shanghai, 200032, People's Republic of China.
- Pancreatic Cancer Institute, Fudan University, Shanghai, 200032, People's Republic of China.
| | - Si Shi
- Department of Pancreatic Surgery, Fudan University Shanghai Cancer Center, No. 270 Dong'An Road, Shanghai, 200032, People's Republic of China.
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, 200032, People's Republic of China.
- Shanghai Pancreatic Cancer Institute, No. 399 Lingling Road, Shanghai, 200032, People's Republic of China.
- Pancreatic Cancer Institute, Fudan University, Shanghai, 200032, People's Republic of China.
235
Song H, Xia Y, Luo Z, Liu H, Song Y, Zeng X, Li T, Zhong G, Li J, Chen M, Zhang G, Xiao B. Evaluating the Performance of Different Large Language Models on Health Consultation and Patient Education in Urolithiasis. J Med Syst 2023; 47:125. [PMID: 37999899 DOI: 10.1007/s10916-023-02021-3] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/24/2023] [Accepted: 11/14/2023] [Indexed: 11/25/2023]
Abstract
OBJECTIVES To evaluate the effectiveness of four large language models (LLMs) with large user bases and significant social attention (Claude, Bard, ChatGPT4, and New Bing) in the context of medical consultation and patient education in urolithiasis. MATERIALS AND METHODS In this study, we developed a questionnaire consisting of 21 questions and 2 clinical scenarios related to urolithiasis. Clinical consultations were then simulated for each of the four models to assess their responses to the questions. Urolithiasis experts then evaluated the model responses in terms of accuracy, comprehensiveness, ease of understanding, human care, and clinical case analysis ability based on a predesigned 5-point Likert scale. Visualization and statistical analyses were then employed to compare the four models and evaluate their performance. RESULTS All models yielded satisfactory performance, except for Bard, which failed to provide a valid response to Question 13. Claude consistently scored the highest in all dimensions compared with the other three models. ChatGPT4 ranked second in accuracy, with relatively stable output across multiple tests, but shortcomings were observed in empathy and human caring. Bard exhibited the lowest accuracy and overall performance. Claude and ChatGPT4 both had a high capacity to analyze clinical cases of urolithiasis. Overall, Claude emerged as the best performer in urolithiasis consultation and education. CONCLUSION Claude demonstrated superior performance compared with the other three models in urolithiasis consultation and education. This study highlights the remarkable potential of LLMs in medical consultation and patient education, although professional review, further evaluation, and modification are still required.
Affiliation(s)
- Haifeng Song
- Department of Urology, Beijing Tsinghua Changgung Hospital, School of Clinical Medicine, Tsinghua University, 168 Litang Rd, Beijing, 102218, China
- Institute of Urology, School of Clinical Medicine, Tsinghua University, Beijing, 102218, China
| | - Yi Xia
- Department of Urology, Zhongda Hospital, Southeast University, 87 Dingjiaqiao, Nanjing, 210009, China
- School of Medicine, Southeast University, Nanjing, 210009, China
| | - Zhichao Luo
- Department of Urology, Beijing Tsinghua Changgung Hospital, School of Clinical Medicine, Tsinghua University, 168 Litang Rd, Beijing, 102218, China
- Institute of Urology, School of Clinical Medicine, Tsinghua University, Beijing, 102218, China
| | - Hui Liu
- Department of Urology, Beijing Tsinghua Changgung Hospital, School of Clinical Medicine, Tsinghua University, 168 Litang Rd, Beijing, 102218, China
- Institute of Urology, School of Clinical Medicine, Tsinghua University, Beijing, 102218, China
| | - Yan Song
- Department of Urology, Sheng Jing Hospital of China Medical University, Shenyang, 110000, China
| | - Xue Zeng
- Department of Urology, Beijing Tsinghua Changgung Hospital, School of Clinical Medicine, Tsinghua University, 168 Litang Rd, Beijing, 102218, China
- Institute of Urology, School of Clinical Medicine, Tsinghua University, Beijing, 102218, China
| | - Tianjie Li
- Department of Urology, Beijing Tsinghua Changgung Hospital, School of Clinical Medicine, Tsinghua University, 168 Litang Rd, Beijing, 102218, China
- Institute of Urology, School of Clinical Medicine, Tsinghua University, Beijing, 102218, China
| | - Guangxin Zhong
- Department of Urology, Beijing Tsinghua Changgung Hospital, School of Clinical Medicine, Tsinghua University, 168 Litang Rd, Beijing, 102218, China
- Institute of Urology, School of Clinical Medicine, Tsinghua University, Beijing, 102218, China
| | - Jianxing Li
- Department of Urology, Beijing Tsinghua Changgung Hospital, School of Clinical Medicine, Tsinghua University, 168 Litang Rd, Beijing, 102218, China
- Institute of Urology, School of Clinical Medicine, Tsinghua University, Beijing, 102218, China
| | - Ming Chen
- Department of Urology, Zhongda Hospital, Southeast University, 87 Dingjiaqiao, Nanjing, 210009, China
| | - Guangyuan Zhang
- Department of Urology, Zhongda Hospital, Southeast University, 87 Dingjiaqiao, Nanjing, 210009, China.
| | - Bo Xiao
- Department of Urology, Beijing Tsinghua Changgung Hospital, School of Clinical Medicine, Tsinghua University, 168 Litang Rd, Beijing, 102218, China.
- Institute of Urology, School of Clinical Medicine, Tsinghua University, Beijing, 102218, China.
236
Deng J, Zubair A, Park YJ. Limitations of large language models in medical applications. Postgrad Med J 2023; 99:1298-1299. [PMID: 37624143 DOI: 10.1093/postmj/qgad069] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/18/2023] [Revised: 07/27/2023] [Accepted: 08/03/2023] [Indexed: 08/26/2023]
Affiliation(s)
- Jiawen Deng
- Temerty Faculty of Medicine, University of Toronto, Toronto, Ontario M5S 1A8, Canada
| | - Areeba Zubair
- Temerty Faculty of Medicine, University of Toronto, Toronto, Ontario M5S 1A8, Canada
| | - Ye-Jean Park
- Temerty Faculty of Medicine, University of Toronto, Toronto, Ontario M5S 1A8, Canada
237
Gödde D, Nöhl S, Wolf C, Rupert Y, Rimkus L, Ehlers J, Breuckmann F, Sellmann T. A SWOT (Strengths, Weaknesses, Opportunities, and Threats) Analysis of ChatGPT in the Medical Literature: Concise Review. J Med Internet Res 2023; 25:e49368. [PMID: 37865883 PMCID: PMC10690535 DOI: 10.2196/49368] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/26/2023] [Revised: 09/26/2023] [Accepted: 09/27/2023] [Indexed: 10/23/2023] Open
Abstract
BACKGROUND ChatGPT is a 175-billion-parameter natural language processing model that is already involved in scientific content and publications. Its influence ranges from providing quick access to information on medical topics and assisting in generating medical and scientific articles and papers to performing medical data analyses and even interpreting complex data sets. OBJECTIVE The future role of ChatGPT remains uncertain and has been a matter of debate since shortly after its release. This review aimed to analyze the role of ChatGPT in the medical literature during the first 3 months after its release. METHODS We performed a concise review of literature published in PubMed from December 1, 2022, to March 31, 2023. To find all publications related to ChatGPT or considering ChatGPT, the search term was kept simple ("ChatGPT" in AllFields). All publications available as full text in German or English were included. All accessible publications were evaluated according to specifications by the author team (eg, impact factor, publication mode, article type, publication speed, and type of ChatGPT integration or content). The conclusions of the articles were used for a subsequent SWOT (strengths, weaknesses, opportunities, and threats) analysis. All data were analyzed on a descriptive basis. RESULTS Of 178 studies in total, 160 met the inclusion criteria and were evaluated. The average impact factor was 4.423 (range 0-96.216), and the average publication speed was 16 (range 0-83) days. Among the articles, there were 77 editorials (48.1%), 43 essays (26.9%), 21 studies (13.1%), 6 reviews (3.8%), 6 case reports (3.8%), 6 news articles (3.8%), and 1 meta-analysis (0.6%). Of those, 54.4% (n=87) were published as open access, with 5% (n=8) provided on preprint servers. Over 400 quotes with information on strengths, weaknesses, opportunities, and threats were detected. By far the largest share (n=142, 34.8%) related to weaknesses. ChatGPT excels in its ability to express ideas clearly and formulate general contexts comprehensibly. It performs so well that even experts in the field have difficulty identifying abstracts generated by ChatGPT. However, the time-limited scope and the need for corrections by experts were mentioned as weaknesses and threats of ChatGPT. Opportunities include assistance in formulating medical issues for nonnative English speakers, as well as the possibility of timely participation in the development of such artificial intelligence tools since they are at an early stage and can therefore still be influenced. CONCLUSIONS Artificial intelligence tools such as ChatGPT are already part of the medical publishing landscape. Despite their apparent opportunities, policies and guidelines must be implemented to ensure benefits in education, clinical practice, and research and to protect against threats such as scientific misconduct, plagiarism, and inaccuracy.
Affiliation(s)
- Daniel Gödde
- Department of Pathology and Molecularpathology, Helios University Hospital Wuppertal, Witten/Herdecke University, Witten, Germany
| | - Sophia Nöhl
- Faculty of Health, Witten/Herdecke University, Witten, Germany
| | - Carina Wolf
- Faculty of Health, Witten/Herdecke University, Witten, Germany
| | - Yannick Rupert
- Faculty of Health, Witten/Herdecke University, Witten, Germany
| | - Lukas Rimkus
- Faculty of Health, Witten/Herdecke University, Witten, Germany
| | - Jan Ehlers
- Department of Didactics and Education Research in the Health Sector, Faculty of Health, Witten/Herdecke University, Witten, Germany
| | - Frank Breuckmann
- Department of Cardiology and Vascular Medicine, West German Heart and Vascular Center Essen, University Duisburg-Essen, Essen, Germany
- Department of Cardiology, Pneumology, Neurology and Intensive Care Medicine, Klinik Kitzinger Land, Kitzingen, Germany
| | - Timur Sellmann
- Department of Anaesthesiology I, Witten/Herdecke University, Witten, Germany
- Department of Anaesthesiology and Intensive Care Medicine, Evangelisches Krankenhaus BETHESDA zu Duisburg, Duisburg, Germany
238
Lakdawala N, Channa L, Gronbeck C, Lakdawala N, Weston G, Sloan B, Feng H. Assessing the Accuracy and Comprehensiveness of ChatGPT in Offering Clinical Guidance for Atopic Dermatitis and Acne Vulgaris. JMIR DERMATOLOGY 2023; 6:e50409. [PMID: 37962920 PMCID: PMC10685272 DOI: 10.2196/50409] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/29/2023] [Revised: 09/13/2023] [Accepted: 10/30/2023] [Indexed: 11/15/2023] Open
Affiliation(s)
- Nehal Lakdawala
- University of Connecticut School of Medicine, Farmington, CT, United States
| | | | - Christian Gronbeck
- Department of Dermatology, University of Connecticut Health Center, Farmington, CT, United States
| | - Nikita Lakdawala
- The Ronald O. Perelman Department of Dermatology, New York University, New York, NY, United States
| | - Gillian Weston
- Department of Dermatology, University of Connecticut Health Center, Farmington, CT, United States
| | - Brett Sloan
- Department of Dermatology, University of Connecticut Health Center, Farmington, CT, United States
| | - Hao Feng
- Department of Dermatology, University of Connecticut Health Center, Farmington, CT, United States
239
Ge J, Sun S, Owens J, Galvez V, Gologorskaya O, Lai JC, Pletcher MJ, Lai K. Development of a Liver Disease-Specific Large Language Model Chat Interface using Retrieval Augmented Generation. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.11.10.23298364. [PMID: 37986764 PMCID: PMC10659484 DOI: 10.1101/2023.11.10.23298364] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/22/2023]
Abstract
Background Large language models (LLMs) have significant capabilities in clinical information processing tasks. Commercially available LLMs, however, are not optimized for clinical uses and are prone to generating incorrect or hallucinatory information. Retrieval-augmented generation (RAG) is an enterprise architecture that allows embedding of customized data into LLMs. This approach "specializes" the LLMs and is thought to reduce hallucinations. Methods We developed "LiVersa," a liver disease-specific LLM, by using our institution's protected health information (PHI)-compliant text embedding and LLM platform, "Versa." We conducted RAG on 30 publicly available American Association for the Study of Liver Diseases (AASLD) guidelines and guidance documents to be incorporated into LiVersa. We evaluated LiVersa's performance by comparing its responses with those of trainees from a previously published knowledge assessment study regarding hepatitis B (HBV) treatment and hepatocellular carcinoma (HCC) surveillance. Results LiVersa answered all 10 questions correctly when forced to provide a "yes" or "no" answer. Full detailed responses with justifications and rationales, however, were not completely correct for three of the questions. Discussion In this study, we demonstrated the ability to build disease-specific and PHI-compliant LLMs using RAG. While our LLM, LiVersa, demonstrated more specificity in answering questions related to clinical hepatology, there were some knowledge deficiencies due to limitations set by the number and types of documents used for RAG. The LiVersa prototype, however, is a proof of concept for utilizing RAG to customize LLMs for clinical uses and a potential strategy to realize personalized medicine in the future.
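As a rough illustration of the retrieval-augmented generation pattern described above, the sketch below embeds guideline passages, retrieves the most similar ones for a question, and prepends them to the prompt. The embed() and generate() functions are hypothetical placeholders (any embedding model and LLM endpoint could stand in), and the snippet is not the LiVersa implementation.

```python
# Minimal retrieval-augmented generation (RAG) sketch: embed guideline chunks,
# retrieve the most similar chunks for a user question, and pass them to an
# LLM as context. embed() and generate() are hypothetical placeholders.
import numpy as np

def embed(texts: list[str]) -> np.ndarray:
    """Placeholder: return one embedding vector per text (e.g., from an embedding model)."""
    rng = np.random.default_rng(0)          # random stand-in so the sketch runs end to end
    return rng.normal(size=(len(texts), 384))

def generate(prompt: str) -> str:
    """Placeholder for a call to a (PHI-compliant) LLM endpoint."""
    return f"[LLM answer based on a prompt of {len(prompt)} characters]"

guideline_chunks = [
    "Chunk 1 of a guidance document ...",
    "Chunk 2 covering HBV treatment criteria ...",
    "Chunk 3 covering HCC surveillance intervals ...",
]
chunk_vecs = embed(guideline_chunks)

def answer(question: str, top_k: int = 2) -> str:
    q_vec = embed([question])[0]
    # cosine similarity between the question and every stored chunk
    sims = chunk_vecs @ q_vec / (np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(q_vec))
    context = "\n".join(guideline_chunks[i] for i in np.argsort(sims)[::-1][:top_k])
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return generate(prompt)

print(answer("Which patients with hepatitis B should be treated?"))
```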
Affiliation(s)
- Jin Ge
- Division of Gastroenterology and Hepatology, Department of Medicine, University of California – San Francisco, San Francisco, CA
| | - Steve Sun
- UCSF Health Information Technology, University of California – San Francisco, San Francisco, CA
| | - Joseph Owens
- UCSF Health Information Technology, University of California – San Francisco, San Francisco, CA
| | - Victor Galvez
- UCSF Health Information Technology, University of California – San Francisco, San Francisco, CA
| | - Oksana Gologorskaya
- UCSF Health Information Technology, University of California – San Francisco, San Francisco, CA
- Bakar Computational Health Sciences Institute, University of California – San Francisco, San Francisco, CA
| | - Jennifer C. Lai
- Division of Gastroenterology and Hepatology, Department of Medicine, University of California – San Francisco, San Francisco, CA
| | - Mark J. Pletcher
- Department of Epidemiology and Biostatistics, University of California – San Francisco, San Francisco, CA
| | - Ki Lai
- UCSF Health Information Technology, University of California – San Francisco, San Francisco, CA
240
Chlorogiannis DD, Apostolos A, Chlorogiannis A, Palaiodimos L, Giannakoulas G, Pargaonkar S, Xesfingi S, Kokkinidis DG. The Role of ChatGPT in the Advancement of Diagnosis, Management, and Prognosis of Cardiovascular and Cerebrovascular Disease. Healthcare (Basel) 2023; 11:2906. [PMID: 37958050 PMCID: PMC10648908 DOI: 10.3390/healthcare11212906] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/03/2023] [Revised: 10/24/2023] [Accepted: 11/04/2023] [Indexed: 11/15/2023] Open
Abstract
Cardiovascular and cerebrovascular disease incidence has risen mainly due to poor control of preventable risk factors and still constitutes a significant financial and health burden worldwide. ChatGPT is an artificial intelligence language-based model developed by OpenAI. Due to the model's unique cognitive capabilities beyond data processing and the production of high-quality text, there has been a surge of research interest concerning its role in the scientific community and contemporary clinical practice. To fully exploit ChatGPT's potential benefits and reduce possible misuse, extreme caution must be taken to ensure that it is implemented ethically and equitably. In this narrative review, we explore the language model's possible applications and limitations while emphasizing its potential value for the diagnosis, management, and prognosis of cardiovascular and cerebrovascular disease.
Affiliation(s)
| | - Anastasios Apostolos
- First Department of Cardiology, School of Medicine, National Kapodistrian University of Athens, Hippokrateion General Hospital of Athens, 115 27 Athens, Greece;
| | - Anargyros Chlorogiannis
- Department of Health Economics, Policy and Management, Karolinska Institutet, 171 77 Stockholm, Sweden
| | - Leonidas Palaiodimos
- Division of Hospital Medicine, Jacobi Medical Center, NYC H+H, Albert Einstein College of Medicine, New York, NY 10461, USA; (L.P.); (S.P.)
| | - George Giannakoulas
- Department of Cardiology, AHEPA University Hospital, Aristotle University of Thessaloniki, 541 24 Thessaloniki, Greece;
| | - Sumant Pargaonkar
- Division of Hospital Medicine, Jacobi Medical Center, NYC H+H, Albert Einstein College of Medicine, New York, NY 10461, USA; (L.P.); (S.P.)
| | - Sofia Xesfingi
- Department of Economics, University of Piraeus, 185 34 Piraeus, Greece
| | - Damianos G. Kokkinidis
- Section of Cardiovascular Medicine, Yale University School of Medicine, New Haven, CT 06510, USA
241
Yang HS, Wang F, Greenblatt MB, Huang SX, Zhang Y. AI Chatbots in Clinical Laboratory Medicine: Foundations and Trends. Clin Chem 2023; 69:1238-1246. [PMID: 37664912 DOI: 10.1093/clinchem/hvad106] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/14/2023] [Accepted: 06/05/2023] [Indexed: 09/05/2023]
Abstract
BACKGROUND Artificial intelligence (AI) conversational agents, or chatbots, are computer programs designed to simulate human conversations using natural language processing. They offer diverse functions and applications across an expanding range of healthcare domains. However, their roles in laboratory medicine remain unclear, as their accuracy, repeatability, and ability to interpret complex laboratory data have yet to be rigorously evaluated. CONTENT This review provides an overview of the history of chatbots, two major chatbot development approaches, and their respective advantages and limitations. We discuss the capabilities and potential applications of chatbots in healthcare, focusing on the laboratory medicine field. Recent evaluations of chatbot performance are presented, with a special emphasis on large language models such as the Chat Generative Pre-trained Transformer in response to laboratory medicine questions across different categories, such as medical knowledge, laboratory operations, regulations, and interpretation of laboratory results as related to clinical context. We analyze the causes of chatbots' limitations and suggest research directions for developing more accurate, reliable, and manageable chatbots for applications in laboratory medicine. SUMMARY Chatbots, which are rapidly evolving AI applications, hold tremendous potential to improve medical education, provide timely responses to clinical inquiries concerning laboratory tests, assist in interpreting laboratory results, and facilitate communication among patients, physicians, and laboratorians. Nevertheless, users should be vigilant of existing chatbots' limitations, such as misinformation, inconsistencies, and lack of human-like reasoning abilities. To be effectively used in laboratory medicine, chatbots must undergo extensive training on rigorously validated medical knowledge and be thoroughly evaluated against standard clinical practice.
Affiliation(s)
- He S Yang: Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, United States
- Fei Wang: Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, United States
- Matthew B Greenblatt: Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, United States; Research Division, Hospital for Special Surgery, New York, NY, United States
- Sharon X Huang: College of Information Sciences and Technology, The Pennsylvania State University, University Park, PA, United States
- Yi Zhang: Department of Computer Science and Engineering, University of California, Santa Cruz, Santa Cruz, CA, United States
242
Kaarre J, Feldt R, Keeling LE, Dadoo S, Zsidai B, Hughes JD, Samuelsson K, Musahl V. Exploring the potential of ChatGPT as a supplementary tool for providing orthopaedic information. Knee Surg Sports Traumatol Arthrosc 2023; 31:5190-5198. [PMID: 37553552 PMCID: PMC10598178 DOI: 10.1007/s00167-023-07529-2] [Citation(s) in RCA: 40] [Impact Index Per Article: 20.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 04/19/2023] [Accepted: 07/26/2023] [Indexed: 08/10/2023]
Abstract
PURPOSE To investigate the potential use of large language models (LLMs) in orthopaedics by presenting queries pertinent to anterior cruciate ligament (ACL) surgery to the generative pre-trained transformer ChatGPT (specifically, its GPT-4 model of March 14, 2023). Additionally, this study aimed to evaluate the depth of the LLM's knowledge and investigate its adaptability to different user groups. It was hypothesized that ChatGPT would be able to adapt to different target groups due to its strong language understanding and processing capabilities. METHODS ChatGPT was presented with 20 questions, and responses were requested for two distinct target audiences: patients and non-orthopaedic medical doctors. Two board-certified orthopaedic sports medicine surgeons and two expert orthopaedic sports medicine surgeons independently evaluated the responses generated by ChatGPT. Mean correctness, completeness, and adaptability to the target audiences (patients and non-orthopaedic medical doctors) were determined. A three-point response scale facilitated nuanced assessment. RESULTS ChatGPT exhibited fair accuracy, with average correctness scores of 1.69 and 1.66 (on a scale of 0 = incorrect, 1 = partially correct, 2 = correct) for patients and medical doctors, respectively. Three of the 20 questions (15.0%) were deemed incorrect by at least one of the four orthopaedic sports medicine surgeon assessors. Moreover, overall completeness was calculated to be 1.51 and 1.64 for patients and medical doctors, respectively, while overall adaptiveness was determined to be 1.75 and 1.73 for patients and doctors, respectively. CONCLUSION Overall, ChatGPT was successful in generating correct responses in approximately 65% of the cases related to ACL surgery. The findings of this study imply that LLMs offer potential as a supplementary tool for acquiring orthopaedic knowledge. However, although ChatGPT can provide guidance and effectively adapt to diverse target audiences, it cannot supplant the expertise of orthopaedic sports medicine surgeons in diagnostic and treatment planning endeavours due to its limited understanding of orthopaedic domains and its potential for erroneous responses. LEVEL OF EVIDENCE V.
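To illustrate the kind of rubric aggregation this evaluation design implies, the following minimal sketch averages 0-2 grades from four raters over a set of questions per target audience; the grade values are placeholders for illustration, not the study's data.

```python
# Minimal sketch of aggregating 0-2 rubric grades from several raters.
# The ratings below are illustrative placeholders, not the study's data.
from statistics import mean

# ratings[audience][question_index] = grades from the four assessors for one response
ratings = {
    "patients": [[2, 2, 1, 2], [1, 1, 2, 1], [2, 2, 2, 2]],
    "doctors":  [[2, 1, 2, 2], [1, 2, 1, 1], [2, 2, 2, 1]],
}

for audience, per_question in ratings.items():
    question_means = [mean(grades) for grades in per_question]   # average over raters
    overall = mean(question_means)                               # average over questions
    flagged = sum(any(g == 0 for g in grades) for grades in per_question)
    print(f"{audience}: mean correctness {overall:.2f}, "
          f"{flagged} question(s) graded incorrect by at least one assessor")
```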
Affiliation(s)
- Janina Kaarre: Department of Orthopaedic Surgery, UPMC Freddie Fu Sports Medicine Center, University of Pittsburgh, Pittsburgh, USA; Department of Orthopaedics, Institute of Clinical Sciences, Sahlgrenska Academy, University of Gothenburg, Göteborgsvägen 31, 431 80 Mölndal, Sweden
- Robert Feldt: Department of Computer Science and Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Laura E. Keeling: Department of Orthopaedic Surgery, UPMC Freddie Fu Sports Medicine Center, University of Pittsburgh, Pittsburgh, USA
- Sahil Dadoo: Department of Orthopaedic Surgery, UPMC Freddie Fu Sports Medicine Center, University of Pittsburgh, Pittsburgh, USA
- Bálint Zsidai: Department of Orthopaedics, Institute of Clinical Sciences, Sahlgrenska Academy, University of Gothenburg, Göteborgsvägen 31, 431 80 Mölndal, Sweden
- Jonathan D. Hughes: Department of Orthopaedic Surgery, UPMC Freddie Fu Sports Medicine Center, University of Pittsburgh, Pittsburgh, USA
- Kristian Samuelsson: Department of Orthopaedics, Institute of Clinical Sciences, Sahlgrenska Academy, University of Gothenburg, Göteborgsvägen 31, 431 80 Mölndal, Sweden; Department of Orthopaedics, Sahlgrenska University Hospital, Mölndal, Sweden
- Volker Musahl: Department of Orthopaedic Surgery, UPMC Freddie Fu Sports Medicine Center, University of Pittsburgh, Pittsburgh, USA
243
Daher M, Koa J, Boufadel P, Singh J, Fares MY, Abboud JA. Breaking barriers: can ChatGPT compete with a shoulder and elbow specialist in diagnosis and management? JSES Int 2023; 7:2534-2541. [PMID: 37969495 PMCID: PMC10638599 DOI: 10.1016/j.jseint.2023.07.018] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2023] Open
Abstract
Background ChatGPT is an artificial intelligence (AI) language processing model that uses deep learning to generate human-like responses to natural language inputs. Its potential use in health care has raised questions, and several studies have assessed its effectiveness in writing articles, clinical reasoning, and solving complex questions. This study aims to investigate ChatGPT's capabilities and implications in diagnosing and managing patients with new shoulder and elbow complaints in a private clinical setting, to provide insights into its potential use as a diagnostic tool for patients and a first-consultation resource for primary physicians. Methods In a private clinical setting, patients were assessed by ChatGPT after being seen by a shoulder and elbow specialist for shoulder and elbow symptoms. To allow assessment by the AI model, a research fellow filled out a standardized form (including age, gender, major comorbidities, symptoms and their localization, natural history, and duration, any associated symptoms or movement deficit, aggravating/relieving factors, and the x-ray/imaging report if present). This form was submitted through the ChatGPT portal, and the AI model was asked for a diagnosis and the best management modality. Results A total of 29 patients (15 males and 14 females) were included in this study. The AI model correctly chose the diagnosis and management in 93% (27/29) and 83% (24/29) of the patients, respectively. Furthermore, of the 24 patients who were managed correctly, ChatGPT did not specify the appropriate management in 6 patients and chose only one management option in 5 patients in whom both options were applicable and dependent on the patient's choice. Therefore, 55% of ChatGPT's management was considered poor. Conclusion ChatGPT made a worthy opponent; however, in its current form it will not be able to replace a shoulder and elbow specialist in diagnosing and treating patients, for many reasons such as misdiagnosis, poor management, lack of empathy and interaction with patients, dependence on magnetic resonance imaging reports, and lack of up-to-date knowledge.
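The study submitted its standardized intake form through the ChatGPT web portal; purely as an illustrative sketch, the snippet below shows how the same fields could be assembled into a single prompt and sent programmatically with the OpenAI Python client. The field values and model name are hypothetical, and an API key is assumed to be configured in the environment.

```python
# Illustrative sketch only: turn standardized intake fields into one prompt and query the
# model via the OpenAI Python SDK. Field values and model name are hypothetical.
from openai import OpenAI

case = {
    "age": 54, "gender": "female", "comorbidities": "hypertension",
    "symptoms": "right shoulder pain, localized anteriorly, 3 months, worse overhead",
    "associated": "no weakness, full range of motion",
    "aggravating_relieving": "worse with lifting, relieved by rest",
    "imaging": "x-ray: mild acromioclavicular degenerative changes",
}

prompt = (
    "Patient case:\n"
    + "\n".join(f"- {field}: {value}" for field, value in case.items())
    + "\n\nWhat is the most likely diagnosis, and what is the best management?"
)

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```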
Affiliation(s)
- Jonathan Koa: Rothman Institute/Thomas Jefferson Medical Center, Philadelphia, PA, USA
- Peter Boufadel: Rothman Institute/Thomas Jefferson Medical Center, Philadelphia, PA, USA
- Jaspal Singh: Rothman Institute/Thomas Jefferson Medical Center, Philadelphia, PA, USA
- Mohamad Y. Fares: Rothman Institute/Thomas Jefferson Medical Center, Philadelphia, PA, USA
- Joseph A. Abboud: Rothman Institute/Thomas Jefferson Medical Center, Philadelphia, PA, USA
244
Caglar U, Yildiz O, Ozervarli MF, Aydin R, Sarilar O, Ozgor F, Ortac M. Assessing the Performance of Chat Generative Pretrained Transformer (ChatGPT) in Answering Andrology-Related Questions. UROLOGY RESEARCH & PRACTICE 2023; 49:365-369. [PMID: 37933835 PMCID: PMC10765186 DOI: 10.5152/tud.2023.23171] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/13/2023] [Accepted: 10/07/2023] [Indexed: 11/08/2023]
Abstract
OBJECTIVE The internet and social media have become primary sources of health information, with men frequently turning to these platforms before seeking professional help. Chat generative pretrained transformer (ChatGPT), an artificial intelligence model developed by OpenAI, has gained popularity as a natural language processing program. The present study evaluated the accuracy and reproducibility of ChatGPT's responses to andrology-related questions. METHODS The study analyzed frequently asked andrology questions from health forums, hospital websites, and social media platforms such as YouTube and Instagram. Questions were categorized into topics such as male hypogonadism and erectile dysfunction. Recommendations from the European Association of Urology (EAU) guidelines were also included. These questions were input into ChatGPT, and the responses were evaluated by 3 experienced urologists who scored them on a scale of 1 to 4. RESULTS Of 136 evaluated questions, 108 met the criteria. Of these, 87.9% received correct and adequate answers, 9.3% received answers that were correct but insufficient, and 3 responses contained both correct and incorrect information. No question was answered completely incorrectly. The highest correct answer rates were for disorders of ejaculation, penile curvature, and male hypogonadism. The EAU guideline-based questions achieved a correctness rate of 86.3%. The reproducibility of the answers was over 90%. CONCLUSION The study found that ChatGPT provided accurate and reliable answers to over 80% of andrology-related questions. While limitations exist, such as potentially outdated data and an inability to understand emotional aspects, ChatGPT's potential in the health-care sector is promising. Collaborating with health-care professionals during artificial intelligence model development could enhance its reliability.
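A reproducibility figure of this kind can be computed as the share of questions whose two independently generated answers received the same grade; the minimal sketch below shows the calculation with made-up grades rather than the study's data.

```python
# Minimal sketch of the reproducibility check described above: each question is graded
# twice (scale 1-4), and reproducibility is the share of questions receiving the same
# grade both times. The grades here are made-up placeholders.
first_pass  = [1, 1, 2, 1, 1, 3, 1, 1, 1, 2]
second_pass = [1, 1, 2, 1, 2, 3, 1, 1, 1, 2]

matches = sum(a == b for a, b in zip(first_pass, second_pass))
reproducibility = matches / len(first_pass)
print(f"Reproducibility: {reproducibility:.0%}")  # e.g. 90%
```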
Affiliation(s)
- Ufuk Caglar: Department of Urology, Haseki Training and Research Hospital, Istanbul, Turkey
- Oguzhan Yildiz: Department of Urology, Haseki Training and Research Hospital, Istanbul, Turkey
- M Fırat Ozervarli: Department of Urology, Istanbul University, Istanbul School of Medicine, Istanbul, Turkey
- Resat Aydin: Department of Urology, Istanbul University, Istanbul School of Medicine, Istanbul, Turkey
- Omer Sarilar: Department of Urology, Haseki Training and Research Hospital, Istanbul, Turkey
- Faruk Ozgor: Department of Urology, Haseki Training and Research Hospital, Istanbul, Turkey
- Mazhar Ortac: Department of Urology, Istanbul University, Istanbul School of Medicine, Istanbul, Turkey
245
Chakraborty C, Pal S, Bhattacharya M, Dash S, Lee SS. Overview of Chatbots with special emphasis on artificial intelligence-enabled ChatGPT in medical science. Front Artif Intell 2023; 6:1237704. [PMID: 38028668 PMCID: PMC10644239 DOI: 10.3389/frai.2023.1237704] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/09/2023] [Accepted: 10/05/2023] [Indexed: 12/01/2023] Open
Abstract
The release of ChatGPT has initiated new thinking about AI-based chatbots and their applications, and has drawn huge public attention worldwide. Over the past few months, researchers and doctors have begun considering the promise and applications of AI-related large language models in medicine. This comprehensive review highlights chatbots and ChatGPT and their current role in medicine. First, the general idea of chatbots, their evolution, architecture, and medical uses are discussed. Second, ChatGPT is discussed with special emphasis on its application in medicine, its architecture and training methods, its use in medical diagnosis and treatment, and research ethics issues, and a comparison of ChatGPT with other NLP models is presented. The article also discusses the limitations and prospects of ChatGPT. In the future, these large language models, including ChatGPT, will hold immense promise in healthcare. However, more research is needed in this direction.
Affiliation(s)
- Chiranjib Chakraborty: Department of Biotechnology, School of Life Science and Biotechnology, Adamas University, Kolkata, West Bengal, India
- Soumen Pal: School of Mechanical Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India
- Snehasish Dash: School of Mechanical Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India
- Sang-Soo Lee: Institute for Skeletal Aging and Orthopedic Surgery, Hallym University Chuncheon Sacred Heart Hospital, Chuncheon-si, Gangwon-do, Republic of Korea
246
Jin Y, Liu H, Zhao B, Pan W. ChatGPT and mycosis- a new weapon in the knowledge battlefield. BMC Infect Dis 2023; 23:731. [PMID: 37891532 PMCID: PMC10605453 DOI: 10.1186/s12879-023-08724-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/27/2023] [Accepted: 10/17/2023] [Indexed: 10/29/2023] Open
Abstract
In keeping with current trends in physician-facing tools, ChatGPT can sift through massive amounts of information and solve problems through easy-to-understand conversations, ultimately improving efficiency. Mycosis currently faces great challenges, including high fungal burdens, high mortality, a limited choice of antifungal drugs, and increasing drug resistance. To address these challenges, we asked ChatGPT scenario-based questions about fungal infections and assessed the appropriateness and consistency of its answers, along with potential pitfalls. We concluded that ChatGPT can provide compelling responses to most prompts, including diagnosis, recommendations for examination, treatment, and rational drug use. Moreover, we summarized exciting future applications in mycosis, such as clinical work, scientific research, education, and healthcare. However, the largest barriers to implementation are deficits in individualized advice, timely literature updates, consistency, accuracy, and data safety. To fully embrace the opportunity, we need to address these barriers and manage the risks. We expect that ChatGPT will become a new weapon in the knowledge battlefield of mycosis.
Affiliation(s)
- Yi Jin: Department of Dermatology, Shanghai Key Laboratory of Medical Mycology, Second Affiliated Hospital of Naval Medical University, Shanghai, 200003, P.R. China
- Hua Liu: Department of Anesthesiology, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Bin Zhao: Department of Anesthesiology and SICU, Xinhua Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200092, P.R. China
- Weihua Pan: Department of Dermatology, Shanghai Key Laboratory of Medical Mycology, Second Affiliated Hospital of Naval Medical University, Shanghai, 200003, P.R. China
247
Laoveeravat P. AI (Artificial Intelligence) as an IA (Intelligent Assistant): ChatGPT for Surveillance Colonoscopy Questions. GASTRO HEP ADVANCES 2023; 2:1138-1139. [PMID: 39131555 PMCID: PMC11308129 DOI: 10.1016/j.gastha.2023.10.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Download PDF] [Subscribe] [Scholar Register] [Received: 10/20/2023] [Accepted: 10/20/2023] [Indexed: 08/13/2024]
Affiliation(s)
- Passisd Laoveeravat: Division of Digestive Disease and Nutrition, Department of Internal Medicine, University of Kentucky, Lexington, Kentucky
248
Momenaei B, Wakabayashi T, Shahlaee A, Durrani AF, Pandit SA, Wang K, Mansour HA, Abishek RM, Xu D, Sridhar J, Yonekawa Y, Kuriyan AE. Appropriateness and Readability of ChatGPT-4-Generated Responses for Surgical Treatment of Retinal Diseases. Ophthalmol Retina 2023; 7:862-868. [PMID: 37277096 DOI: 10.1016/j.oret.2023.05.022] [Citation(s) in RCA: 97] [Impact Index Per Article: 48.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/28/2023] [Revised: 05/26/2023] [Accepted: 05/30/2023] [Indexed: 06/07/2023]
Abstract
OBJECTIVE To evaluate the appropriateness and readability of the medical knowledge provided by ChatGPT-4, an artificial intelligence-powered conversational search engine, regarding common vitreoretinal surgeries for retinal detachments (RDs), macular holes (MHs), and epiretinal membranes (ERMs). DESIGN Retrospective cross-sectional study. SUBJECTS This study did not involve any human participants. METHODS We created lists of common questions about the definition, prevalence, visual impact, diagnostic methods, surgical and nonsurgical treatment options, postoperative information, surgery-related complications, and visual prognosis of RD, MH, and ERM, and asked each question 3 times on the online ChatGPT-4 platform. The data for this cross-sectional study were recorded on April 25, 2023. Two independent retina specialists graded the appropriateness of the responses. Readability was assessed using Readable, an online readability tool. MAIN OUTCOME MEASURES The "appropriateness" and "readability" of the answers generated by ChatGPT-4 bot. RESULTS Responses were consistently appropriate in 84.6% (33/39), 92% (23/25), and 91.7% (22/24) of the questions related to RD, MH, and ERM, respectively. Answers were inappropriate at least once in 5.1% (2/39), 8% (2/25), and 8.3% (2/24) of the respective questions. The average Flesch Kincaid Grade Level and Flesch Reading Ease Score were 14.1 ± 2.6 and 32.3 ± 10.8 for RD, 14 ± 1.3 and 34.4 ± 7.7 for MH, and 14.8 ± 1.3 and 28.1 ± 7.5 for ERM. These scores indicate that the answers are difficult or very difficult to read for the average lay person and college graduation would be required to understand the material. CONCLUSIONS Most of the answers provided by ChatGPT-4 were consistently appropriate. However, ChatGPT and other natural language models in their current form are not a source of factual information. Improving the credibility and readability of responses, especially in specialized fields, such as medicine, is a critical focus of research. Patients, physicians, and laypersons should be advised of the limitations of these tools for eye- and health-related counseling. FINANCIAL DISCLOSURE(S) Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.
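The readability figures reported above come from the Readable tool; for reference, the sketch below implements the standard Flesch Reading Ease and Flesch-Kincaid Grade Level formulas with a crude syllable heuristic, so its outputs only approximate those of a dedicated readability tool. The sample passage is an invented example, not text generated by ChatGPT-4 in the study.

```python
# Sketch of the standard Flesch formulas behind the readability scores reported above.
# Naive sentence/word splitting and a rough syllable heuristic; for illustration only.
import re

def count_syllables(word: str) -> int:
    # Rough heuristic: count groups of consecutive vowels, at least one per word.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_scores(text: str) -> tuple[float, float]:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / len(sentences)      # words per sentence
    spw = syllables / len(words)           # syllables per word
    reading_ease = 206.835 - 1.015 * wps - 84.6 * spw
    grade_level = 0.39 * wps + 11.8 * spw - 15.59
    return reading_ease, grade_level

ease, grade = flesch_scores(
    "Retinal detachment occurs when the retina separates from the underlying tissue. "
    "Prompt surgical repair is usually recommended to preserve vision."
)
print(f"Flesch Reading Ease: {ease:.1f}, Flesch-Kincaid Grade Level: {grade:.1f}")
```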
Affiliation(s)
- Bita Momenaei: Wills Eye Hospital, Mid Atlantic Retina, Thomas Jefferson University, Philadelphia, Pennsylvania
- Taku Wakabayashi: Wills Eye Hospital, Mid Atlantic Retina, Thomas Jefferson University, Philadelphia, Pennsylvania
- Abtin Shahlaee: Wills Eye Hospital, Mid Atlantic Retina, Thomas Jefferson University, Philadelphia, Pennsylvania
- Asad F Durrani: Wills Eye Hospital, Mid Atlantic Retina, Thomas Jefferson University, Philadelphia, Pennsylvania
- Saagar A Pandit: Wills Eye Hospital, Mid Atlantic Retina, Thomas Jefferson University, Philadelphia, Pennsylvania
- Kristine Wang: Wills Eye Hospital, Mid Atlantic Retina, Thomas Jefferson University, Philadelphia, Pennsylvania
- Hana A Mansour: Wills Eye Hospital, Mid Atlantic Retina, Thomas Jefferson University, Philadelphia, Pennsylvania
- Robert M Abishek: Wills Eye Hospital, Mid Atlantic Retina, Thomas Jefferson University, Philadelphia, Pennsylvania
- David Xu: Wills Eye Hospital, Mid Atlantic Retina, Thomas Jefferson University, Philadelphia, Pennsylvania
- Jayanth Sridhar: Bascom Palmer Eye Institute, University of Miami Miller School of Medicine, Miami, Florida
- Yoshihiro Yonekawa: Wills Eye Hospital, Mid Atlantic Retina, Thomas Jefferson University, Philadelphia, Pennsylvania
- Ajay E Kuriyan: Wills Eye Hospital, Mid Atlantic Retina, Thomas Jefferson University, Philadelphia, Pennsylvania
249
Evaluating the Efficacy of ChatGPT as a Valuable Resource for Pharmacology Studies in Traditional and Complementary Medicine (T&CM) Education. ARTIFICIAL INTELLIGENCE APPLICATIONS USING CHATGPT IN EDUCATION 2023:1-17. [DOI: 10.4018/978-1-6684-9300-7.ch001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 07/02/2024]
Abstract
Artificial intelligence (AI) is gaining increasing prominence in the field of education, yet comprehensive investigations into its underlying patterns, research limitations, and potential applications remain scarce. ChatGPT, an AI-powered platform developed by the AI research and deployment company OpenAI, allows users to input text instructions and receive prompt textual responses based on its machine learning-driven interactions with online information sources. This study aims to assess the efficacy of ChatGPT in addressing student-centered medical inquiries pertaining to pharmacology, thereby examining its relevance as a self-study resource to enhance the learning experiences of students. Specifically, the study encompasses various domains of pharmacology, such as pharmacokinetics, mechanism of action, clinical uses, adverse effects, contraindications, and drug-drug interactions. The findings demonstrate that ChatGPT provides pertinent and accurate answers to these questions.
250
Talyshinskii A, Naik N, Hameed BMZ, Zhanbyrbekuly U, Khairli G, Guliev B, Juilebø-Jones P, Tzelves L, Somani BK. Expanding horizons and navigating challenges for enhanced clinical workflows: ChatGPT in urology. Front Surg 2023; 10:1257191. [PMID: 37744723 PMCID: PMC10512827 DOI: 10.3389/fsurg.2023.1257191] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/12/2023] [Accepted: 08/28/2023] [Indexed: 09/26/2023] Open
Abstract
Purpose of review ChatGPT has emerged as a potential tool for facilitating doctors' workflows. However, few studies have examined its application in a urological context. Thus, our objective was to analyze the pros and cons of ChatGPT use and how it can be exploited by urologists. Recent findings ChatGPT can facilitate clinical documentation and note-taking, patient communication and support, medical education, and research. In urology, ChatGPT has shown potential as a virtual healthcare aide for benign prostatic hyperplasia, an educational and prevention tool for prostate cancer, educational support for urological residents, and an assistant in writing urological papers and academic work. However, several concerns about its use remain, such as the lack of web crawling, the risk of accidental plagiarism, and concerns about patient data privacy. Summary These limitations underscore the need for further improvement of ChatGPT, such as ensuring the privacy of patient data, expanding the learning dataset to include medical databases, and developing guidance on its appropriate use. Urologists can also help by conducting studies to determine the effectiveness of ChatGPT in clinical scenarios and nosologies other than those previously listed.
Affiliation(s)
- Ali Talyshinskii: Department of Urology, Astana Medical University, Astana, Kazakhstan
- Nithesh Naik: Department of Mechanical and Industrial Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India
- Gafur Khairli: Department of Urology, Astana Medical University, Astana, Kazakhstan
- Bakhman Guliev: Department of Urology, Mariinsky Hospital, St Petersburg, Russia
- Lazaros Tzelves: Department of Urology, National and Kapodistrian University of Athens, Sismanogleion Hospital, Athens, Marousi, Greece
- Bhaskar Kumar Somani: Department of Urology, University Hospital Southampton NHS Trust, Southampton, United Kingdom