1
Iglesias G, Talavera E, Troya J, Díaz-Álvarez A, García-Remesal M. Artificial intelligence model for tumoral clinical decision support systems. Computer Methods and Programs in Biomedicine 2024; 253:108228. [PMID: 38810378] [DOI: 10.1016/j.cmpb.2024.108228]
Abstract
BACKGROUND AND OBJECTIVE Comparative diagnosis in brain tumor evaluation makes it possible to use a medical center's available information to compare similar cases when a new patient is evaluated. By leveraging Artificial Intelligence models, the proposed system is able to retrieve the most similar brain tumor cases for a given query. The primary objective is to enhance the diagnostic process by generating more accurate representations of medical images, with a particular focus on patient-specific normal features and pathologies. A key distinction from previous models lies in its ability to produce enriched image descriptors solely from binary information, eliminating the need for costly and difficult-to-obtain tumor segmentations. METHODS The proposed model uses Artificial Intelligence to detect patient features and recommend the most similar cases from a database. The system not only suggests similar cases but also balances the representation of healthy and abnormal features in its design, which both aids clinicians in their decision-making and encourages generalization: future research could apply the system to other areas of medical diagnosis with almost no changes. RESULTS We conducted a comparative analysis of our approach against similar studies. The proposed architecture obtains a Dice coefficient of 0.474 in both tumoral and healthy regions of the patients, outperforming previous literature. Our proposed model excels at extracting and combining anatomical and pathological features from brain magnetic resonance (MR) images, achieving state-of-the-art results while relying on less expensive label information, which substantially reduces the overall cost of the training process. Our findings highlight significant potential for improving the efficiency and accuracy of comparative diagnostics and the treatment of tumoral pathologies. CONCLUSIONS This paper provides substantial grounds for further exploration of the broader applicability and optimization of the proposed architecture to enhance clinical decision-making. The novel approach presented in this work marks a significant advancement in Artificial Intelligence-assisted image retrieval for medical diagnosis and promises to reduce costs and improve the quality of patient care by using Artificial Intelligence as a support tool rather than a black-box system.
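Editor's note: the Dice coefficient reported in the results is a standard overlap measure between a predicted region and a reference region. A minimal sketch of the usual computation on binary masks (toy arrays, not the authors' code or data):

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, ref: np.ndarray) -> float:
    """Dice = 2 * |A intersect B| / (|A| + |B|) for binary masks."""
    pred, ref = pred.astype(bool), ref.astype(bool)
    total = pred.sum() + ref.sum()
    if total == 0:  # both masks empty: define as perfect agreement
        return 1.0
    return 2.0 * np.logical_and(pred, ref).sum() / total

# Toy 4x4 masks that half-overlap: Dice = 2*4 / (8+8) = 0.5
pred = np.array([[1, 1, 0, 0]] * 4)
ref = np.array([[0, 1, 1, 0]] * 4)
print(dice_coefficient(pred, ref))  # 0.5
```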
Affiliation(s)
- Guillermo Iglesias
- Departamento de Sistemas Informáticos, Escuela Técnica Superior de Ingeniería de Sistemas Informáticos, Universidad Politécnica de Madrid, Spain.
- Edgar Talavera
- Departamento de Sistemas Informáticos, Escuela Técnica Superior de Ingeniería de Sistemas Informáticos, Universidad Politécnica de Madrid, Spain.
- Jesús Troya
- Infanta Leonor University Hospital, Madrid, Spain
- Alberto Díaz-Álvarez
- Departamento de Sistemas Informáticos, Escuela Técnica Superior de Ingeniería de Sistemas Informáticos, Universidad Politécnica de Madrid, Spain.
- Miguel García-Remesal
- Biomedical Informatics Group, Departamento de Inteligencia Artificial, Escuela Técnica Superior de Ingenieros Informáticos, Universidad Politécnica de Madrid, Spain.
2
Hassanipour S, Nayak S, Bozorgi A, Keivanlou MH, Dave T, Alotaibi A, Joukar F, Mellatdoust P, Bakhshi A, Kuriyakose D, Polisetty LD, Chimpiri M, Amini-Salehi E. The Ability of ChatGPT in Paraphrasing Texts and Reducing Plagiarism: A Descriptive Analysis. JMIR Medical Education 2024; 10:e53308. [PMID: 38989841] [DOI: 10.2196/53308]
Abstract
Background The introduction of ChatGPT by OpenAI has garnered significant attention. Among its capabilities, paraphrasing stands out. Objective This study investigates the level of plagiarism in text paraphrased by this chatbot. Methods Three texts of varying lengths were presented to ChatGPT, which was instructed to paraphrase them using five different prompts. In the second stage of the study, the texts were divided into separate paragraphs, and ChatGPT was asked to paraphrase each paragraph individually. In the third stage, ChatGPT was asked to paraphrase the texts it had previously generated. Results The average plagiarism rate in the texts generated by ChatGPT was 45% (SD 10%). ChatGPT exhibited a substantial reduction in plagiarism for the provided texts (mean difference -0.51, 95% CI -0.54 to -0.48; P<.001). Furthermore, when comparing the second attempt with the initial attempt, a significant decrease in the plagiarism rate was observed (mean difference -0.06, 95% CI -0.08 to -0.03; P<.001). The number of paragraphs in the texts showed a noteworthy association with the percentage of plagiarism, with single-paragraph texts exhibiting the lowest plagiarism rate (P<.001). Conclusions Although ChatGPT notably reduces plagiarism within texts, the remaining levels of plagiarism are still relatively high, so researchers should exercise caution when incorporating this chatbot into their work.
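Editor's note: the mean differences with 95% CIs above are standard paired comparisons of plagiarism rates before and after paraphrasing. A minimal sketch of such a paired analysis (the rates below are hypothetical, not the study's data):

```python
import numpy as np
from scipy import stats

# Hypothetical paired plagiarism rates before and after ChatGPT paraphrasing.
before = np.array([0.92, 0.88, 0.95, 0.90, 0.85, 0.97])
after = np.array([0.42, 0.39, 0.50, 0.44, 0.38, 0.45])

diff = after - before
mean_diff = diff.mean()
# 95% CI for the mean paired difference from the t distribution.
ci_low, ci_high = stats.t.interval(0.95, df=diff.size - 1,
                                   loc=mean_diff, scale=stats.sem(diff))
t_stat, p_value = stats.ttest_rel(after, before)
print(f"mean difference {mean_diff:.2f}, "
      f"95% CI {ci_low:.2f} to {ci_high:.2f}, P={p_value:.2g}")
```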
Affiliation(s)
- Soheil Hassanipour
- Gastrointestinal and Liver Diseases Research Center, Guilan University of Medical Sciences, Rasht, Iran
- Sandeep Nayak
- Department of Internal Medicine, Yale New Haven Health Bridgeport Hospital, Bridgeport, CT, United States
- Ali Bozorgi
- Tehran Heart Center, Tehran University of Medical Sciences, Tehran, Iran
- Tirth Dave
- Department of Internal Medicine, Bukovinian State Medical University, Chernivtsi, Ukraine
- Farahnaz Joukar
- Gastrointestinal and Liver Diseases Research Center, Guilan University of Medical Sciences, Rasht, Iran
- Parinaz Mellatdoust
- Dipartimento di Elettronica Informazione Bioingegneria, Politecnico di Milano, Milan, Italy
- Arash Bakhshi
- Gastrointestinal and Liver Diseases Research Center, Guilan University of Medical Sciences, Rasht, Iran
- Dona Kuriyakose
- Department of Internal Medicine, St. Joseph's Mission Hospital, Anchal, Kollam District, Kerala, India
- Lakshmi D Polisetty
- Department of Internal Medicine, Yale New Haven Health Bridgeport Hospital, Bridgeport, CT, United States
- Ehsan Amini-Salehi
- Gastrointestinal and Liver Diseases Research Center, Guilan University of Medical Sciences, Rasht, Iran
3
Crouzet A, Lopez N, Riss Yaw B, Lepelletier Y, Demange L. The Millennia-Long Development of Drugs Associated with the 80-Year-Old Artificial Intelligence Story: The Therapeutic Big Bang? Molecules 2024; 29:2716. [PMID: 38930784] [PMCID: PMC11206022] [DOI: 10.3390/molecules29122716]
Abstract
The journey of drug discovery (DD) has evolved from ancient practices to modern technology-driven approaches, with Artificial Intelligence (AI) emerging as a pivotal force in streamlining and accelerating the process. Despite the vital importance of DD, it faces challenges such as high costs and lengthy timelines. This review examines the historical progression and current market of DD alongside the development and integration of AI technologies. We analyse the challenges encountered in applying AI to DD, focusing on drug design and protein-protein interactions. The discussion is enriched by models illustrating how AI can be applied in DD. Three case studies are highlighted to demonstrate the successful application of AI in DD, including the discovery of a novel class of antibiotics and a small-molecule inhibitor that has progressed to phase II clinical trials. These cases underscore the potential of AI to identify new drug candidates and optimise the development process. The convergence of DD and AI embodies a transformative shift in the field, offering a path to overcome traditional obstacles. By leveraging AI, the future of DD promises enhanced efficiency and novel breakthroughs, heralding a new era of medical innovation, even though there is still a long way to go.
Affiliation(s)
- Aurore Crouzet
- UMR 8038 CNRS CiTCoM, Team PNAS, Faculté de Pharmacie, Université Paris Cité, 4 Avenue de l’Observatoire, 75006 Paris, France
- W-MedPhys, 128 Rue la Boétie, 75008 Paris, France
- Nicolas Lopez
- W-MedPhys, 128 Rue la Boétie, 75008 Paris, France
- ENOES, 62 Rue de Miromesnil, 75008 Paris, France
- Unité Mixte de Recherche «Institut de Physique Théorique (IPhT)» CEA-CNRS, UMR 3681, Bat 774, Route de l’Orme des Merisiers, 91191 St Aubin-Gif-sur-Yvette, France
- Benjamin Riss Yaw
- UMR 8038 CNRS CiTCoM, Team PNAS, Faculté de Pharmacie, Université Paris Cité, 4 Avenue de l’Observatoire, 75006 Paris, France
- Yves Lepelletier
- W-MedPhys, 128 Rue la Boétie, 75008 Paris, France
- Université Paris Cité, Imagine Institute, 24 Boulevard Montparnasse, 75015 Paris, France
- INSERM UMR 1163, Laboratory of Cellular and Molecular Basis of Normal Hematopoiesis and Hematological Disorders: Therapeutical Implications, 24 Boulevard Montparnasse, 75015 Paris, France
- Luc Demange
- UMR 8038 CNRS CiTCoM, Team PNAS, Faculté de Pharmacie, Université Paris Cité, 4 Avenue de l’Observatoire, 75006 Paris, France
4
Perrot O, Schirmann A, Vidart A, Guillot-Tantay C, Izard V, Lebret T, Boillot B, Mesnard B, Lebacle C, Madec FX. Chatbots vs andrologists: Testing 25 clinical cases. The French Journal of Urology 2024; 34:102636. [PMID: 38599321] [DOI: 10.1016/j.fjurol.2024.102636]
Abstract
OBJECTIVE AI-derived language models are booming, and their place in medicine is undefined. The aim of our study is to compare responses to andrology clinical cases between chatbots and andrologists, to assess the reliability of these technologies. MATERIALS AND METHODS We analyzed the responses of 32 experts, 18 residents, and three chatbots (ChatGPT v3.5, v4, and Bard) to 25 andrology clinical cases. Responses were assessed on a Likert scale ranging from 0 to 2 for each question (0 = false or no response; 1 = partially correct response; 2 = correct response), based on the latest national or, in their absence, international recommendations. We compared the averages obtained across all cases by the different groups. RESULTS Experts obtained a higher mean score (m=11.0/12.4, σ=1.4) than ChatGPT v4 (m=10.7/12.4, σ=2.2, p=0.6475), ChatGPT v3.5 (m=9.5/12.4, σ=2.1, p=0.0062), and Bard (m=7.2/12.4, σ=3.3, p<0.0001). Residents obtained a mean score (m=9.4/12.4, σ=1.7) higher than Bard (m=7.2/12.4, σ=3.3, p=0.0053) but lower than ChatGPT v3.5 (m=9.5/12.4, σ=2.1, p=0.8393), ChatGPT v4 (m=10.7/12.4, σ=2.2, p=0.0183), and experts (m=11.0/12.4, σ=1.4, p=0.0009). ChatGPT v4 (m=10.7, σ=2.2) performed better than ChatGPT v3.5 (m=9.5, σ=2.1, p=0.0476) and Bard (m=7.2, σ=3.3, p<0.0001). CONCLUSION The use of chatbots in medicine could be relevant. More studies are needed before they can be integrated into clinical practice. LEVEL OF EVIDENCE: 4
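Editor's note: the abstract compares group mean scores but does not name the statistical test; a Welch t test is one common choice for such comparisons. A sketch under that assumption, with made-up scores:

```python
import numpy as np
from scipy import stats

# Hypothetical per-case scores (each case scored 0-2 per question and summed);
# illustrative values only, not the study's data.
experts = np.array([11.2, 10.8, 12.0, 9.6, 11.5, 10.9, 11.8])
chatgpt_v4 = np.array([10.1, 9.4, 11.9, 8.8, 11.0, 9.9, 11.2])

# Welch's t test compares group means without assuming equal variances.
t_stat, p_value = stats.ttest_ind(experts, chatgpt_v4, equal_var=False)
print(f"experts {experts.mean():.1f} vs ChatGPT v4 {chatgpt_v4.mean():.1f}, "
      f"P={p_value:.4f}")
```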
Affiliation(s)
- Cedric Lebacle
- Urology Department, Kremlin-Bicêtre Hospital, Le Kremlin-Bicêtre, France
5
Griot M, Hemptinne C, Vanderdonckt J, Yuksel D. Impact of high-quality, mixed-domain data on the performance of medical language models. J Am Med Inform Assoc 2024:ocae120. [PMID: 38781312] [DOI: 10.1093/jamia/ocae120]
Abstract
OBJECTIVE To optimize the training strategy of large language models for medical applications, focusing on creating clinically relevant systems that efficiently integrate into healthcare settings, while ensuring high standards of accuracy and reliability. MATERIALS AND METHODS We curated a comprehensive collection of high-quality, domain-specific data and used it to train several models, each with different subsets of this data. These models were rigorously evaluated against standard medical benchmarks, such as the USMLE, to measure their performance. Furthermore, for a thorough effectiveness assessment, they were compared with other state-of-the-art medical models of comparable size. RESULTS The models trained with a mix of high-quality, domain-specific, and general data showed superior performance over those trained on larger, less clinically relevant datasets (P < .001). Our 7-billion-parameter model Med5 scores 60.5% on MedQA, outperforming the previous best of 49.3% from comparable models, and becomes the first of its size to achieve a passing score on the USMLE. Additionally, this model retained its proficiency in general domain tasks, comparable to state-of-the-art general domain models of similar size. DISCUSSION Our findings underscore the importance of integrating high-quality, domain-specific data in training large language models for medical purposes. The balanced approach between specialized and general data significantly enhances the model's clinical relevance and performance. CONCLUSION This study sets a new standard in medical language models, proving that a strategically trained, smaller model can outperform larger ones in clinical relevance and general proficiency, highlighting the importance of data quality and expert curation in generative artificial intelligence for healthcare applications.
Affiliation(s)
- Maxime Griot
- Institute of NeuroScience, Université catholique de Louvain, Brussels, 1200, Belgium
- Louvain Research Institute in Management and Organizations, Université catholique de Louvain, Louvain-la-Neuve, 1348, Belgium
- Coralie Hemptinne
- Ophthalmology, Cliniques Universitaires Saint-Luc, Brussels, 1200, Belgium
- Jean Vanderdonckt
- Louvain Research Institute in Management and Organizations, Université catholique de Louvain, Louvain-la-Neuve, 1348, Belgium
- Demet Yuksel
- Institute of NeuroScience, Université catholique de Louvain, Brussels, 1200, Belgium
- Medical Information Department, Cliniques Universitaires Saint-Luc, Brussels, 1200, Belgium
6
Tripathi S, Sukumaran R, Cook TS. Efficient healthcare with large language models: optimizing clinical workflow and enhancing patient care. J Am Med Inform Assoc 2024; 31:1436-1440. [PMID: 38273739] [PMCID: PMC11105142] [DOI: 10.1093/jamia/ocad258]
Abstract
PURPOSE This article explores the potential of large language models (LLMs) to automate administrative tasks in healthcare, alleviating the burden on clinicians caused by electronic medical records. POTENTIAL LLMs offer opportunities in clinical documentation, prior authorization, patient education, and access to care. They can personalize patient scheduling, improve documentation accuracy, streamline insurance prior authorization, increase patient engagement, and address barriers to healthcare access. CAUTION However, integrating LLMs requires careful attention to security and privacy concerns, protecting patient data, and complying with regulations like the Health Insurance Portability and Accountability Act (HIPAA). It is crucial to acknowledge that LLMs should supplement, not replace, the human connection and care provided by healthcare professionals. CONCLUSION By prudently utilizing LLMs alongside human expertise, healthcare organizations can improve patient care and outcomes. Implementation should be approached with caution and consideration to ensure the safe and effective use of LLMs in the clinical setting.
Affiliation(s)
- Satvik Tripathi
- Department of Radiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Rithvik Sukumaran
- Department of Radiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Tessa S Cook
- Department of Radiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
7
Kaneda Y, Tayuinosho A, Tomoyose R, Takita M, Hamaki T, Tanimoto T, Ozaki A. Evaluating ChatGPT's effectiveness and tendencies in Japanese internal medicine. J Eval Clin Pract 2024. [PMID: 38764369] [DOI: 10.1111/jep.14011]
Abstract
INTRODUCTION ChatGPT, a large-scale language model, is a notable example of AI's potential in health care. However, its effectiveness in clinical settings, especially when compared with human physicians, is not fully understood. This study evaluates ChatGPT's capabilities and limitations in answering questions for Japanese internal medicine specialists, aiming to clarify its accuracy and its tendencies in both correct and incorrect responses. METHODS We evaluated ChatGPT's answers to four sets of self-training questions for internal medicine specialists in Japan from 2020 to 2023. We ran three trials for each set to assess overall accuracy and performance on nonimage questions. We then categorized the questions into two groups: those ChatGPT consistently answered correctly (Confirmed Correct Answer, CCA) and those it consistently answered incorrectly (Confirmed Incorrect Answer, CIA). For these groups, we calculated average accuracy rates and 95% confidence intervals based on the actual performance of internal medicine physicians on each question and tested the statistical significance of the difference between the two groups. The same process was applied to the subset of nonimage CCA and CIA questions. RESULTS ChatGPT's overall accuracy rate was 59.05%, increasing to 65.76% for nonimage questions. For 24.87% of the questions, answers varied between correct and incorrect across the three trials. Despite surpassing the passing threshold for nonimage questions, ChatGPT's accuracy was lower than that of human specialists. There was a significant difference in accuracy between the CCA and CIA groups, with ChatGPT mirroring human physicians' patterns in responding to different question types. CONCLUSION This study underscores ChatGPT's potential utility and limitations in internal medicine. While effective in some aspects, its dependence on question type and context suggests that it should supplement, not replace, professional medical judgment. Further research is needed to integrate Artificial Intelligence tools like ChatGPT more effectively into specialized medical practices.
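Editor's note: the CCA/CIA grouping described in the methods reduces to checking answer consistency across the three trials. A minimal sketch (hypothetical per-question results, not the study's data):

```python
# Hypothetical correctness of ChatGPT's answer to each question over three trials.
trials = {
    "Q1": [True, True, True],     # always correct
    "Q2": [False, False, False],  # always incorrect
    "Q3": [True, False, True],    # inconsistent across trials
}

cca = [q for q, r in trials.items() if all(r)]      # Confirmed Correct Answer
cia = [q for q, r in trials.items() if not any(r)]  # Confirmed Incorrect Answer
mixed = [q for q, r in trials.items() if any(r) and not all(r)]
print(cca, cia, mixed)  # ['Q1'] ['Q2'] ['Q3']
```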
Affiliation(s)
- Yudai Kaneda
- School of Medicine, Hokkaido University, Hokkaido, Japan
- Rika Tomoyose
- School of Medicine, Hokkaido University, Hokkaido, Japan
- Morihito Takita
- Department of Internal Medicine, Accessible Rail Medical Services Tetsuikai, Navitas Clinic Tachikawa, Tachikawa, Japan
- Tamae Hamaki
- Department of Internal Medicine, Accessible Rail Medical Services Tetsuikai, Navitas Clinic Shinjuku, Tokyo, Japan
- Tetsuya Tanimoto
- Internal Medicine, Accessible Rail Medical Services Tetsuikai, Navitas Clinic, Kawasaki, Kanagawa, Japan
- Akihiko Ozaki
- Department of Breast Surgery, Jyoban Hospital of Tokiwa Foundation, Iwaki, Fukushima, Japan
8
Fournier A, Fallet C, Sadeghipour F, Perrottet N. Assessing the applicability and appropriateness of ChatGPT in answering clinical pharmacy questions. Annales Pharmaceutiques Françaises 2024; 82:507-513. [PMID: 37992892] [DOI: 10.1016/j.pharma.2023.11.001]
Abstract
OBJECTIVES Clinical pharmacists rely on different scientific references to ensure appropriate, safe, and cost-effective drug use. Tools based on artificial intelligence (AI) such as ChatGPT (Generative Pre-trained Transformer) could offer valuable support. The objective of this study was to assess ChatGPT's capacity to correctly respond to clinical pharmacy questions asked by healthcare professionals in our university hospital. MATERIAL AND METHODS ChatGPT's capacity to respond correctly to the last 100 consecutive questions recorded in our clinical pharmacy database was assessed. Questions were copied from our FileMaker Pro database and pasted into the online platform of the March 14 version of ChatGPT. The generated answers were then copied verbatim into an Excel file. Two blinded clinical pharmacists reviewed all the questions and the answers given by the software; in case of disagreement, a third blinded pharmacist adjudicated. RESULTS Questions about documentation (n=36) and mode of drug administration (n=30) predominated. Among the 69 applicable questions, the rate of correct answers varied from 30% to 57.1% depending on question type, with an overall rate of 44.9%. Of the inappropriate answers (n=38), 20 were incorrect, 18 gave no answer, and 8 were incomplete, with 8 answers falling into two categories. In no case did ChatGPT provide a better answer than the pharmacists. CONCLUSIONS ChatGPT demonstrated mixed performance in answering clinical pharmacy questions. It should not replace human expertise, as a high rate of inappropriate answers was observed. Future studies should focus on optimizing ChatGPT for specific clinical pharmacy questions and explore the potential benefits and limitations of integrating this technology into clinical practice.
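Editor's note: the dual-review workflow in the methods (two blinded raters, a third deciding disagreements) can be expressed as a simple rule. A sketch with hypothetical ratings:

```python
def adjudicate(rater_a: str, rater_b: str, rater_c: str) -> str:
    """Consensus rating: the third blinded rater decides disagreements."""
    return rater_a if rater_a == rater_b else rater_c

# Hypothetical ratings of one ChatGPT answer by the three pharmacists.
print(adjudicate("correct", "correct", "incomplete"))     # correct
print(adjudicate("correct", "incomplete", "incomplete"))  # incomplete
```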
Affiliation(s)
- A Fournier
- Service of Pharmacy, Centre Hospitalier Universitaire Vaudois (CHUV), Lausanne, Switzerland
- C Fallet
- Service of Pharmacy, Centre Hospitalier Universitaire Vaudois (CHUV), Lausanne, Switzerland
- F Sadeghipour
- Service of Pharmacy, Centre Hospitalier Universitaire Vaudois (CHUV), Lausanne, Switzerland; School of Pharmaceutical Sciences, University of Geneva, University of Lausanne, Geneva, Switzerland; Center for Research and Innovation in Clinical Pharmaceutical Sciences, Lausanne University Hospital and University of Lausanne, Lausanne, Switzerland
- N Perrottet
- Service of Pharmacy, Centre Hospitalier Universitaire Vaudois (CHUV), Lausanne, Switzerland; School of Pharmaceutical Sciences, University of Geneva, University of Lausanne, Geneva, Switzerland.
9
Shrestha N, Shen Z, Zaidat B, Duey AH, Tang JE, Ahmed W, Hoang T, Restrepo Mejia M, Rajjoub R, Markowitz JS, Kim JS, Cho SK. Performance of ChatGPT on NASS Clinical Guidelines for the Diagnosis and Treatment of Low Back Pain: A Comparison Study. Spine (Phila Pa 1976) 2024; 49:640-651. [PMID: 38213186] [DOI: 10.1097/brs.0000000000004915]
Abstract
STUDY DESIGN Comparative analysis. OBJECTIVE To evaluate the ability of Chat Generative Pre-trained Transformer (ChatGPT) to predict appropriate clinical recommendations based on the most recent clinical guidelines for the diagnosis and treatment of low back pain. BACKGROUND Low back pain is a very common and often debilitating condition that affects many people globally. ChatGPT is an artificial intelligence model that may be able to generate recommendations for low back pain. MATERIALS AND METHODS Using the North American Spine Society Evidence-Based Clinical Guidelines as the gold standard, 82 clinical questions relating to low back pain were entered into ChatGPT (GPT-3.5) independently. For each question, we recorded ChatGPT's answer, then used a point-answer system (the point being the guideline recommendation and the answer being ChatGPT's response) and asked ChatGPT whether the point was mentioned in the answer to assess accuracy. This accuracy assessment was repeated for each question by guideline category with one change: ChatGPT was first prompted to answer as an experienced orthopedic surgeon. A two-sample proportion z test was used to assess differences between the pre-prompt and post-prompt scenarios with alpha=0.05. RESULTS ChatGPT's response was accurate 65% of the time (72% post-prompt, P=0.41) for guidelines with clinical recommendations, 46% (58% post-prompt, P=0.11) for guidelines with insufficient or conflicting data, and 49% (16% post-prompt, P=0.003) for guidelines with no adequate study to address the clinical question. For guidelines with insufficient or conflicting data, 44% (25% post-prompt, P=0.01) of ChatGPT responses wrongly suggested that sufficient evidence existed. CONCLUSION ChatGPT was able to produce sufficient clinical guideline recommendations for low back pain, with overall improvement when initially prompted. However, it tended to wrongly suggest evidence and often failed to mention, especially post-prompt, when there is not enough evidence to give an accurate recommendation.
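Editor's note: the two-sample proportion z test named in the methods can be reproduced with standard tooling; statsmodels' proportions_ztest is one implementation. A sketch with hypothetical pre- and post-prompt counts (not the study's raw data):

```python
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical counts of accurate responses before and after the
# "experienced orthopedic surgeon" prompt.
correct = [53, 59]  # accurate responses pre- and post-prompt
asked = [82, 82]    # questions asked in each scenario

z_stat, p_value = proportions_ztest(correct, asked)
print(f"z={z_stat:.2f}, P={p_value:.3f}")  # compare P against alpha=0.05
```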
Affiliation(s)
- Nancy Shrestha
- Chicago Medical School at Rosalind Franklin University, North Chicago, IL
- Bashar Zaidat
- Icahn School of Medicine at Mount Sinai, New York, NY
- Akiro H Duey
- Icahn School of Medicine at Mount Sinai, New York, NY
- Justin E Tang
- Icahn School of Medicine at Mount Sinai, New York, NY
- Wasil Ahmed
- Icahn School of Medicine at Mount Sinai, New York, NY
- Timothy Hoang
- Icahn School of Medicine at Mount Sinai, New York, NY
- Rami Rajjoub
- Icahn School of Medicine at Mount Sinai, New York, NY
- Jun S Kim
- Department of Orthopedics, Mount Sinai Health System, New York, NY
- Samuel K Cho
- Department of Orthopedics, Mount Sinai Health System, New York, NY
10
Kedia N, Sanjeev S, Ong J, Chhablani J. ChatGPT and Beyond: An overview of the growing field of large language models and their use in ophthalmology. Eye (Lond) 2024; 38:1252-1261. [PMID: 38172581] [PMCID: PMC11076576] [DOI: 10.1038/s41433-023-02915-z]
Abstract
ChatGPT, an artificial intelligence (AI) chatbot built on large language models (LLMs), has rapidly gained popularity. The benefits and limitations of this transformative technology have been discussed across various fields, including medicine. The widespread availability of ChatGPT has enabled clinicians to study how these tools could be used for a variety of tasks such as generating differential diagnosis lists, organizing patient notes, and synthesizing literature for scientific research. LLMs have shown promising capabilities in ophthalmology by performing well on the Ophthalmic Knowledge Assessment Program, providing fairly accurate responses to questions about retinal diseases, and generating differential diagnosis lists. There are current limitations to this technology, including the propensity of LLMs to "hallucinate," or confidently generate false information; their potential role in perpetuating biases in medicine; and the challenges of incorporating LLMs into research without allowing "AI plagiarism" or publication of false information. In this paper, we provide a balanced overview of what LLMs are and introduce some of the LLMs that have been released in the past few years. We discuss recent literature evaluating the role of these language models in medicine, with a focus on ChatGPT. The field of AI is fast-paced, and new applications based on LLMs are emerging rapidly; therefore, it is important for ophthalmologists to be aware of how this technology works and how it may impact patient care. Here, we discuss the benefits, limitations, and future advancements of LLMs in patient care and research.
Affiliation(s)
- Nikita Kedia
- Department of Ophthalmology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
- Joshua Ong
- Department of Ophthalmology and Visual Sciences, University of Michigan Kellogg Eye Center, Ann Arbor, MI, USA
- Jay Chhablani
- Department of Ophthalmology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA.
11
Wang C, Ong J, Wang C, Ong H, Cheng R, Ong D. Potential for GPT Technology to Optimize Future Clinical Decision-Making Using Retrieval-Augmented Generation. Ann Biomed Eng 2024; 52:1115-1118. [PMID: 37530906] [DOI: 10.1007/s10439-023-03327-6]
Abstract
Advancements in artificial intelligence (AI) provide many helpful tools for healthcare, one of which is AI chatbots that use natural language processing to create humanlike, conversational dialog. These chatbots have general cognitive skills and are able to engage with clinicians and patients to discuss patients' health conditions and what they may be at risk for. While chatbot engines have access to a wide range of medical texts and research papers, they currently provide high-level, generic responses and are limited in their ability to provide diagnostic guidance and clinical advice to patients on an individual level. This essay discusses the use of retrieval-augmented generation (RAG), which can be used to improve the specificity of user-entered prompts and thereby enhance the detail of AI chatbot responses. By embedding recent clinical data and trusted medical sources, such as clinical guidelines, into the chatbot models, AI chatbots can provide more patient-specific guidance, faster diagnoses and treatment recommendations, and greater improvement of patient outcomes.
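Editor's note: retrieval-augmented generation, as described above, retrieves trusted passages and prepends them to the user's prompt before the model answers. A minimal sketch; the embed() helper and the guideline snippets are hypothetical placeholders for a real embedding model and document store:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder: a real system would call an embedding model here."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(64)

# Hypothetical trusted snippets (e.g., clinical guideline excerpts).
snippets = [
    "Guideline A: first-line therapy for condition X is ...",
    "Guideline B: imaging is indicated when symptoms persist beyond ...",
]
index = {s: embed(s) for s in snippets}

def retrieve(query: str, k: int = 1) -> list:
    """Rank snippets by cosine similarity to the query embedding."""
    q = embed(query)
    score = lambda v: float(v @ q) / (np.linalg.norm(v) * np.linalg.norm(q))
    return sorted(index, key=lambda s: score(index[s]), reverse=True)[:k]

def build_prompt(question: str) -> str:
    """Prepend retrieved context so the chatbot answers from trusted sources."""
    context = "\n".join(retrieve(question))
    return f"Context:\n{context}\n\nQuestion: {question}"

print(build_prompt("When is imaging indicated for condition X?"))
```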
Affiliation(s)
- Calvin Wang
- College of Medicine - Robert Wood Johnson Medical School, Rutgers University, New Brunswick, NJ, 08901, USA.
- Joshua Ong
- Michigan Medicine, University of Michigan, Ann Arbor, MI, USA
- Chara Wang
- Biotechnology High School, Freehold, NJ, USA
- Hannah Ong
- College of Medicine, The Ohio State University, Columbus, OH, USA
- Rebekah Cheng
- Department of Physical Therapy, Virginia Commonwealth University, Richmond, VA, USA
- Dennis Ong
- Amazon Web Services, Amazon, Seattle, WA, USA
12
Templin T, Perez MW, Sylvia S, Leek J, Sinnott-Armstrong N. Addressing 6 challenges in generative AI for digital health: A scoping review. PLOS Digital Health 2024; 3:e0000503. [PMID: 38781686] [PMCID: PMC11115971] [DOI: 10.1371/journal.pdig.0000503]
Abstract
Generative artificial intelligence (AI) can exhibit biases, compromise data privacy, be misled by prompts crafted as adversarial attacks, and produce hallucinations. Despite generative AI's potential for many applications in digital health, practitioners must understand these tools and their limitations. This scoping review pays particular attention to the challenges of generative AI technologies in medical settings and surveys potential solutions. Using PubMed, we identified 120 articles published by March 2024 that reference and evaluate generative AI in medicine, from which we synthesized themes and suggestions for future work. After providing general background on generative AI, we focus on collecting and presenting six challenges that are key for digital health practitioners, along with specific measures that can be taken to mitigate them. Overall, bias, privacy, hallucination, and regulatory compliance were frequently considered, while other concerns around generative AI, such as overreliance on text models, adversarial misprompting, and jailbreaking, were not commonly evaluated in the current literature.
Affiliation(s)
- Tara Templin
- Department of Health Policy and Management, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Carolina Population Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Monika W. Perez
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
- Sean Sylvia
- Carolina Population Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Department of Health Policy and Management, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Sheps Center for Health Services Research, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Jeff Leek
- Biostatistics Program, Fred Hutchinson Cancer Center, Seattle, Washington, United States of America
- Department of Biostatistics, University of Washington, Seattle, Washington, United States of America
- Nasa Sinnott-Armstrong
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
- Herbold Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, Washington, United States of America
13
Thunström AO, Carlsen HK, Ali L, Larson T, Hellström A, Steingrimsson S. Usability Comparison Among Healthy Participants of an Anthropomorphic Digital Human and a Text-Based Chatbot as a Responder to Questions on Mental Health: Randomized Controlled Trial. JMIR Hum Factors 2024; 11:e54581. [PMID: 38683664] [DOI: 10.2196/54581]
Abstract
BACKGROUND The use of chatbots in mental health support has increased exponentially in recent years, with studies showing that they may be effective in treating mental health problems. More recently, the use of visual avatars called digital humans has been introduced. Digital humans can use facial expressions as another dimension in human-computer interactions. It is important to study the difference in emotional response and usability preferences between text-based chatbots and digital humans for interacting with mental health services. OBJECTIVE This study aims to explore to what extent a digital human interface and a text-only chatbot interface differed in usability when tested by healthy participants, using BETSY (Behavior, Emotion, Therapy System, and You), which uses 2 distinct interfaces: a digital human with anthropomorphic features and a text-only user interface. We also set out to explore how chatbot-generated conversations on mental health (specific to each interface) affected self-reported feelings and biometrics. METHODS We explored to what extent a digital human with anthropomorphic features differed from a traditional text-only chatbot regarding perception of usability through the System Usability Scale, emotional reactions through electroencephalography, and feelings of closeness. Healthy participants (n=45) were randomized to 2 groups that used either a digital human with anthropomorphic features (n=25) or a text-only chatbot with no such features (n=20). The groups were compared by linear regression analysis and t tests. RESULTS No differences were observed between the text-only and digital human groups regarding demographic features. The mean System Usability Scale score was 75.34 (SD 10.01; range 57-90) for the text-only chatbot versus 64.80 (SD 14.14; range 40-90) for the digital human interface. Both groups scored their respective chatbot interfaces as average or above average in usability. Women were more likely to report feeling annoyed by BETSY. CONCLUSIONS The text-only chatbot was perceived as significantly more user-friendly than the digital human, although there were no significant differences in electroencephalography measurements. Male participants exhibited lower levels of annoyance with both interfaces, contrary to previously reported findings.
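Editor's note: the System Usability Scale scores reported above follow the standard SUS computation: ten items rated 1-5, odd-numbered items contributing (score - 1), even-numbered items contributing (5 - score), and the sum scaled by 2.5 to a 0-100 range. A sketch with a hypothetical response set:

```python
def sus_score(responses: list) -> float:
    """Standard SUS scoring for 10 items each rated 1-5."""
    assert len(responses) == 10
    total = sum((r - 1) if i % 2 == 0 else (5 - r)  # even 0-based index = odd item
                for i, r in enumerate(responses))
    return total * 2.5  # scale the 0-40 raw range to 0-100

# Hypothetical participant strongly favoring the interface on every item.
print(sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1]))  # 100.0
```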
Affiliation(s)
- Almira Osmanovic Thunström
- Region Västra Götaland, Psychiatric Department, Sahlgrenska University Hospital, Gothenburg, Sweden
- Section of Psychiatry and Neurochemistry, Institute of Neuroscience and Physiology, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden
- Hanne Krage Carlsen
- Section of Psychiatry and Neurochemistry, Institute of Neuroscience and Physiology, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden
- Region Västra Götaland, Centre of Registers, Gothenburg, Sweden
- Lilas Ali
- Region Västra Götaland, Psychiatric Department, Sahlgrenska University Hospital, Gothenburg, Sweden
- Institute of Health Care Sciences, Centre for Person-Centred Care, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden
- Centre for Person-Centred Care, University of Gothenburg, Gothenburg, Sweden
- Tomas Larson
- Region Västra Götaland, Psychiatric Department, Sahlgrenska University Hospital, Gothenburg, Sweden
- Section of Psychiatry and Neurochemistry, Institute of Neuroscience and Physiology, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden
- Andreas Hellström
- Department of Technology Management and Economics, Chalmers University of Technology, Gothenburg, Sweden
- Steinn Steingrimsson
- Region Västra Götaland, Psychiatric Department, Sahlgrenska University Hospital, Gothenburg, Sweden
- Section of Psychiatry and Neurochemistry, Institute of Neuroscience and Physiology, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden
14
Hudon A, Kiepura B, Pelletier M, Phan V. Using ChatGPT in Psychiatry to Design Script Concordance Tests in Undergraduate Medical Education: Mixed Methods Study. JMIR Medical Education 2024; 10:e54067. [PMID: 38596832] [PMCID: PMC11007379] [DOI: 10.2196/54067]
Abstract
Background Undergraduate medical education offers a wide range of learning opportunities delivered through various teaching-learning modalities. A clinical scenario is frequently used as a modality, followed by multiple-choice or open-ended questions, among other learning and teaching methods. As such, script concordance tests (SCTs) can be used to promote a higher level of clinical reasoning. Recent technological developments have made generative artificial intelligence (AI)-based systems such as ChatGPT (OpenAI) available to assist clinician-educators in creating instructional materials. Objective The main objective of this project is to explore how SCTs generated by ChatGPT compare with SCTs produced by clinical experts on 3 major elements: the scenario (stem), clinical questions, and expert opinion. Methods This mixed methods study compared 3 ChatGPT-generated SCTs with 3 expert-created SCTs using a predefined framework. Clinician-educators and psychiatry residents involved in undergraduate medical education in Quebec, Canada, evaluated the 6 SCTs via a web-based survey on 3 criteria: the scenario, clinical questions, and expert opinion. They were also asked to describe the strengths and weaknesses of the SCTs. Results A total of 102 respondents assessed the SCTs. There were no significant distinctions between the 2 types of SCTs concerning the scenario (P=.84), clinical questions (P=.99), and expert opinion (P=.07) as rated by the respondents; indeed, respondents struggled to differentiate between ChatGPT- and expert-generated SCTs. ChatGPT showed promise in expediting SCT design, aligning well with Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition criteria, albeit with a tendency toward caricatured scenarios and simplistic content. Conclusions This study is the first to focus on AI-supported SCT design at a time when medicine is changing swiftly and AI-based technologies are expanding even faster. It suggests that ChatGPT can be a valuable tool for creating educational materials, though further validation is essential to ensure educational efficacy and accuracy.
Affiliation(s)
- Alexandre Hudon
- Department of Psychiatry and Addictology, University of Montreal, Montreal, QC, Canada
- Barnabé Kiepura
- Department of Psychiatry and Addictology, University of Montreal, Montreal, QC, Canada
- Véronique Phan
- Department of Pediatrics, Université de Montréal, Montreal, QC, Canada
15
Shorey S, Mattar C, Pereira TLB, Choolani M. A scoping review of ChatGPT's role in healthcare education and research. Nurse Education Today 2024; 135:106121. [PMID: 38340639] [DOI: 10.1016/j.nedt.2024.106121]
Abstract
OBJECTIVES To examine and consolidate literature regarding the advantages and disadvantages of utilizing ChatGPT in healthcare education and research. DESIGN/METHODS We searched seven electronic databases (PubMed/Medline, CINAHL, Embase, PsycINFO, Scopus, ProQuest Dissertations and Theses Global, and Web of Science) from November 2022 until September 2023. This scoping review adhered to Arksey and O'Malley's framework and followed the reporting guidelines outlined in the PRISMA-ScR checklist. For analysis, we employed Thomas and Harden's thematic synthesis framework. RESULTS A total of 100 studies were included. An overarching theme, "Forging the Future: Bridging Theory and Integration of ChatGPT," emerged, accompanied by two main themes, (1) Enhancing Healthcare Education, Research, and Writing with ChatGPT and (2) Controversies and Concerns about ChatGPT in Healthcare Education, Research, and Writing, along with seven subthemes. CONCLUSIONS Our review underscores the importance of acknowledging legitimate concerns related to the potential misuse of ChatGPT, such as "ChatGPT hallucinations," its limited understanding of specialized healthcare knowledge, its impact on teaching methods and assessments, confidentiality and security risks, and the controversial practice of crediting it as a co-author on scientific papers. It also recognizes the urgency of establishing timely guidelines and regulations, along with the active engagement of relevant stakeholders, to ensure the responsible and safe implementation of ChatGPT's capabilities. We advocate for cross-verification techniques to enhance the precision and reliability of generated content, the adaptation of higher-education curricula to incorporate ChatGPT's potential, educators' familiarization with the technology to improve their literacy and teaching approaches, and the development of innovative methods to detect ChatGPT usage. Furthermore, data protection measures should be prioritized when employing ChatGPT, and transparent reporting is crucial when integrating ChatGPT into academic writing.
Affiliation(s)
- Shefaly Shorey
- Alice Lee Centre for Nursing Studies, Yong Loo Lin School of Medicine, National University of Singapore, Singapore.
- Citra Mattar
- Division of Maternal Fetal Medicine, Department of Obstetrics and Gynaecology, National University Health Systems, Singapore; Department of Obstetrics and Gynaecology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Travis Lanz-Brian Pereira
- Alice Lee Centre for Nursing Studies, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Mahesh Choolani
- Division of Maternal Fetal Medicine, Department of Obstetrics and Gynaecology, National University Health Systems, Singapore; Department of Obstetrics and Gynaecology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
16
Valdez D, Bunnell A, Lim SY, Sadowski P, Shepherd JA. Performance of Progressive Generations of GPT on an Exam Designed for Certifying Physicians as Certified Clinical Densitometrists. J Clin Densitom 2024; 27:101480. [PMID: 38401238] [DOI: 10.1016/j.jocd.2024.101480]
Abstract
BACKGROUND Artificial intelligence (AI) large language models (LLMs) such as ChatGPT have demonstrated the ability to pass standardized exams. These models are not trained for a specific task but instead trained to predict sequences of text from large corpora of documents sourced from the internet. It has been shown that even models trained on this general task can pass exams in a variety of domain-specific fields, including the United States Medical Licensing Examination. We asked whether large language models would perform as well on much narrower subdomain tests designed for medical specialists. Furthermore, we wanted to better understand how progressive generations of GPT (generative pre-trained transformer) models may be evolving in the completeness and sophistication of their responses even while their training remains general. In this study, we evaluated the ability of two versions of GPT (GPT-3 and GPT-4) to pass the certification exam that physicians working as osteoporosis specialists take to become certified clinical densitometrists (CCDs). The CCD exam is scored on a range of 150 to 400, with 300 required to pass. METHODS A 100-question multiple-choice practice exam was obtained from a third-party exam preparation website that mimics the accredited certification tests given by the ISCD (International Society for Clinical Densitometry). The exam was administered to two versions of GPT, the free version (GPT Playground) and ChatGPT+, which are based on GPT-3 and GPT-4, respectively (OpenAI, San Francisco, CA). The systems were prompted with the exam questions verbatim. If a response was purely textual and did not specify which of the multiple-choice answers to select, the authors matched the text to the closest answer. Each exam was graded, and an estimated ISCD score was provided by the exam website. In addition, each response was evaluated by a rheumatologist CCD and ranked for accuracy using a 5-level scale. The two GPT versions were compared in terms of response accuracy and length. RESULTS The average response length was 11.6±19 words for GPT-3 and 50.0±43.6 words for GPT-4. GPT-3 answered 62 questions correctly, resulting in a failing ISCD score of 289, whereas GPT-4 answered 82 questions correctly, with a passing score of 342. GPT-3 scored highest on the "Overview of Low Bone Mass and Osteoporosis" category (72% correct), while GPT-4 scored well above 80% accuracy on all categories except "Imaging Technology in Bone Health" (65% correct). Regarding subjective accuracy, GPT-3 answered 23 questions with nonsensical or totally wrong responses, while GPT-4 had no responses in that category. CONCLUSION If this had been an actual certification exam, GPT-4 would now have a CCD suffix to its name, even though it was trained only on general internet knowledge. Clearly, more goes into physician training than can be captured by this exam. Nevertheless, GPT algorithms may prove to be valuable physician aids in the diagnosis and monitoring of osteoporosis and other diseases.
Affiliation(s)
- Dustin Valdez
- University of Hawaii at Manoa, Honolulu, HI, USA; University of Hawaii Cancer Center, Honolulu, HI, USA.
- Arianna Bunnell
- University of Hawaii at Manoa, Honolulu, HI, USA; University of Hawaii Cancer Center, Honolulu, HI, USA
- Sian Y Lim
- Hawai'i Pacific Health Medical Group, Hawai'i Pacific Health, Honolulu, HI, USA
17
Shahin MH, Barth A, Podichetty JT, Liu Q, Goyal N, Jin JY, Ouellet D. Artificial Intelligence: From Buzzword to Useful Tool in Clinical Pharmacology. Clin Pharmacol Ther 2024; 115:698-709. [PMID: 37881133] [DOI: 10.1002/cpt.3083]
Abstract
The advent of artificial intelligence (AI) in clinical pharmacology and drug development is akin to the dawning of a new era. Previously dismissed as mere technological hype, these approaches have emerged as promising tools in different domains, including health care, demonstrating their potential to empower clinical pharmacology decision making, revolutionize the drug development landscape, and advance patient care. Although challenges remain, the remarkable progress already made signals that the leap from hype to reality is well underway, and AI's promise to offer clinical pharmacology new tools and possibilities for optimizing patient care is gradually coming to fruition. This review dives into the burgeoning world of AI and machine learning (ML), showcasing different applications of AI in clinical pharmacology and the impact of successful AI/ML implementation on drug development and regulatory decisions. This review also highlights recommendations for areas of opportunity in clinical pharmacology, including data analysis (e.g., handling large data sets, screening to identify important covariates, and optimizing patient populations) and efficiencies (e.g., automation, translation, literature curation, and training). Realizing the benefits of AI in drug development and understanding its value will lead to the successful integration of AI tools into our clinical pharmacology and pharmacometrics armamentarium.
Affiliation(s)
- Mohamed H Shahin
- Clinical Pharmacology and Bioanalytics, Pfizer Inc., Groton, Connecticut, USA
- Aline Barth
- Clinical Pharmacology and Bioanalytics, Pfizer Inc., Groton, Connecticut, USA
- Qi Liu
- Office of Clinical Pharmacology, Office of Translational Sciences, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, Maryland, USA
- Navin Goyal
- Clinical Pharmacology and Pharmacometrics, Janssen Research and Development, LLC., Spring House, Pennsylvania, USA
- Jin Y Jin
- Department of Clinical Pharmacology, Genentech, South San Francisco, California, USA
- Daniele Ouellet
- Clinical Pharmacology and Pharmacometrics, Janssen Research and Development, LLC., Spring House, Pennsylvania, USA
18
Temperley HC, O'Sullivan NJ, Mac Curtain BM, Corr A, Meaney JF, Kelly ME, Brennan I. Current applications and future potential of ChatGPT in radiology: A systematic review. J Med Imaging Radiat Oncol 2024; 68:257-264. [PMID: 38243605] [DOI: 10.1111/1754-9485.13621]
Abstract
This study aimed to comprehensively evaluate the current utilization and future potential of ChatGPT, an AI-based chat model, in the field of radiology. The primary focus is on its role in enhancing decision-making processes, optimizing workflow efficiency, and fostering interdisciplinary collaboration and teaching within healthcare. A systematic search was conducted in the PubMed, EMBASE, and Web of Science databases. Key aspects, such as its impact on complex decision-making, workflow enhancement, and collaboration, were assessed, and the limitations and challenges associated with ChatGPT implementation were also examined. Overall, six studies met the inclusion criteria and were included in our analysis. All studies were prospective in nature, and a total of 551 ChatGPT (versions 3.0 to 4.0) assessment events were included. When generating academic papers, ChatGPT produced inaccurate data 80% of the time. When asked questions about common interventional radiology procedures, its answers contained entirely incorrect information 45% of the time. ChatGPT answered US board-style questions better when lower-order thinking was required (P=0.002). Accuracy on imaging questions improved between ChatGPT 3.5 and 4.0, from 61% to 85% (P=0.009). ChatGPT had an average translational ability score of 4.27/5 on a Likert scale for CT and MRI findings. ChatGPT demonstrates substantial potential to augment decision-making and optimize workflow. While its promise is evident, thorough evaluation and validation are imperative before widespread adoption in the field of radiology.
Affiliation(s)
- Hugo C Temperley
- Department of Radiology, St. James's Hospital, Dublin, Ireland
- Department of Surgery, St. James's Hospital, Dublin, Ireland
- Alison Corr
- Department of Radiology, St. James's Hospital, Dublin, Ireland
- James F Meaney
- Department of Radiology, St. James's Hospital, Dublin, Ireland
- Michael E Kelly
- Department of Surgery, St. James's Hospital, Dublin, Ireland
- Ian Brennan
- Department of Radiology, St. James's Hospital, Dublin, Ireland
19
Yalamanchili A, Sengupta B, Song J, Lim S, Thomas TO, Mittal BB, Abazeed ME, Teo PT. Quality of Large Language Model Responses to Radiation Oncology Patient Care Questions. JAMA Netw Open 2024; 7:e244630. [PMID: 38564215] [PMCID: PMC10988356] [DOI: 10.1001/jamanetworkopen.2024.4630]
Abstract
Importance Artificial intelligence (AI) large language models (LLMs) demonstrate potential in simulating human-like dialogue. Their efficacy in accurate patient-clinician communication within radiation oncology has yet to be explored. Objective To determine an LLM's quality of responses to radiation oncology patient care questions using both domain-specific expertise and domain-agnostic metrics. Design, Setting, and Participants This cross-sectional study retrieved questions and answers from websites (accessed February 1 to March 20, 2023) affiliated with the National Cancer Institute and the Radiological Society of North America. These questions were used as queries for an AI LLM, ChatGPT version 3.5 (accessed February 20 to April 20, 2023), to prompt LLM-generated responses. Three radiation oncologists and 3 radiation physicists ranked the LLM-generated responses for relative factual correctness, relative completeness, and relative conciseness compared with online expert answers. Statistical analysis was performed from July to October 2023. Main Outcomes and Measures The LLM's responses were ranked by experts using domain-specific metrics such as relative correctness, conciseness, completeness, and potential harm compared with online expert answers on a 5-point Likert scale. Domain-agnostic metrics encompassing cosine similarity scores, readability scores, word count, lexicon, and syllable counts were computed as independent quality checks for LLM-generated responses. Results Of the 115 radiation oncology questions retrieved from 4 professional society websites, the LLM performed the same or better in 108 responses (94%) for relative correctness, 89 responses (77%) for completeness, and 105 responses (91%) for conciseness compared with expert answers. Only 2 LLM responses were ranked as having potential harm. The mean (SD) readability consensus score for expert answers was 10.63 (3.17) vs 13.64 (2.22) for LLM answers (P < .001), indicating 10th grade and college reading levels, respectively. The mean (SD) number of syllables was 327.35 (277.15) for expert vs 376.21 (107.89) for LLM answers (P = .07), the mean (SD) word count was 226.33 (191.92) for expert vs 246.26 (69.36) for LLM answers (P = .27), and the mean (SD) lexicon score was 200.15 (171.28) for expert vs 219.10 (61.59) for LLM answers (P = .24). Conclusions and Relevance In this cross-sectional study, the LLM generated accurate, comprehensive, and concise responses with minimal risk of harm, using language similar to human experts but at a higher reading level. These findings suggest the LLM's potential, with some retraining, as a valuable resource for patient queries in radiation oncology and other medical fields.
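Editor's note: the domain-agnostic checks above (cosine similarity, readability, word, lexicon, and syllable counts) can be computed with standard libraries. A sketch using TF-IDF cosine similarity and the textstat package, one possible tooling since the abstract does not specify the study's implementation (answer texts are hypothetical):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import textstat

# Hypothetical expert and LLM answers to the same patient question.
expert = "Radiation therapy uses high-energy beams to destroy cancer cells."
llm = "Radiotherapy applies focused high-energy radiation to kill tumor cells."

# Cosine similarity over TF-IDF vectors of the two answers.
tfidf = TfidfVectorizer().fit_transform([expert, llm])
similarity = cosine_similarity(tfidf[0], tfidf[1])[0, 0]

print(f"cosine similarity: {similarity:.2f}")
print(f"readability consensus: {textstat.text_standard(llm)}")
print(f"word count: {textstat.lexicon_count(llm)}")
print(f"syllable count: {textstat.syllable_count(llm)}")
```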
Affiliation(s)
- Amulya Yalamanchili, Bishwambhar Sengupta, Joshua Song, Sara Lim, Tarita O. Thomas, Bharat B. Mittal, Mohamed E. Abazeed, P. Troy Teo
- Robert H. Lurie Comprehensive Cancer Center, Department of Radiation Oncology, Northwestern Memorial Hospital, Northwestern University Feinberg School of Medicine, Chicago, Illinois
20
Artsi Y, Sorin V, Konen E, Glicksberg BS, Nadkarni G, Klang E. Large language models for generating medical examinations: systematic review. BMC Medical Education 2024; 24:354. [PMID: 38553693 PMCID: PMC10981304 DOI: 10.1186/s12909-024-05239-y]
Abstract
BACKGROUND Writing multiple choice questions (MCQs) for medical exams is challenging: it requires extensive medical knowledge, time, and effort from medical educators. This systematic review focuses on the application of large language models (LLMs) in generating medical MCQs. METHODS The authors searched for studies published up to November 2023. Search terms focused on LLM-generated MCQs for medical examinations. Non-English studies, studies outside the year range, and studies not focusing on AI-generated multiple-choice questions were excluded. MEDLINE was used as the search database. Risk of bias was evaluated using a tailored QUADAS-2 tool, and the review followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. RESULTS Overall, eight studies published between April 2023 and October 2023 were included. Six studies used ChatGPT 3.5, while two employed GPT-4. Five studies showed that LLMs can produce competent questions valid for medical exams. Three studies used LLMs to write medical questions but did not evaluate the validity of the questions. One study conducted a comparative analysis of different models, and another compared LLM-generated questions with those written by humans. All studies reported some faulty questions that were deemed inappropriate for medical exams, and some questions required additional modification to qualify. Two studies were at high risk of bias. CONCLUSIONS LLMs can be used to write MCQs for medical examinations, but their limitations cannot be ignored. Further study in this field is essential, and more conclusive evidence is needed; until then, LLMs may serve as a supplementary tool for writing medical examinations.
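As an illustration of the kind of LLM-based question generation the included studies performed, here is a hypothetical sketch using the openai Python client; the model name and prompt wording are assumptions, not details taken from any included study.

```python
# Hypothetical sketch of prompting an LLM to draft a medical MCQ, in the
# spirit of the studies reviewed above. Prompt wording is illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompt = (
    "Write one multiple-choice question for a medical licensing exam on the "
    "management of type 2 diabetes. Provide four options (A-D), mark the "
    "correct answer, and give a one-sentence explanation."
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
# As the review's findings on faulty questions underline, any generated item
# still needs expert review before use in a real exam.
```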
Affiliation(s)
- Yaara Artsi
- Azrieli Faculty of Medicine, Bar-Ilan University, Ha'Hadas St. 1, Rishon Le Zion, Zefat, 7550598, Israel
- Vera Sorin
- Department of Diagnostic Imaging, Chaim Sheba Medical Center, Ramat Gan, Israel
- Tel-Aviv University School of Medicine, Tel Aviv, Israel
- DeepVision Lab, Chaim Sheba Medical Center, Ramat Gan, Israel
- Eli Konen
- Department of Diagnostic Imaging, Chaim Sheba Medical Center, Ramat Gan, Israel
- Tel-Aviv University School of Medicine, Tel Aviv, Israel
- Benjamin S Glicksberg
- Division of Data-Driven and Digital Medicine (D3M), Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Girish Nadkarni
- Division of Data-Driven and Digital Medicine (D3M), Icahn School of Medicine at Mount Sinai, New York, NY, USA
- The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Eyal Klang
- Division of Data-Driven and Digital Medicine (D3M), Icahn School of Medicine at Mount Sinai, New York, NY, USA
- The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
21
Berry CE, Fazilat AZ, Lavin C, Lintel H, Cole N, Stingl CS, Valencia C, Morgan AG, Momeni A, Wan DC. Both Patients and Plastic Surgeons Prefer Artificial Intelligence-Generated Microsurgical Information. J Reconstr Microsurg 2024. [PMID: 38382637 DOI: 10.1055/a-2273-4163]
Abstract
BACKGROUND With the growing relevance of artificial intelligence (AI)-based patient-facing information, microsurgery-specific online information provided by professional organizations was compared with that of ChatGPT (Chat Generative Pre-Trained Transformer) and assessed for accuracy, comprehensiveness, clarity, and readability. METHODS Six plastic and reconstructive surgeons blindly assessed responses to 10 microsurgery-related medical questions written either by the American Society of Reconstructive Microsurgery (ASRM) or ChatGPT for accuracy, comprehensiveness, and clarity. Surgeons were asked to choose which source provided the overall highest-quality microsurgical patient-facing information. Additionally, 30 individuals with no medical background (ages: 18-81, μ = 49.8) were asked to state a preference when blindly comparing the materials. Readability scores were calculated using the following seven readability formulas: Flesch-Kincaid Grade Level, Flesch-Kincaid Readability Ease, Gunning Fog Index, Simple Measure of Gobbledygook Index, Coleman-Liau Index, Linsear Write Formula, and Automated Readability Index. Statistical analysis of microsurgery-specific online sources was conducted utilizing paired t-tests. RESULTS Statistically significant differences in comprehensiveness and clarity were seen in favor of ChatGPT. Surgeons blindly chose ChatGPT as the source providing the overall highest-quality microsurgical patient-facing information 70.7% of the time, and nonmedical individuals selected the AI-generated materials 55.9% of the time. Neither ChatGPT- nor ASRM-generated materials were found to contain inaccuracies. Readability scores for both ChatGPT and ASRM materials exceeded recommended levels for patient proficiency across all seven readability formulas, with the AI-based material scored as more complex. CONCLUSION AI-generated patient-facing materials were preferred by surgeons in terms of comprehensiveness and clarity when blindly compared with online material provided by ASRM. The studied AI-generated material was not found to contain inaccuracies, and surgeons and nonmedical individuals consistently indicated an overall preference for it. A readability analysis suggested that materials sourced from both ChatGPT and ASRM surpassed recommended reading levels across all seven readability scores.
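For reference, the Flesch-Kincaid Grade Level named above is computed as 0.39 × (words/sentence) + 11.8 × (syllables/word) − 15.59. A self-contained sketch follows; the naive vowel-group syllable counter is an assumption, so scores will only approximate those of dedicated readability tools.

```python
# Simplified implementation of the Flesch-Kincaid Grade Level, one of the
# readability formulas listed above. The syllable counter is a naive
# vowel-group heuristic, so results are approximate.
import re

def count_syllables(word: str) -> int:
    # Count groups of consecutive vowels as syllables; drop a silent final 'e'.
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def flesch_kincaid_grade(text: str) -> float:
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    # FKGL = 0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59
    return 0.39 * (len(words) / sentences) + 11.8 * (syllables / len(words)) - 15.59

sample = ("Microsurgery reconnects very small blood vessels and nerves "
          "under a microscope. Recovery often takes several weeks.")
print(round(flesch_kincaid_grade(sample), 1))
```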
Affiliation(s)
- Charlotte E Berry, Alexander Z Fazilat, Christopher Lavin, Hendrik Lintel, Naomi Cole, Cybil S Stingl, Caleb Valencia, Annah G Morgan, Arash Momeni, Derrick C Wan
- Department of Surgery, Division of Plastic and Reconstructive Surgery, Hagey Laboratory for Pediatric Regenerative Medicine, Stanford University School of Medicine, Stanford, California
22
Cocci A, Pezzoli M, Lo Re M, Russo GI, Asmundo MG, Fode M, Cacciamani G, Cimino S, Minervini A, Durukan E. Quality of information and appropriateness of ChatGPT outputs for urology patients. Prostate Cancer Prostatic Dis 2024; 27:103-108. [PMID: 37516804 DOI: 10.1038/s41391-023-00705-y]
Abstract
BACKGROUND The proportion of health-related searches on the internet is continuously growing. ChatGPT, a natural language processing (NLP) tool created by OpenAI, has been gaining increasing user attention and can potentially be used as a source of information on health concerns. This study aims to analyze the quality and appropriateness of ChatGPT's responses to urology case studies compared with those of a urologist. METHODS Data from 100 patient case studies, comprising patient demographics, medical history, and urologic complaints, were sequentially inputted into ChatGPT, one by one. A question was posed to determine the most likely diagnosis, suggested examinations, and treatment options. The responses generated by ChatGPT were compared with those provided by a board-certified urologist who was blinded to ChatGPT's responses, and were graded on a 5-point Likert scale using accuracy, comprehensiveness, and clarity as the criteria for appropriateness. The quality of information was graded based on section 2 of the DISCERN tool, and readability was assessed using the Flesch Reading Ease (FRE) and Flesch-Kincaid Reading Grade Level (FKGL) formulas. RESULTS Overall, 52% of all responses were deemed appropriate. ChatGPT provided more appropriate responses for non-oncology conditions (58.5%) than for oncology (52.6%) and emergency urology cases (11.1%) (p = 0.03). The median score on the DISCERN tool was 15 (IQR = 5.3), corresponding to a quality rating of poor. The ChatGPT responses demonstrated a college graduate reading level, as indicated by a median FRE score of 18 (IQR = 21) and a median FKGL score of 15.8 (IQR = 3). CONCLUSIONS ChatGPT serves as an interactive tool for providing medical information online, offering the possibility of enhancing health outcomes and patient satisfaction. Nevertheless, the insufficient appropriateness and poor quality of its responses to urology cases emphasize the importance of thoroughly evaluating NLP-generated outputs before relying on them for health-related concerns.
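As a small illustration of the summary statistics reported above, the following sketch computes a median and interquartile range (IQR) for a set of DISCERN section 2 scores; the score values are invented.

```python
# Minimal sketch of the median/IQR summary used above for DISCERN section 2
# scores. The scores below are hypothetical, for illustration only.
import numpy as np

discern_scores = np.array([12, 14, 15, 15, 16, 17, 18, 20, 13, 15])  # hypothetical

median = np.median(discern_scores)
q1, q3 = np.percentile(discern_scores, [25, 75])
print(f"median = {median}, IQR = {q3 - q1}")
```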
Affiliation(s)
- Andrea Cocci
- Urology Section, University of Florence, Florence, Italy
- Marta Pezzoli
- Urology Section, University of Florence, Florence, Italy
- Mattia Lo Re
- Urology Section, University of Florence, Florence, Italy
- Mikkel Fode
- Department of Clinical Medicine, University of Copenhagen, Copenhagen, Denmark
- Department of Urology, Copenhagen University Hospital, Herlev and Gentofte Hospital, Copenhagen, Denmark
- Giovanni Cacciamani
- Institute of Urology, Keck School of Medicine, University of Southern California (USC), Los Angeles, CA, USA
- Emil Durukan
- Department of Urology, Copenhagen University Hospital, Herlev and Gentofte Hospital, Copenhagen, Denmark
23
Hill JE, Harris C, Clegg A. Methods for using Bing's AI-powered search engine for data extraction for a systematic review. Res Synth Methods 2024; 15:347-353. [PMID: 38066713 DOI: 10.1002/jrsm.1689]
Abstract
Data extraction is a time-consuming and resource-intensive task in the systematic review process. Natural language processing (NLP) artificial intelligence (AI) techniques have the potential to automate data extraction, saving time and resources, accelerating the review process, and enhancing the quality and reliability of extracted data. In this paper, we propose a method for using Bing AI and Microsoft Edge as a second reviewer to verify and enhance data items first extracted by a single human reviewer. We describe a worked example of the steps involved in instructing the Bing AI Chat tool to extract study characteristics from a PDF document into a table so that they can be compared with manually extracted data. We show that this technique may provide an additional verification process for data extraction where resources are limited or for novice reviewers. However, it should not be seen as a replacement for established and validated double independent data extraction methods without further evaluation and verification. Use of AI techniques for data extraction in systematic reviews should be transparently and accurately described in reports. Future research should focus on the accuracy, efficiency, completeness, and user experience of using Bing AI for data extraction compared with traditional methods using two or more independent reviewers.
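A hypothetical example of the kind of extraction instruction the paper describes follows; the wording is illustrative and is not the authors' actual prompt.

```python
# Hypothetical example of an instruction for AI-assisted data extraction, in
# the spirit of the worked example described above. The exact prompt used by
# the authors is not reproduced here.
extraction_prompt = """
From the attached PDF of a randomized trial, extract the following study
characteristics into a two-column table (item | value): first author, year,
country, study design, sample size, population, intervention, comparator,
and primary outcome. If an item is not reported, write 'not reported'.
"""
print(extraction_prompt)
```

The extracted table can then be compared item by item against the single human reviewer's extraction, with disagreements resolved against the source PDF.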
Affiliation(s)
- James Edward Hill, Catherine Harris, Andrew Clegg
- Synthesis, Economic Evaluation and Decision Science (SEEDS) Group, University of Central Lancashire, Preston, UK
24
Shiraishi M, Lee H, Kanayama K, Moriwaki Y, Okazaki M. Appropriateness of Artificial Intelligence Chatbots in Diabetic Foot Ulcer Management. Int J Low Extr Wound 2024. [PMID: 38419470 DOI: 10.1177/15347346241236811]
Abstract
Type 2 diabetes is a significant global health concern. It often causes diabetic foot ulcers (DFUs), which affect millions of people and increase amputation and mortality rates. Despite existing guidelines, the complexity of DFU treatment makes clinical decisions challenging. Large language models such as Chat Generative Pre-trained Transformer (ChatGPT), which are adept at natural language processing, have emerged as valuable resources in the medical field; however, concerns about the accuracy and reliability of the information they provide remain. We aimed to assess the accuracy of various artificial intelligence (AI) chatbots, including ChatGPT, in providing information on DFUs based on established guidelines. Seven AI chatbots were asked clinical questions (CQs) based on the DFU guidelines. Their responses were analyzed for accuracy in answering the CQs, grade of recommendation, level of evidence, and agreement with the reference, including verification of the authenticity of the references provided by the chatbots. The AI chatbots showed a mean accuracy of 91.2% in answers to the CQs, with discrepancies noted in grade of recommendation and level of evidence. Claude-2 outperformed the other chatbots in the proportion of verified references (99.6%), whereas ChatGPT had the lowest rate of reference authenticity (66.3%). This study highlights the potential of AI chatbots as tools for disseminating medical information and demonstrates their high accuracy in answering CQs related to DFUs. However, the variability in accuracy across chatbots and problems such as AI hallucinations necessitate cautious use and further optimization for medical applications. This study underscores the evolving role of AI in healthcare and the importance of refining these technologies for effective use in clinical decision-making and patient education.
Affiliation(s)
- Makoto Shiraishi, Haesu Lee, Koji Kanayama, Yuta Moriwaki, Mutsumi Okazaki
- Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, Tokyo, Japan
25
Abi-Rafeh J, Xu HH, Kazan R, Tevlin R, Furnas H. Large Language Models and Artificial Intelligence: A Primer for Plastic Surgeons on the Demonstrated and Potential Applications, Promises, and Limitations of ChatGPT. Aesthet Surg J 2024; 44:329-343. [PMID: 37562022 DOI: 10.1093/asj/sjad260]
Abstract
BACKGROUND The rapidly evolving field of artificial intelligence (AI) holds great potential for plastic surgeons. ChatGPT, a recently released AI large language model (LLM), promises applications across many disciplines, including healthcare. OBJECTIVES The aim of this article was to provide a primer for plastic surgeons on AI, LLMs, and ChatGPT, including an analysis of currently demonstrated and proposed clinical applications. METHODS A systematic review was performed identifying medical and surgical literature on ChatGPT's proposed clinical applications. Variables assessed included applications investigated, command tasks provided, user input information, AI-emulated human skills, output validation, and reported limitations. RESULTS The analysis included 175 articles reporting on 13 plastic surgery applications and 116 additional clinical applications, categorized by field and purpose. Thirty-four applications within plastic surgery are thus proposed, with relevance to different target audiences, including attending plastic surgeons (n = 17, 50%), trainees/educators (n = 8, 24%), researchers/scholars (n = 7, 21%), and patients (n = 2, 6%). The 15 identified limitations of ChatGPT were categorized by training data, algorithm, and ethical considerations. CONCLUSIONS Widespread use of ChatGPT in plastic surgery will depend on rigorous research of proposed applications to validate performance and address limitations. This systematic review aims to guide research, development, and regulation to safely adopt AI in plastic surgery.
26
Barlas T, Altinova AE, Akturk M, Toruner FB. Credibility of ChatGPT in the assessment of obesity in type 2 diabetes according to the guidelines. Int J Obes (Lond) 2024; 48:271-275. [PMID: 37951982 DOI: 10.1038/s41366-023-01410-5]
Abstract
BACKGROUND The Chat Generative Pre-trained Transformer (ChatGPT) allows students, researchers, and patients in the medical field to access information easily and has gained considerable attention. We aimed to evaluate the credibility of ChatGPT against the guidelines for the assessment of obesity in type 2 diabetes (T2D), one of the major health concerns of this century. MATERIALS AND METHODS In this cross-sectional non-human-subject study, experienced endocrinologists posed 20 questions to ChatGPT, organized into subsections covering the assessment of obesity and different treatment options, according to the American Diabetes Association and American Association of Clinical Endocrinology guidelines. The responses of ChatGPT were classified into four categories: compatible, compatible but insufficient, partially incompatible, and incompatible with the guidelines. RESULTS ChatGPT demonstrated a systematic approach to answering questions and recommended consulting a healthcare provider for personalized advice based on each patient's specific health needs and circumstances. The compatibility of ChatGPT with the guidelines was 100% for the assessment of obesity in T2D; however, it was lower in the therapy sections, which covered nutritional, medical, and surgical approaches to weight loss. Furthermore, ChatGPT required additional prompts for responses evaluated as "compatible but insufficient" in order to provide all the information in the guidelines. CONCLUSION The assessment and management of obesity in T2D are highly individualized. Despite ChatGPT's comprehensive and understandable responses, it should not be used as a substitute for healthcare professionals' patient-centered approach.
Affiliation(s)
- Tugba Barlas, Alev Eroglu Altinova, Mujde Akturk, Fusun Balos Toruner
- Department of Endocrinology and Metabolism, Gazi University Faculty of Medicine, Ankara, Turkey
27
Gengatharan D, Saggi SS, Bin Abd Razak HR. Pre-operative Planning of High Tibial Osteotomy With ChatGPT: Are We There Yet? Cureus 2024; 16:e54858. [PMID: 38533173 PMCID: PMC10964394 DOI: 10.7759/cureus.54858]
Abstract
INTRODUCTION ChatGPT (Chat Generative Pre-trained Transformer), developed by OpenAI (San Francisco, CA, USA), has gained attention in the medical field. It has the potential to enhance and simplify tasks such as preoperative planning in orthopedic surgery. We aimed to test ChatGPT's accuracy in measuring the angle of correction for high tibial osteotomy for cases planned and performed at a tertiary teaching hospital in Singapore. MATERIALS AND METHODS Peri-operative angular parameters from 114 consecutive patients who underwent medial opening wedge high tibial osteotomy (MOWHTO) were used to query ChatGPT 3.0. First, ChatGPT 3.0 was asked what information it required to plan a MOWHTO. Based on its response, the pre-operative medial proximal tibial angle (MPTA) and joint line congruence angle (JLCA) were provided. ChatGPT 3.0 then responded with its recommended angle of correction, which was compared against the surgical correction manually planned by our fellowship-trained surgeon. A root mean square analysis was then performed to compare ChatGPT 3.0 with manual planning. RESULTS The root mean square error (RMSE) of ChatGPT 3.0 in predicting the correction angle in MOWHTO was 2.96, suggesting a very poor model fit. CONCLUSION Although ChatGPT 3.0 represents a significant breakthrough in large language models with extensive capabilities, it is not currently optimized to effectively perform complex pre-operative planning in orthopedic surgery, specifically in the context of MOWHTO. Further refinement and consideration of procedure-specific factors are necessary to enhance its accuracy and suitability for such applications.
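The root mean square error comparison described above can be sketched as follows; the angle values are invented for illustration.

```python
# Minimal sketch of the RMSE comparison between ChatGPT-recommended and
# surgeon-planned correction angles. The angle values are hypothetical.
import numpy as np

surgeon_angle = np.array([8.0, 10.5, 9.0, 12.0, 7.5])    # hypothetical, degrees
chatgpt_angle = np.array([10.0, 13.5, 6.5, 14.5, 10.0])  # hypothetical, degrees

rmse = np.sqrt(np.mean((chatgpt_angle - surgeon_angle) ** 2))
print(f"RMSE = {rmse:.2f} degrees")
```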
Affiliation(s)
- Hamid Rahmatullah Bin Abd Razak
- Musculoskeletal Sciences, Duke-NUS Medical School, Singapore, SGP
- Orthopaedic Surgery, Sengkang General Hospital, Singapore, SGP
28
Aliyeva A, Sari E, Alaskarov E, Nasirov R. Enhancing Postoperative Cochlear Implant Care With ChatGPT-4: A Study on Artificial Intelligence (AI)-Assisted Patient Education and Support. Cureus 2024; 16:e53897. [PMID: 38465158 PMCID: PMC10924891 DOI: 10.7759/cureus.53897]
Abstract
BACKGROUND Cochlear implantation is a critical surgical intervention for patients with severe hearing loss. Postoperative care is essential for successful rehabilitation, yet access to timely medical advice can be challenging, especially in remote or resource-limited settings. Integrating advanced artificial intelligence (AI) tools like Chat Generative Pre-trained Transformer (ChatGPT)-4 into post-surgical care could bridge the gap in patient education and support. AIM This study aimed to assess the effectiveness of ChatGPT-4 as a supplementary information resource for postoperative cochlear implant patients. The focus was on evaluating the AI chatbot's ability to provide accurate, clear, and relevant information, particularly in scenarios where access to healthcare professionals is limited. MATERIALS AND METHODS Five common postoperative questions related to cochlear implant care were posed to ChatGPT-4. The AI chatbot's responses were analyzed for accuracy, response time, clarity, and relevance. The aim was to determine whether ChatGPT-4 could serve as a reliable source of information for patients, especially when they cannot reach the hospital or their specialists. RESULTS ChatGPT-4 provided responses aligned with current medical guidelines, demonstrating accuracy and relevance. The AI chatbot responded to each query within seconds, indicating its potential as a timely resource, and its responses were clear and understandable, making complex medical information accessible to non-medical audiences. These findings suggest that ChatGPT-4 could effectively supplement traditional patient education and provide valuable support in postoperative care. CONCLUSION ChatGPT-4 has significant potential as a supportive tool for cochlear implant patients after surgery. While it cannot replace professional medical advice, it can provide immediate, accessible, and understandable information, which is particularly beneficial when professional advice is not immediately available. This underscores the utility of AI in enhancing patient care and supporting rehabilitation after cochlear implantation.
Affiliation(s)
- Aynur Aliyeva
- Otorhinolaryngology-Head and Neck Surgery, Cincinnati Children's Hospital, Cincinnati, USA
- Elif Sari
- Otorhinolaryngology-Head and Neck Surgery, Istanbul Aydın University, VM Medikal Park Florya Hospital, Istanbul, TUR
- Elvin Alaskarov
- Otorhinolaryngology-Head and Neck Surgery, Istanbul Medipol University Health Care Practice and Research Center, Esenler Hospital, Istanbul, TUR
- Rauf Nasirov
- Neurosurgery, University of Cincinnati College of Medicine, Cincinnati, USA
29
Nguyen T. ChatGPT in Medical Education: A Precursor for Automation Bias? JMIR Medical Education 2024; 10:e50174. [PMID: 38231545 PMCID: PMC10831594 DOI: 10.2196/50174]
Abstract
Artificial intelligence (AI) in health care promises accurate and efficient results. However, AI can also be a black box, where the logic behind its results is nonrational. There are concerns about such questionable results being used in patient care. Because physicians have a duty to provide care based on their clinical judgment, in addition to their patients' values and preferences, it is crucial that they validate the results produced by AI. Yet some physicians exhibit a phenomenon known as automation bias, the assumption by the user that AI is always right. This is a dangerous mindset, as users exhibiting automation bias will not validate results, given their trust in AI systems. Several factors affect a user's susceptibility to automation bias, such as inexperience or being born in the digital age. In this editorial, I argue that these factors, together with a lack of AI education in the medical school curriculum, cause automation bias. I also explore the harms of automation bias and why prospective physicians need to be vigilant when using AI. Furthermore, it is important to consider what attitudes are being taught to students when introducing ChatGPT, which may be some students' first experience of AI before they use it in the clinical setting. Therefore, to avoid the problem of automation bias in the long term, and in addition to the necessary incorporation of AI education into the curriculum, the use of ChatGPT in medical education should be limited to certain tasks; otherwise, having no constraints on what ChatGPT may be used for could lead to automation bias.
Affiliation(s)
- Tina Nguyen
- The University of Texas Medical Branch, Galveston, TX, United States
30
Odabashian R, Bastin D, Jones G, Manzoor M, Tangestaniapour S, Assad M, Lakhani S, Odabashian M, McGee S. Assessment of ChatGPT-3.5's Knowledge in Oncology: Comparative Study with ASCO-SEP Benchmarks. JMIR AI 2024; 3:e50442. [PMID: 38875575 PMCID: PMC11041475 DOI: 10.2196/50442]
Abstract
BACKGROUND ChatGPT (OpenAI) is a state-of-the-art large language model that uses artificial intelligence (AI) to address questions across diverse topics. The American Society of Clinical Oncology Self-Evaluation Program (ASCO-SEP) created a comprehensive educational program to help physicians keep up to date with the many rapid advances in the field. Its question bank consists of multiple choice questions addressing the many facets of cancer care, including diagnosis, treatment, and supportive care. As ChatGPT applications rapidly expand, it becomes vital to ascertain whether the knowledge of ChatGPT-3.5 matches the established standards that oncologists are recommended to follow. OBJECTIVE This study aims to evaluate whether ChatGPT-3.5's knowledge aligns with the established benchmarks that oncologists are expected to adhere to, giving a deeper understanding of the potential applications of this tool as a support for clinical decision-making. METHODS We conducted a systematic assessment of the performance of ChatGPT-3.5 on the ASCO-SEP, the leading educational and assessment tool for medical oncologists in training and practice. Over 1000 multiple choice questions covering the spectrum of cancer care were extracted. Questions were categorized by cancer type or discipline, with subcategorization as treatment, diagnosis, or other. Answers were scored as correct if ChatGPT-3.5 selected the answer defined as correct by ASCO-SEP. RESULTS Overall, ChatGPT-3.5 answered 56.1% (583/1040) of questions correctly. The program demonstrated varying levels of accuracy across cancer types and disciplines: the highest accuracy was observed in questions related to developmental therapeutics (8/10; 80% correct), while the lowest was in questions related to gastrointestinal cancer (102/209; 48.8% correct). There was no significant difference in performance across the predefined subcategories of diagnosis, treatment, and other (P=.16). CONCLUSIONS This study evaluated ChatGPT-3.5's oncology knowledge using the ASCO-SEP, aiming to address uncertainties regarding AI tools like ChatGPT in clinical decision-making. Our findings suggest that while ChatGPT-3.5 offers a hopeful outlook for AI in oncology, its present performance on ASCO-SEP tests necessitates further refinement to reach the requisite competency levels. Future assessments could explore ChatGPT's clinical decision support capabilities with real-world clinical scenarios, its ease of integration into medical workflows, and its potential to foster interdisciplinary collaboration and patient engagement in health care settings.
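The subcategory comparison reported above can be illustrated with a chi-square test of correct versus incorrect answers; in this sketch the per-category counts are hypothetical (only the overall total of 583/1040 correct matches the abstract).

```python
# Minimal sketch of a chi-square test across the diagnosis, treatment, and
# "other" subcategories. The per-category splits below are hypothetical; they
# only sum to the abstract's overall 583 correct out of 1040.
from scipy.stats import chi2_contingency

#              correct  incorrect
counts = [
    [150, 110],   # diagnosis (hypothetical)
    [280, 230],   # treatment (hypothetical)
    [153, 117],   # other (hypothetical)
]
chi2, p, dof, expected = chi2_contingency(counts)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")
```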
Affiliation(s)
- Roupen Odabashian
- Department of Oncology, Barbara Ann Karmanos Cancer Institute, Wayne State University, Detroit, MI, United States
- Donald Bastin
- Department of Medicine, Division of Internal Medicine, The Ottawa Hospital and the University of Ottawa, Ottawa, ON, Canada
- Georden Jones
- Mary A Rackham Institute, University of Michigan, Ann Arbor, MI, United States
- Malke Assad
- Department of Plastic Surgery, University of Pittsburgh Medical Center, Pittsburgh, PA, United States
- Sunita Lakhani
- Department of Medicine, Division of Internal Medicine, Jefferson Abington Hospital, Philadelphia, PA, United States
- Maritsa Odabashian
- Mary A Rackham Institute, University of Michigan, Ann Arbor, MI, United States
- The Ottawa Hospital Research Institute, Ottawa, ON, Canada
- Sharon McGee
- Department of Medicine, Division of Medical Oncology, The Ottawa Hospital and the University of Ottawa, Ottawa, ON, Canada
- Cancer Therapeutics Program, Ottawa Hospital Research Institute, Ottawa, ON, Canada
31
Younis HA, Eisa TAE, Nasser M, Sahib TM, Noor AA, Alyasiri OM, Salisu S, Hayder IM, Younis HA. A Systematic Review and Meta-Analysis of Artificial Intelligence Tools in Medicine and Healthcare: Applications, Considerations, Limitations, Motivation and Challenges. Diagnostics (Basel) 2024; 14:109. [PMID: 38201418 PMCID: PMC10802884 DOI: 10.3390/diagnostics14010109]
Abstract
Artificial intelligence (AI) has emerged as a transformative force in various sectors, including medicine and healthcare. Large language models like ChatGPT showcase AI's potential by generating human-like text from prompts, and ChatGPT's adaptability holds promise for reshaping medical practices, improving patient care, and enhancing interactions among healthcare professionals, patients, and data. In pandemic management, ChatGPT can rapidly disseminate vital information; it can also serve as a virtual assistant in surgical consultations, aid dental practices, simplify medical education, and assist in disease diagnosis. A systematic literature review using the PRISMA approach explored AI's transformative potential in healthcare, highlighting ChatGPT's versatile applications, limitations, motivations, and challenges. A total of 82 papers were categorised into eight major areas: G1, treatment and medicine; G2, buildings and equipment; G3, parts of the human body and areas of disease; G4, patients; G5, citizens; G6, cellular imaging, radiology, pulse and medical images; G7, doctors and nurses; and G8, tools, devices and administration. Balancing AI's role with human judgment remains a challenge. In conclusion, ChatGPT's diverse medical applications demonstrate its potential for innovation and make it a valuable resource for students, academics, and researchers in healthcare; this study likewise serves as a guide for those working in medicine and healthcare.
Affiliation(s)
- Hussain A. Younis
- College of Education for Women, University of Basrah, Basrah 61004, Iraq
- Maged Nasser
- Computer & Information Sciences Department, Universiti Teknologi PETRONAS, Seri Iskandar 32610, Malaysia
- Thaeer Mueen Sahib
- Kufa Technical Institute, Al-Furat Al-Awsat Technical University, Kufa 54001, Iraq
- Ameen A. Noor
- Computer Science Department, College of Education, University of Almustansirya, Baghdad 10045, Iraq
- Sani Salisu
- Department of Information Technology, Federal University Dutse, Dutse 720101, Nigeria
- Israa M. Hayder
- Qurna Technique Institute, Southern Technical University, Basrah 61016, Iraq
- Hameed AbdulKareem Younis
- Department of Cybersecurity, College of Computer Science and Information Technology, University of Basrah, Basrah 61016, Iraq
32
Munir MM, Endo Y, Ejaz A, Dillhoff M, Cloyd JM, Pawlik TM. Online artificial intelligence platforms and their applicability to gastrointestinal surgical operations. J Gastrointest Surg 2024; 28:64-69. [PMID: 38353076 DOI: 10.1016/j.gassur.2023.11.019]
Abstract
BACKGROUND The internet is a common source of health information for patients, and interactive online artificial intelligence (AI) may be a more reliable source of health-related information than traditional search engines. This study aimed to assess the quality and perceived utility of chat-based AI responses related to 3 common gastrointestinal (GI) surgical procedures. METHODS A survey of 24 questions covering general perioperative information on cholecystectomy, pancreaticoduodenectomy (PD), and colectomy was created. Each question was posed to Chat Generative Pre-trained Transformer (ChatGPT) in June 2023, and the generated responses were recorded. The quality and perceived utility of responses were independently and subjectively graded by expert respondents specific to each surgical field. Grades were classified as "poor," "fair," "good," "very good," or "excellent." RESULTS Among the 45 respondents (general surgeons [n = 13], surgical oncologists [n = 18], colorectal surgeons [n = 13], and a transplant surgeon [n = 1]), most practiced at an academic facility (95.6%). Respondents had been in practice for a mean of 12.3 years (general surgeons, 14.5 ± 7.2; surgical oncologists, 12.1 ± 8.2; colorectal surgeons, 10.2 ± 8.0) and performed a mean of 53 index operations annually (cholecystectomy, 47 ± 28; PD, 28 ± 27; colectomy, 81 ± 44). Overall, the most commonly assigned quality grade was "fair" or "good" for most responses (n = 622/1080, 57.6%). Most of the 1080 total utility grades were "fair" (n = 279, 25.8%) or "good" (n = 344, 31.9%), whereas only 129 utility grades (11.9%) were "poor." Of note, ChatGPT responses related to cholecystectomy (45.3% "very good"/"excellent" vs 18.1% "poor"/"fair") were deemed to be of better quality than AI responses about PD (18.9% "very good"/"excellent" vs 46.9% "poor"/"fair") or colectomy (31.4% "very good"/"excellent" vs 38.3% "poor"/"fair"). Overall, only 20.0% of the experts deemed ChatGPT to be an accurate source of information, whereas 15.6% found it unreliable. Moreover, roughly 1 in 3 surgeons deemed ChatGPT responses unlikely to reduce patient-physician correspondence (31.1%) or not comparable to in-person surgeon responses (35.6%). CONCLUSIONS Although a potential resource for patient education, ChatGPT responses to common GI perioperative questions were deemed to be of only modest quality and utility to patients. In addition, the relative quality of AI responses varied markedly by procedure type.
Affiliation(s)
- Muhammad Musaab Munir, Yutaka Endo, Aslam Ejaz, Mary Dillhoff, Jordan M Cloyd, Timothy M Pawlik
- Division of Surgical Oncology, Department of Surgery, The Ohio State University Wexner Medical Center and James Comprehensive Cancer Center, Columbus, Ohio, United States
33
Morales-Ramirez P, Mishek H, Dasgupta A. The Genie Is Out of the Bottle: What ChatGPT Can and Cannot Do for Medical Professionals. Obstet Gynecol 2024; 143:e1-e6. [PMID: 37944140 DOI: 10.1097/aog.0000000000005446]
Abstract
ChatGPT is a cutting-edge artificial intelligence technology that was released for public use in November 2022. Its rapid adoption has raised questions about capabilities, limitations, and risks. This article presents an overview of ChatGPT, and it highlights the current state of this technology for the medical field. The article seeks to provide a balanced perspective on what the model can and cannot do in three specific domains: clinical practice, research, and medical education. It also provides suggestions on how to optimize the use of this tool.
34
Malik S, Zaheer S. ChatGPT as an aid for pathological diagnosis of cancer. Pathol Res Pract 2024; 253:154989. [PMID: 38056135 DOI: 10.1016/j.prp.2023.154989]
Abstract
Diagnostic workup of cancer patients is highly reliant on the science of pathology, using cytopathology, histopathology, and ancillary techniques such as immunohistochemistry and molecular cytogenetics. Data processing and learning by means of artificial intelligence (AI) has become a spearhead for the advancement of medicine, and pathology and laboratory medicine are no exceptions. ChatGPT, an artificial intelligence (AI)-based chatbot recently launched by OpenAI, is currently the talk of the town, and its role in cancer diagnosis is being explored meticulously. Integrating digital slides into the pathology workflow, implementing advanced algorithms, and using computer-aided diagnostic techniques extend the frontiers of the pathologist's view beyond the microscopic slide and enable effective integration, assimilation, and utilization of knowledge beyond human limits and boundaries. Despite its numerous advantages in the pathological diagnosis of cancer, this approach comes with several challenges, including the integration of digital slides with input language parameters, problems of bias, and legal issues, which must be addressed soon so that pathologists diagnosing malignancies can keep pace with this technology rather than be left behind.
Affiliation(s)
- Shaivy Malik, Sufian Zaheer
- Department of Pathology, Vardhman Mahavir Medical College and Safdarjung Hospital, New Delhi, India
35
Adhikari K, Naik N, Hameed BZ, Raghunath SK, Somani BK. Exploring the Ethical, Legal, and Social Implications of ChatGPT in Urology. Curr Urol Rep 2024; 25:1-8. [PMID: 37735339 DOI: 10.1007/s11934-023-01185-2]
Abstract
PURPOSE OF THE REVIEW ChatGPT is programmed to generate responses based on pattern recognition. With its vast popularity and exponential growth, questions arise concerning moral issues, security, and legitimacy. In this review article, we aim to analyze the ethical and legal implications of using ChatGPT in urology and explore potential solutions to these concerns. RECENT FINDINGS There are many potential applications of ChatGPT in urology, and the extent to which it might improve healthcare may cause a profound shift in the way we deliver services to patients and in the overall healthcare system. These applications encompass diagnosis and treatment planning, clinical workflow, patient education, augmented consultations, and urological research. The ethical and legal considerations include patient autonomy and informed consent, privacy and confidentiality, bias and fairness, human oversight and accountability, trust and transparency, liability and malpractice, intellectual property rights, and the regulatory framework. The application of ChatGPT in urology has shown great potential to improve patient care and assist urologists in various aspects of clinical practice, research, and education. Complying with data security and privacy regulations and ensuring human oversight and accountability are potential solutions to these legal and ethical concerns. Overall, the benefits and risks of using ChatGPT in urology must be weighed carefully, and a cautious approach must be taken to ensure that its use aligns with human values and advances patient care ethically and responsibly.
Affiliation(s)
- Kinju Adhikari
- Department of Urology, HCG Cancer Centre, Bengaluru, India
- Nithesh Naik
- Department of Mechanical and Industrial Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka, India
- Bm Zeeshan Hameed
- Department of Urology, Father Muller Medical College, Mangalore, Karnataka, India
- S K Raghunath
- Department of Urology, HCG Cancer Centre, Bengaluru, India
- Bhaskar K Somani
- Department of Urology, University Hospital Southampton NHS Trust, Southampton, SO16 6YD, UK
36
Huang X, Estau D, Liu X, Yu Y, Qin J, Li Z. Evaluating the performance of ChatGPT in clinical pharmacy: A comparative study of ChatGPT and clinical pharmacists. Br J Clin Pharmacol 2024; 90:232-238. [PMID: 37626010 DOI: 10.1111/bcp.15896]
Abstract
AIMS To evaluate the performance of Chat Generative Pre-trained Transformer (ChatGPT) in key domains of clinical pharmacy practice, including prescription review, patient medication education, adverse drug reaction (ADR) recognition, ADR causality assessment, and drug counselling. METHODS Questions and clinical pharmacists' answers were collected from real clinical cases and clinical pharmacist competency assessments. ChatGPT's responses were generated by inputting the same questions into the 'New Chat' box of the ChatGPT Mar 23 Version. Five licensed clinical pharmacists independently rated these answers on a scale of 0 (completely incorrect) to 10 (completely correct). The mean scores of ChatGPT and the clinical pharmacists were compared using a paired 2-tailed Student's t-test, and the text content of the answers was descriptively summarized. RESULTS The quantitative results indicated that ChatGPT was excellent in drug counselling (ChatGPT: 8.77 vs. clinical pharmacist: 9.50, P = .0791) and weak in prescription review (5.23 vs. 9.90, P = .0089), patient medication education (6.20 vs. 9.07, P = .0032), ADR recognition (5.07 vs. 9.70, P = .0483), and ADR causality assessment (4.03 vs. 9.73, P = .023). The capabilities and limitations of ChatGPT in clinical pharmacy practice were summarized based on the completeness and accuracy of the answers: ChatGPT showed robust retrieval, information integration, and dialogue capabilities, but lacked medicine-specific datasets and the ability to handle advanced reasoning and complex instructions. CONCLUSIONS While ChatGPT holds promise in clinical pharmacy practice as a supplementary tool, its ability to handle complex problems needs further improvement and refinement.
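The paired 2-tailed t-test used above can be sketched as follows; the per-question ratings are invented for illustration.

```python
# Minimal sketch of the paired 2-tailed t-test comparing mean expert ratings
# of ChatGPT's answers with those of clinical pharmacists on the same
# questions. The rating values below are hypothetical.
from scipy.stats import ttest_rel

chatgpt_scores = [5.0, 5.5, 4.8, 5.6, 5.2]      # hypothetical ratings per question
pharmacist_scores = [9.8, 9.9, 10.0, 9.9, 9.9]  # hypothetical ratings per question

t_stat, p_value = ttest_rel(chatgpt_scores, pharmacist_scores)
print(f"t = {t_stat:.2f}, two-tailed p = {p_value:.4f}")
```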
Affiliation(s)
- Xiaoru Huang
- Department of Pharmacy, Peking University Third Hospital, Beijing, China
- Department of Pharmaceutical Management and Clinical Pharmacy, College of Pharmacy, Peking University, Beijing, China
- Dannya Estau
- Department of Pharmacy, Peking University Third Hospital, Beijing, China
- Department of Pharmaceutical Management and Clinical Pharmacy, College of Pharmacy, Peking University, Beijing, China
- Xuening Liu
- Department of Pharmacy, Peking University Third Hospital, Beijing, China
- Department of Pharmaceutical Management and Clinical Pharmacy, College of Pharmacy, Peking University, Beijing, China
- Yang Yu
- Department of Pharmacy, Peking University Third Hospital, Beijing, China
- Department of Pharmaceutical Management and Clinical Pharmacy, College of Pharmacy, Peking University, Beijing, China
- Jiguang Qin
- Department of Pharmacy, Peking University Third Hospital, Beijing, China
- Department of Pharmaceutical Management and Clinical Pharmacy, College of Pharmacy, Peking University, Beijing, China
- Zijian Li
- Department of Pharmacy, Peking University Third Hospital, Beijing, China
- Department of Pharmaceutical Management and Clinical Pharmacy, College of Pharmacy, Peking University, Beijing, China
- Department of Cardiology and Institute of Vascular Medicine, Peking University Third Hospital, Beijing Key Laboratory of Cardiovascular Receptors Research, Key Laboratory of Cardiovascular Molecular Biology and Regulatory Peptides, Ministry of Health, State Key Laboratory of Vascular Homeostasis and Remodeling, Peking University, Beijing, China
37
Liao W, Liu Z, Dai H, Xu S, Wu Z, Zhang Y, Huang X, Zhu D, Cai H, Li Q, Liu T, Li X. Differentiating ChatGPT-Generated and Human-Written Medical Texts: Quantitative Study. JMIR Medical Education 2023; 9:e48904. [PMID: 38153785 PMCID: PMC10784984 DOI: 10.2196/48904]
Abstract
BACKGROUND Large language models, such as ChatGPT, are capable of generating grammatically perfect and human-like text content, and a large number of ChatGPT-generated texts have appeared on the internet. However, medical texts, such as clinical notes and diagnoses, require rigorous validation, and erroneous medical content generated by ChatGPT could potentially lead to disinformation that poses significant harm to health care and the general public. OBJECTIVE This study is among the first on responsible artificial intelligence-generated content in medicine. We focus on analyzing the differences between medical texts written by human experts and those generated by ChatGPT, and on designing machine learning workflows to effectively detect and differentiate medical texts generated by ChatGPT. METHODS We first constructed a suite of data sets containing medical texts written by human experts and texts generated by ChatGPT. We analyzed the linguistic features of these 2 types of content and uncovered differences in vocabulary, parts of speech, dependency, sentiment, perplexity, and other aspects. Finally, we designed and implemented machine learning methods to detect medical text generated by ChatGPT. The data and code used in this paper are published on GitHub. RESULTS Medical texts written by humans were more concrete, more diverse, and typically contained more useful information, while medical texts generated by ChatGPT paid more attention to fluency and logic and usually expressed general terminologies rather than effective information specific to the context of the problem. A bidirectional encoder representations from transformers (BERT)-based model effectively detected medical texts generated by ChatGPT, with an F1 score exceeding 95%. CONCLUSIONS Although text generated by ChatGPT is grammatically perfect and human-like, the linguistic characteristics of generated medical texts differed from those written by human experts. Medical text generated by ChatGPT could be effectively detected by the proposed machine learning algorithms. This study provides a pathway toward trustworthy and accountable use of large language models in medicine.
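As a simplified stand-in for the detection workflow described above: the paper used a BERT-based classifier, but the same train-and-predict loop can be illustrated with a TF-IDF plus logistic regression baseline; the texts and labels below are invented.

```python
# Simplified stand-in for the ChatGPT-text detection task. The paper used a
# BERT-based model; this sketch swaps in a TF-IDF + logistic regression
# baseline to illustrate the workflow. Texts and labels are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "Patient presented with acute chest pain radiating to the left arm.",
    "The patient may experience discomfort; it is important to consult a doctor.",
    "CT showed a 3 cm lesion in the right upper lobe; biopsy recommended.",
    "Maintaining a healthy lifestyle is generally beneficial for recovery.",
]
labels = [0, 1, 0, 1]  # 0 = human-written, 1 = ChatGPT-generated (hypothetical)

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(texts, labels)
print(clf.predict(["Follow-up imaging in six months is advised for the nodule."]))
```

A real evaluation would train on a large labeled corpus and report an F1 score on a held-out split, as the study did for its BERT-based model.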
Affiliation(s)
- Wenxiong Liao
- School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
- Zhengliang Liu
- School of Computing, University of Georgia, Athens, GA, United States
- Haixing Dai
- School of Computing, University of Georgia, Athens, GA, United States
- Shaochen Xu
- School of Computing, University of Georgia, Athens, GA, United States
- Zihao Wu
- School of Computing, University of Georgia, Athens, GA, United States
- Yiyang Zhang
- School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
- Xiaoke Huang
- School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
- Dajiang Zhu
- Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, TX, United States
- Hongmin Cai
- School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
- Quanzheng Li
- Department of Radiology, Massachusetts General Hospital, Boston, MA, United States
- Tianming Liu
- School of Computing, University of Georgia, Athens, GA, United States
- Xiang Li
- Department of Radiology, Massachusetts General Hospital, Boston, MA, United States
38
Ferreira RM. New evidence-based practice: Artificial intelligence as a barrier breaker. World J Methodol 2023; 13:384-389. [PMID: 38229944 PMCID: PMC10789101 DOI: 10.5662/wjm.v13.i5.384]
Abstract
The concept of evidence-based practice has persisted over several years and remains a cornerstone of clinical practice, representing the gold standard for optimal patient care. However, despite widespread recognition of its significance, its practical application faces various challenges and barriers, including a lack of skills in interpreting studies, limited resources, time constraints, linguistic competencies, and more. Recently, we have witnessed the emergence of a groundbreaking technological revolution known as artificial intelligence. Although artificial intelligence has become increasingly integrated into our daily lives, some reluctance persists among certain segments of the public. This article explores the potential of artificial intelligence as a solution to some of the main barriers encountered in the application of evidence-based practice. It highlights how artificial intelligence can assist in staying updated with the latest evidence, enhancing clinical decision-making, addressing patient misinformation, and mitigating time constraints in clinical practice. The integration of artificial intelligence into evidence-based practice has the potential to revolutionize healthcare, leading to more precise diagnoses, personalized treatment plans, and improved doctor-patient interactions. This proposed synergy between evidence-based practice and artificial intelligence may necessitate adjustments to the core concept of evidence-based practice, heralding a new era in healthcare.
Affiliation(s)
- Ricardo Maia Ferreira, Department of Sports and Exercise, Polytechnic Institute of Maia (N2i), Maia 4475-690, Porto, Portugal; Department of Physiotherapy, Polytechnic Institute of Coimbra, Coimbra Health School, Coimbra 3046-854, Coimbra, Portugal; Department of Physiotherapy, Polytechnic Institute of Castelo Branco, Dr. Lopes Dias Health School, Castelo Branco 6000-767, Castelo Branco, Portugal; Sport Physical Activity and Health Research & Innovation Center, Polytechnic Institute of Viana do Castelo, Melgaço 4960-320, Viana do Castelo, Portugal
39
Ebrahimian M, Behnam B, Ghayebi N, Sobhrakhshankhah E. ChatGPT in Iranian medical licensing examination: evaluating the diagnostic accuracy and decision-making capabilities of an AI-based model. BMJ Health Care Inform 2023; 30:e100815. [PMID: 38081765] [PMCID: PMC10729145] [DOI: 10.1136/bmjhci-2023-100815]
Abstract
INTRODUCTION Large language models such as ChatGPT have gained popularity for their ability to generate comprehensive responses to human queries. In the field of medicine, ChatGPT has shown promise in applications ranging from diagnostics to decision-making. However, its performance in medical examinations and its comparison to random guessing have not been extensively studied. METHODS This study aimed to evaluate the performance of ChatGPT in the preinternship examination, a comprehensive medical assessment for students in Iran. The examination consisted of 200 multiple-choice questions categorised into basic science evaluation, diagnosis and decision-making. GPT-4 was used, and the questions were translated into English. A statistical analysis was conducted to assess the performance of ChatGPT and to compare it with a random test group. RESULTS ChatGPT performed exceptionally well, answering 68.5% of the questions correctly and significantly surpassing the pass mark of 45%. It exhibited superior performance in decision-making and successfully passed all specialties. ChatGPT's performance was also significantly higher than that of the random test group, demonstrating its ability to provide more accurate responses and reasoning. CONCLUSION This study highlights the potential of ChatGPT in medical licensing examinations and its advantage over random guessing. However, it is important to note that ChatGPT still falls short of human physicians in terms of diagnostic accuracy and decision-making capabilities. Caution should be exercised when using ChatGPT, and its results should be verified by human experts to ensure patient safety and avoid potential errors in the medical field.
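To make the "better than random guessing" comparison concrete, the sketch below runs a one-sided binomial test of 137/200 correct (68.5%) against chance. The 0.25 chance level assumes four answer options per question, which the abstract does not state, so treat this as an illustrative assumption rather than the paper's actual analysis.

```python
# Illustrative binomial test of ChatGPT's score against random guessing.
# Chance level 0.25 (four options per question) is an assumption.
from scipy.stats import binomtest

n_questions = 200
n_correct = round(0.685 * n_questions)   # 137, per the reported 68.5%
result = binomtest(n_correct, n_questions, p=0.25, alternative="greater")
print(f"correct: {n_correct}/{n_questions}, p-value vs. chance: {result.pvalue:.3g}")
```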
Affiliation(s)
- Manoochehr Ebrahimian, Pediatric Surgery Research Center, Research Institute for Children's Health, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Behdad Behnam, Gastrointestinal and Liver Disease Research Center, Iran University of Medical Sciences, Tehran, Iran
- Negin Ghayebi, School of Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Elham Sobhrakhshankhah, Gastrointestinal and Liver Disease Research Center, Iran University of Medical Sciences, Tehran, Iran
40
Al-Dujaili Z, Omari S, Pillai J, Al Faraj A. Assessing the accuracy and consistency of ChatGPT in clinical pharmacy management: A preliminary analysis with clinical pharmacy experts worldwide. Res Social Adm Pharm 2023; 19:1590-1594. [PMID: 37696742] [DOI: 10.1016/j.sapharm.2023.08.012]
Abstract
BACKGROUND The ChatGPT conversation system has ushered in a revolutionary new era of information retrieval and stands as one of the fastest-growing platforms. Clinical pharmacy, as a dynamic discipline, necessitates an advanced comprehension of drugs and diseases. The process of decision-making in clinical pharmacy demands accuracy and consistency in medical information, as it directly affects patient safety. OBJECTIVE The objective was to evaluate ChatGPT's accuracy and consistency in managing pharmacotherapy cases across multiple time points. Additionally, input was gathered from global clinical pharmacy experts, and the agreement between ChatGPT's responses and those of clinical pharmacy experts worldwide was assessed. METHODS A set of 20 pharmacotherapy cases was entered into ChatGPT at three different time points. Inter-rater reliability analysis was performed to measure the accuracy of the output generated by ChatGPT at each time point, and test-retest reliability was performed to measure the consistency of the output across the three time points. Pharmacy expert performance was evaluated, and the overall results were compared. RESULTS ChatGPT achieved a hit rate of 70.83% at week 1, 79.2% at week 3, and 75% at week 5. The percent agreement between weeks 1 and 3 was 79.2%, whereas it was 87.5% between weeks 3 and 5, and 83.3% between weeks 1 and 5. In contrast, accuracy rates among clinical pharmacy experts showed considerable variation according to their geographic location. The highest agreement between clinical pharmacist responses and ChatGPT responses was observed at the last time point examined. CONCLUSIONS Overall, the analysis suggested that ChatGPT is capable of generating clinically relevant pharmaceutical information, albeit with some variation in accuracy and consistency. It should be noted that clinical pharmacy experts worldwide may provide varying degrees of accuracy depending on their expertise. This study highlights the potential of AI chatbots in clinical pharmacy.
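The consistency analysis described here can be reproduced in outline with standard agreement metrics. The sketch below computes percent agreement and Cohen's kappa between two time points; the 0/1 vectors are hypothetical stand-ins for the 20 graded cases, not the study's data.

```python
# Sketch of a test-retest consistency check: percent agreement and Cohen's
# kappa between ChatGPT's graded answers at two time points.
# The 0/1 vectors are hypothetical (1 = answer judged correct).
from sklearn.metrics import cohen_kappa_score

week1 = [1,1,0,1,1,0,1,1,1,0,1,1,0,1,1,1,0,1,1,1]
week3 = [1,1,0,1,1,1,1,1,1,0,1,1,0,1,1,1,1,1,1,1]

agreement = sum(a == b for a, b in zip(week1, week3)) / len(week1)
kappa = cohen_kappa_score(week1, week3)
print(f"percent agreement: {agreement:.1%}, Cohen's kappa: {kappa:.2f}")
```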
Affiliation(s)
- Zahraa Al-Dujaili, College of Pharmacy, American University of Iraq - Baghdad (AUIB), Baghdad, Iraq
- Sarah Omari, Department of Epidemiology and Population Health, American University of Beirut (AUB), Beirut, Lebanon
- Jey Pillai, College of Pharmacy, American University of Iraq - Baghdad (AUIB), Baghdad, Iraq
- Achraf Al Faraj, College of Pharmacy, American University of Iraq - Baghdad (AUIB), Baghdad, Iraq
41
Franco D'Souza R, Amanullah S, Mathew M, Surapaneni KM. Appraising the performance of ChatGPT in psychiatry using 100 clinical case vignettes. Asian J Psychiatr 2023; 89:103770. [PMID: 37812998] [DOI: 10.1016/j.ajp.2023.103770]
Abstract
BACKGROUND ChatGPT has emerged as the most advanced and rapidly developing large language chatbot system. With its immense potential, ranging from answering a simple query to cracking highly competitive medical exams, ChatGPT continues to impress scientists and researchers worldwide, giving room for more discussions regarding its utility in various fields. One such field of attention is Psychiatry. With suboptimal diagnosis and treatment, assuring mental health and well-being is a challenge in many countries, particularly developing nations. In this regard, we conducted an evaluation to assess the performance of ChatGPT 3.5 in Psychiatry using clinical cases, to provide evidence-based information regarding its implications for enhancing mental health and well-being. METHODS ChatGPT 3.5 was used in this experimental study to initiate conversations and collect responses to clinical vignettes in Psychiatry. Using 100 clinical case vignettes, the replies were assessed by expert faculty from the Department of Psychiatry. There were 100 different psychiatric illnesses represented in the cases. We recorded and assessed the initial ChatGPT 3.5 responses. The evaluation was based on the objectives of the questions posed at the conclusion of each case, which were divided into 10 categories. Grading was completed by taking the mean value of the scores provided by the evaluators, and the grades were represented in graphs and tables. RESULTS The evaluation suggests that ChatGPT 3.5 fared extremely well in Psychiatry, receiving "Grade A" ratings in 61 out of 100 cases, "Grade B" ratings in 31, and "Grade C" ratings in 8. The majority of the queries concerned management strategies, followed by diagnosis, differential diagnosis, assessment, investigation, counselling, clinical reasoning, ethical reasoning, prognosis, and request acceptance. ChatGPT 3.5 performed extremely well, especially in generating management strategies and diagnoses for different psychiatric conditions. No responses were graded "D", indicating that there were no errors in diagnosis or in the responses for clinical care; only a few discrepancies and missing details appeared in the handful of responses that received a "Grade C". CONCLUSION It is evident from our study that ChatGPT 3.5 has appreciable knowledge and interpretation skills in Psychiatry. Thus, ChatGPT 3.5 undoubtedly has the potential to transform the field of Medicine, and we emphasize its utility in Psychiatry through the findings of our study. However, for any AI model to be successful, reliability, validation of information, proper guidelines and an implementation framework are necessary.
Affiliation(s)
- Russell Franco D'Souza, Professor of Organizational Psychological Medicine, International Institute of Organisational Psychological Medicine, 71 Cleeland Street, Dandenong, Melbourne, Victoria 3175, Australia
- Shabbir Amanullah, Division of Geriatric Psychiatry, Queen's University, 752 King Street West, Postal Bag 603, Kingston, ON K7L 7X3, Canada
- Mary Mathew, Department of Pathology, Kasturba Medical College, Manipal Academy of Higher Education, Tiger Circle Road, Madhav Nagar, Manipal, Karnataka 576104, India
- Krishna Mohan Surapaneni, Department of Biochemistry, Panimalar Medical College Hospital & Research Institute, Varadharajapuram, Poonamallee, Chennai 600123, Tamil Nadu, India; Departments of Medical Education, Molecular Virology, Research, Clinical Skills & Simulation, Panimalar Medical College Hospital & Research Institute, Varadharajapuram, Poonamallee, Chennai 600123, Tamil Nadu, India
42
Aliyeva A. "Bot or Not": Turing Problem in Otolaryngology. Cureus 2023; 15:e48170. [PMID: 38046723] [PMCID: PMC10693309] [DOI: 10.7759/cureus.48170]
Abstract
The aim of this article is to shed light on the evolving landscape of artificial intelligence (AI) integration in otolaryngology and its implications, with a particular focus on the ethical considerations surrounding AI applications. It also highlights the potential benefits of ChatGPT in patient management and scientific research within otolaryngology while emphasizing the necessity for ethical guidelines and validation processes. Ultimately, the article seeks to encourage a responsible and informed approach to AI adoption in otolaryngology, promoting collaboration between AI and healthcare professionals for the betterment of science and human well-being.
Affiliation(s)
- Aynur Aliyeva, Otolaryngology - Head and Neck Surgery, Cincinnati Children's Hospital Medical Center, Ohio, USA
43
Santana LADM, Gonçalo RIC, Barbosa BF, Takeshita WM, Trento CL. Authors' reply: Combining ChatGPT and machine learning: A viable alternative in oral medicine. Oral Dis 2023. [PMID: 37848339] [DOI: 10.1111/odi.14741]
Affiliation(s)
- Lucas Alves da Mota Santana, Department of Dentistry, Federal University of Sergipe (UFS), Aracaju, SE, Brazil; Department of Dentistry, Tiradentes University (UNIT), Aracaju, SE, Brazil
- Rani Iani Costa Gonçalo, Department of Dentistry, Federal University of Rio Grande do Norte (UFRN), Natal, RN, Brazil
- Wilton Mitsunari Takeshita, Department of Diagnosis and Surgery, School of Dentistry, São Paulo State University (UNESP), Araçatuba, SP, Brazil
44
Reddy A, Patel S, Barik AK, Gowda P. Role of chat-generative pre-trained transformer (ChatGPT) in anaesthesia: Merits and pitfalls. Indian J Anaesth 2023; 67:942-944. [PMID: 38044929] [PMCID: PMC10691596] [DOI: 10.4103/ija.ija_504_23]
Affiliation(s)
- Ashwini Reddy, Department of Anaesthesia and Intensive Care, Postgraduate Institute of Medical Education and Research, Chandigarh, India
- Swati Patel, Department of Anaesthesia and Intensive Care, Postgraduate Institute of Medical Education and Research, Chandigarh, India
- Amiya Kumar Barik, Department of Anaesthesia and Intensive Care, Postgraduate Institute of Medical Education and Research, Chandigarh, India
- Punith Gowda, Department of Anaesthesia and Intensive Care, Postgraduate Institute of Medical Education and Research, Chandigarh, India
45
Cocci A, Pezzoli M, Minervini A. Light and Shadow of ChatGPT: A Real Tool for Advancing Scientific Research and Medical Practice? World J Mens Health 2023; 41:751-752. [PMID: 37652659] [PMCID: PMC10523110] [DOI: 10.5534/wjmh.230102]
Affiliation(s)
- Andrea Cocci, Department of Urology, Azienda Ospedaliera Universitaria Careggi, University of Florence, Florence, Italy
- Marta Pezzoli, Department of Urology, Azienda Ospedaliera Universitaria Careggi, University of Florence, Florence, Italy
- Andrea Minervini, Department of Urology, Azienda Ospedaliera Universitaria Careggi, University of Florence, Florence, Italy
46
Gobira M, Nakayama LF, Moreira R, Andrade E, Regatieri CVS, Belfort R. Performance of ChatGPT-4 in answering questions from the Brazilian National Examination for Medical Degree Revalidation. REVISTA DA ASSOCIACAO MEDICA BRASILEIRA (1992) 2023; 69:e20230848. [PMID: 37792871] [PMCID: PMC10547492] [DOI: 10.1590/1806-9282.20230848]
Abstract
OBJECTIVE The aim of this study was to evaluate the performance of ChatGPT-4.0 in answering the 2022 Brazilian National Examination for Medical Degree Revalidation (Revalida) and its usefulness as a tool to provide feedback on the quality of the examination. METHODS Two independent physicians entered all examination questions into ChatGPT-4.0. After comparing the outputs with the test solutions, they classified the large language model's answers as adequate, inadequate, or indeterminate. In cases of disagreement, they adjudicated and reached a consensus decision on ChatGPT's accuracy. Performance across medical themes and nullified questions was compared using chi-square analysis. RESULTS In the Revalida examination, ChatGPT-4.0 answered 71 (87.7%) questions correctly and 10 (12.3%) incorrectly. There was no statistically significant difference in the proportions of correct answers among different medical themes (p=0.4886). The model had a lower accuracy of 71.4% on nullified questions, with no statistically significant difference (p=0.241) between non-nullified and nullified groups. CONCLUSION ChatGPT-4.0 showed satisfactory performance on the 2022 Brazilian National Examination for Medical Degree Revalidation. The large language model exhibited worse performance on subjective questions and public healthcare themes. The results suggest that the overall quality of the Revalida examination questions is satisfactory and corroborate the decision to nullify the annulled questions.
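The nullified versus non-nullified comparison reduces to a chi-square test on a 2x2 contingency table. In the sketch below, the split of 7 nullified questions with 5 correct is an assumption chosen to match the reported 71.4% nullified-question accuracy and the overall totals (71 correct of 81); the paper's exact counts and test options are unknown, so the resulting p-value is not expected to reproduce p=0.241.

```python
# Illustrative chi-square test comparing accuracy on nullified vs.
# non-nullified questions. Counts are assumptions consistent with the
# abstract's totals (71 correct, 10 incorrect, 81 questions).
from scipy.stats import chi2_contingency

#        correct  incorrect
table = [[66,      8],    # non-nullified (assumed: 74 questions)
         [ 5,      2]]    # nullified     (assumed:  7 questions, 5/7 = 71.4%)

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.3f}, p = {p:.3f}")
```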
Affiliation(s)
- Mauro Gobira, Instituto Paulista de Estudos e Pesquisas em Oftalmologia, Vision Institute, São Paulo (SP), Brazil
- Luis Filipe Nakayama, Instituto Paulista de Estudos e Pesquisas em Oftalmologia, Vision Institute, São Paulo (SP), Brazil; Massachusetts Institute of Technology, Institute for Medical Engineering and Science, Cambridge (MA), USA
- Rodrigo Moreira, Instituto Paulista de Estudos e Pesquisas em Oftalmologia, Vision Institute, São Paulo (SP), Brazil
- Eric Andrade, Universidade Federal de São Paulo, Department of Ophthalmology, São Paulo (SP), Brazil
- Rubens Belfort, Universidade Federal de São Paulo, Department of Ophthalmology, São Paulo (SP), Brazil
47
Lai UH, Wu KS, Hsu TY, Kan JKC. Evaluating the performance of ChatGPT-4 on the United Kingdom Medical Licensing Assessment. Front Med (Lausanne) 2023; 10:1240915. [PMID: 37795422] [PMCID: PMC10547055] [DOI: 10.3389/fmed.2023.1240915]
Abstract
Introduction Recent developments in artificial intelligence large language models (LLMs), such as ChatGPT, have allowed for the understanding and generation of human-like text. Studies have found that LLMs perform well in various examinations, including law, business, and medicine. This study aims to evaluate the performance of ChatGPT on the United Kingdom Medical Licensing Assessment (UKMLA). Methods Two publicly available UKMLA papers consisting of 200 single-best-answer (SBA) questions were screened. Nine SBAs were omitted as they contained images that were not suitable for input. Each question was assigned a specialty based on the UKMLA content map published by the General Medical Council. A total of 191 SBAs were input into ChatGPT-4 across three attempts over the course of 3 weeks (once per week). Results ChatGPT scored 74.9% (143/191), 78.0% (149/191) and 75.6% (145/191) on the three attempts, respectively. The average across all three attempts was 76.3% (437/573), with a 95% confidence interval of (74.46%, 78.08%). ChatGPT answered 129 SBAs correctly and 32 SBAs incorrectly on all three attempts. Across the three attempts, ChatGPT performed well in mental health (8/9 SBAs), cancer (11/14 SBAs) and cardiovascular (10/13 SBAs), and did not perform well in clinical haematology (3/7 SBAs), endocrine and metabolic (2/5 SBAs) and gastrointestinal including liver (3/10 SBAs). Regarding response consistency, ChatGPT provided consistently correct answers in 67.5% (129/191) of SBAs, consistently incorrect answers in 12.6% (24/191), and inconsistent responses in 19.9% (38/191). Discussion and conclusion This study suggests ChatGPT performs well on the UKMLA. Performance may correlate with specialty. The ability of LLMs to answer SBAs correctly suggests that they could be utilised as a supplementary learning tool in medical education with appropriate medical educator supervision.
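The response-consistency breakdown reported above (consistently correct, consistently incorrect, inconsistent) is a simple per-question tally across attempts. The sketch below shows the computation on hypothetical 0/1 results; the real study would use one column for each of the 191 SBAs.

```python
# Classify each SBA as consistently correct, consistently incorrect, or
# inconsistent across three attempts. Toy data: one row per attempt,
# one column per SBA (1 = correct).
attempts = [
    [1, 1, 0, 1, 0, 1],
    [1, 0, 0, 1, 1, 1],
    [1, 1, 0, 1, 0, 1],
]

consistent_correct = consistent_incorrect = inconsistent = 0
for answers in zip(*attempts):           # iterate per question
    if all(answers):
        consistent_correct += 1
    elif not any(answers):
        consistent_incorrect += 1
    else:
        inconsistent += 1

total = len(attempts[0])
print(f"consistently correct:   {consistent_correct}/{total}")
print(f"consistently incorrect: {consistent_incorrect}/{total}")
print(f"inconsistent:           {inconsistent}/{total}")
```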
Affiliation(s)
- U Hin Lai, Sandwell and West Birmingham NHS Trust, West Bromwich, United Kingdom; Aston Medical School, Birmingham, United Kingdom
- Keng Sam Wu, Sandwell and West Birmingham NHS Trust, West Bromwich, United Kingdom; University Hospitals Birmingham NHS Trust, Birmingham, United Kingdom
- Ting-Yu Hsu, Aston Medical School, Birmingham, United Kingdom; University Hospitals Birmingham NHS Trust, Birmingham, United Kingdom
- Jessie Kai Ching Kan, Aston Medical School, Birmingham, United Kingdom; Worcestershire Acute Hospitals NHS Trust, Worcester, United Kingdom
48
Garg RK, Urs VL, Agarwal AA, Chaudhary SK, Paliwal V, Kar SK. Exploring the role of ChatGPT in patient care (diagnosis and treatment) and medical research: A systematic review. Health Promot Perspect 2023; 13:183-191. [PMID: 37808939] [PMCID: PMC10558973] [DOI: 10.34172/hpp.2023.22]
Abstract
Background ChatGPT is an artificial intelligence-based tool developed by OpenAI (California, USA). This systematic review examines the potential of ChatGPT in patient care and its role in medical research. Methods The systematic review was done according to the PRISMA guidelines. The Embase, Scopus, PubMed and Google Scholar databases were searched, along with preprint databases. Our search aimed to identify all kinds of publications, without any restrictions, on ChatGPT and its application in medical research, medical publishing and patient care. We used the search term "ChatGPT" and reviewed all kinds of publications, including original articles, reviews, editorials/commentaries, and even letters to the editor. Each selected record was analysed using ChatGPT, and the generated responses were compiled in a table. The Word table was converted into a PDF and further analysed using ChatPDF. Results We reviewed the full texts of 118 articles. ChatGPT can assist with patient enquiries, note writing, decision-making, trial enrolment, data management, decision support, research support, and patient education. But the solutions it offers are usually insufficient and contradictory, raising questions about their originality, privacy, correctness, bias, and legality. Due to its lack of human-like qualities, ChatGPT's legitimacy as an author is questioned when used for academic writing. ChatGPT-generated content raises concerns about bias and possible plagiarism. Conclusion Although it can help with patient treatment and research, there are issues with accuracy, authorship, and bias. ChatGPT can serve as a "clinical assistant" and be a help in research and scholarly writing.
Affiliation(s)
- Vijeth L Urs, Department of Neurology, King George’s Medical University, Lucknow, India
- Vimal Paliwal, Department of Neurology, Sanjay Gandhi Institute of Medical Sciences, Lucknow, India
- Sujita Kumar Kar, Department of Psychiatry, King George’s Medical University, Lucknow, India
49
Watters C, Lemanski MK. Universal skepticism of ChatGPT: a review of early literature on chat generative pre-trained transformer. Front Big Data 2023; 6:1224976. [PMID: 37680954] [PMCID: PMC10482048] [DOI: 10.3389/fdata.2023.1224976]
Abstract
ChatGPT, a new language model developed by OpenAI, has garnered significant attention in various fields since its release. This literature review provides an overview of early ChatGPT literature across multiple disciplines, exploring its applications, limitations, and ethical considerations. The review encompasses Scopus-indexed publications from November 2022 to April 2023 and includes 156 articles related to ChatGPT. The findings reveal a predominance of negative sentiment across disciplines, though subject-specific attitudes must be considered. The review highlights the implications of ChatGPT in many fields including healthcare, raising concerns about employment opportunities and ethical considerations. While ChatGPT holds promise for improved communication, further research is needed to address its capabilities and limitations. This literature review provides insights into early research on ChatGPT, informing future investigations and practical applications of chatbot technology, as well as development and usage of generative AI.
Affiliation(s)
- Casey Watters, Faculty of Law, Bond University, Gold Coast, QLD, Australia
50
Alanzi TM. Impact of ChatGPT on Teleconsultants in Healthcare: Perceptions of Healthcare Experts in Saudi Arabia. J Multidiscip Healthc 2023; 16:2309-2321. [PMID: 37601325] [PMCID: PMC10438433] [DOI: 10.2147/jmdh.s419847]
Abstract
Purpose This study aims to investigate the impact of ChatGPT on teleconsultants in managing their operations and services. Methods A qualitative approach with focus groups was adopted. A total of 54 participants with varying degrees of experience using AI tools such as ChatGPT in healthcare took part, including 11 physicians, 24 nurses, eight dieticians, six pharmacists, and five physiotherapists providing teleconsultations. Results Twelve themes reflecting positive impact were identified from the data analysis of seven focus groups: informational support, diagnostic assistance, communication, enhancing efficiency, cost and time saving, personalizing care, multilingual support, assisting in medical research, decision-making, documentation, continuing education, and enhanced team collaboration. In addition, six themes reflecting negative impact were identified: misdiagnosis and errors, issues in personalized care, ethical and legal issues, limited medical context/knowledge, communication challenges, and increased dependency. Conclusion Although ChatGPT has several advantages for teleconsultants in the healthcare sector, it is associated with ethical issues.
Affiliation(s)
- Turki M Alanzi, Health Information Management and Technology Department, College of Public Health, Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia