1. Quinn M, Milner JD, Schmitt P, Morrissey P, Lemme N, Marcaccio S, DeFroda S, Tabaddor R, Owens BD. Artificial Intelligence Large Language Models Address Anterior Cruciate Ligament Reconstruction: Superior Clarity and Completeness by Gemini Compared to ChatGPT-4 in Response to American Academy of Orthopedic Surgeons Clinical Practice Guidelines. Arthroscopy 2024:S0749-8063(24)00736-9. PMID: 39313138. DOI: 10.1016/j.arthro.2024.09.020.
Abstract
PURPOSE To assess the ability of ChatGPT-4 and Gemini to generate accurate and relevant responses to the 2022 American Academy of Orthopaedic Surgeons (AAOS) clinical practice guidelines for anterior cruciate ligament reconstruction (ACLR). METHODS Responses from ChatGPT-4 and Gemini to prompts derived from all 15 AAOS guidelines were evaluated by seven fellowship-trained orthopedic sports medicine surgeons using a structured questionnaire assessing five key characteristics on a scale of 1 to 5. The prompts were categorized into three areas: Diagnosis and Preoperative Management, Surgical Timing and Technique, and Rehabilitation and Prevention. Statistical analysis included mean scoring, standard deviation, and two-sided t-tests to compare the performance of the two LLMs. Scores were then evaluated for inter-rater reliability (IRR). RESULTS Overall, both LLMs performed well, with mean scores > 4 for the five key characteristics. Gemini demonstrated superior performance in overall clarity (4.848 ± 0.36 vs 4.743 ± 0.481, p = 0.034), but all other characteristics showed nonsignificant differences (p > 0.05). Gemini also demonstrated superior clarity in the Surgical Timing and Technique (p = 0.038) and Rehabilitation and Prevention (p = 0.044) subcategories. Additionally, Gemini had superior completeness scores in the Rehabilitation and Prevention subcategory (p = 0.044), but no statistically significant differences were found among the other subcategories. The overall IRR was 0.71 (moderate). CONCLUSION Both Gemini and ChatGPT-4 demonstrate an overall good ability to generate accurate and relevant responses to question prompts based on the 2022 AAOS clinical practice guidelines for ACLR. However, Gemini demonstrated superior clarity in multiple domains, in addition to superior completeness for questions pertaining to rehabilitation and prevention. CLINICAL RELEVANCE The current study addresses a gap in the LLM and ACLR literature by comparing the performance of ChatGPT-4 with Gemini, which is growing in popularity, with more than 300 million individual uses in May 2024 alone. Moreover, the results demonstrated superior performance of Gemini in both clarity and completeness, which are critical elements of a tool used by patients for educational purposes. Additionally, the current study uses question prompts based on the AAOS clinical practice guidelines, which may serve as a method of standardization for future investigations of LLM platform performance. For these reasons, the authors believe the results of the current study would be of interest to both the readership of Arthroscopy and patients alike.
2. Fröling E, Rajaeean N, Hinrichsmeyer KS, Domrös-Zoungrana D, Urban JN, Lenz C. Artificial Intelligence in Medical Affairs: A New Paradigm with Novel Opportunities. Pharmaceut Med 2024. PMID: 39259426. DOI: 10.1007/s40290-024-00536-9.
Abstract
The advent of artificial intelligence (AI) is revolutionizing ways of working in many areas of business and life science. In Medical Affairs (MA) departments of the pharmaceutical industry, AI holds great potential for positively influencing the medical mission of identifying and addressing unmet medical needs and care gaps, and for fostering solutions that improve equitable, unbiased patient access to treatments worldwide. Given the essential position of MA in corporate interactions with various healthcare stakeholders, AI offers broad possibilities to support strategic decision-making and to pioneer novel approaches in medical stakeholder interactions. By analyzing data derived from the healthcare environment and by streamlining operations in medical content generation, AI advances data-based prioritization and strategy execution. In this review, we discuss promising AI-based solutions in MA that support the effective use of heterogeneous information from observations of the healthcare environment, the enhancement of medical education, and the analysis of real-world data. For a successful implementation of such solutions, considerations partly unique to healthcare must be addressed, for example, transparency, data privacy, healthcare regulations, and, in predictive applications, explainability.
Affiliation(s)
- Emma Fröling
- Pfizer Pharma GmbH, Friedrichstraße 110, 10117, Berlin, Germany.
- Neda Rajaeean
- Pfizer Pharma GmbH, Friedrichstraße 110, 10117, Berlin, Germany
- Christian Lenz
- Pfizer Pharma GmbH, Friedrichstraße 110, 10117, Berlin, Germany
3. Li Y, Zhao J, Li M, Dang Y, Yu E, Li J, Sun Z, Hussein U, Wen J, Abdelhameed AM, Mai J, Li S, Yu Y, Hu X, Yang D, Feng J, Li Z, He J, Tao W, Duan T, Lou Y, Li F, Tao C. RefAI: a GPT-powered retrieval-augmented generative tool for biomedical literature recommendation and summarization. J Am Med Inform Assoc 2024;31:2030-2039. PMID: 38857454. PMCID: PMC11339508. DOI: 10.1093/jamia/ocae129.
Abstract
OBJECTIVES Precise literature recommendation and summarization are crucial for biomedical professionals. While the latest iteration of the generative pretrained transformer (GPT) incorporates 2 distinct modes (real-time search and pretrained model utilization), it encounters challenges in these tasks. Specifically, the real-time search can pinpoint some relevant articles but occasionally provides fabricated papers, whereas the pretrained model excels at generating well-structured summaries but struggles to cite specific sources. In response, this study introduces RefAI, an innovative retrieval-augmented generative tool designed to synergize the strengths of large language models (LLMs) while overcoming their limitations. MATERIALS AND METHODS RefAI utilized PubMed for systematic literature retrieval, employed a novel multivariable algorithm for article recommendation, and leveraged GPT-4 Turbo for summarization. Ten queries under 2 prevalent topics ("cancer immunotherapy and target therapy" and "LLMs in medicine") were chosen as use cases, with 3 established counterparts (ChatGPT-4, ScholarAI, and Gemini) as baselines. The evaluation was conducted by 10 domain experts through standard statistical analyses for performance comparison. RESULTS The overall performance of RefAI surpassed that of the baselines across 5 evaluated dimensions (relevance and quality for literature recommendation; accuracy, comprehensiveness, and reference integration for summarization), with the majority exhibiting statistically significant improvements (P-values < .05). DISCUSSION RefAI demonstrated substantial improvements in literature recommendation and summarization over existing tools, addressing issues like fabricated papers, metadata inaccuracies, restricted recommendations, and poor reference integration. CONCLUSION By augmenting an LLM with external resources and a novel ranking algorithm, RefAI is uniquely capable of recommending high-quality literature and generating well-structured summaries, holding the potential to meet the critical needs of biomedical professionals in navigating and synthesizing vast amounts of scientific literature.
Affiliation(s)
- Yiming Li
- McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, United States
- Jeff Zhao
- Department of Computer Science, College of Natural Sciences, University of Texas at Austin, Austin, TX 78712, United States
- Manqi Li
- McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, United States
- Department of Biostatistics & Data Science, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, United States
- Yifang Dang
- McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, United States
- Evan Yu
- McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, United States
- Jianfu Li
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Jacksonville, FL 32224, United States
- Zenan Sun
- McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, United States
- Usama Hussein
- Department of Lymphoma and Myeloma, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, United States
- Jianguo Wen
- McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, United States
- Ahmed M Abdelhameed
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Jacksonville, FL 32224, United States
- Junhua Mai
- Department of Nanomedicine, Houston Methodist Academic Institute, Houston, TX 77030, United States
- Shenduo Li
- Division of Hematology and Oncology, Department of Medicine, Mayo Clinic, Jacksonville, FL 32224, United States
- Yue Yu
- Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN 55905, United States
- Xinyue Hu
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Jacksonville, FL 32224, United States
- Daowei Yang
- Department of Translational Molecular Pathology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, United States
- Jingna Feng
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Jacksonville, FL 32224, United States
- Zehan Li
- McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, United States
- Jianping He
- McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, United States
- Wei Tao
- Department of Biostatistics & Data Science, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, United States
- Tiehang Duan
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Jacksonville, FL 32224, United States
- Yanyan Lou
- Division of Hematology and Oncology, Department of Medicine, Mayo Clinic, Jacksonville, FL 32224, United States
- Fang Li
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Jacksonville, FL 32224, United States
- Cui Tao
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Jacksonville, FL 32224, United States
4. Omar M, Brin D, Glicksberg B, Klang E. Utilizing natural language processing and large language models in the diagnosis and prediction of infectious diseases: A systematic review. Am J Infect Control 2024;52:992-1001. PMID: 38588980. DOI: 10.1016/j.ajic.2024.03.016.
Abstract
BACKGROUND Natural language processing (NLP) and large language models (LLMs) hold largely untapped potential in infectious disease management. This review explores their current use and uncovers areas needing more attention. METHODS This analysis followed systematic review procedures and was registered with the Prospective Register of Systematic Reviews (PROSPERO). We conducted a search across major databases, including PubMed, Embase, Web of Science, and Scopus, up to December 2023, using keywords related to NLP, LLMs, and infectious diseases. We also employed the Quality Assessment of Diagnostic Accuracy Studies-2 tool to evaluate the quality and robustness of the included studies. RESULTS Our review identified 15 studies with diverse applications of NLP in infectious disease management. Notable examples include GPT-4's application in detecting urinary tract infections and BERTweet's use in Lyme disease surveillance through social media analysis. These models demonstrated effective disease monitoring and public health tracking capabilities. However, effectiveness varied across studies. For instance, while some NLP tools showed high accuracy in pneumonia detection and high sensitivity in identifying invasive mold diseases from medical reports, others fell short in areas like bloodstream infection management. CONCLUSIONS This review highlights the yet-to-be-fully-realized promise of NLP and LLMs in infectious disease management. It calls for more exploration to fully harness AI's capabilities, particularly in diagnosis, surveillance, predicting disease courses, and tracking epidemiological trends.
Affiliation(s)
- Mahmud Omar
- Tel Aviv University, Faculty of Medicine, Tel Aviv, Israel
- Dana Brin
- Division of Diagnostic Imaging, Sheba Medical Center, Affiliated to Tel-Aviv University, Ramat Gan, Israel
- Benjamin Glicksberg
- Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, NY; The Division of Data-Driven and Digital Medicine (D3M), Icahn School of Medicine at Mount Sinai, New York, NY
- Eyal Klang
- The Division of Data-Driven and Digital Medicine (D3M), Icahn School of Medicine at Mount Sinai, New York, NY
5. Kikuchi T, Nakao T, Nakamura Y, Hanaoka S, Mori H, Yoshikawa T. Toward Improved Radiologic Diagnostics: Investigating the Utility and Limitations of GPT-3.5 Turbo and GPT-4 with Quiz Cases. AJNR Am J Neuroradiol 2024. PMID: 38719605. DOI: 10.3174/ajnr.a8332.
Abstract
BACKGROUND AND PURPOSE The rise of large language models such as generative pretrained transformers (GPTs) has sparked considerable interest in radiology, especially in interpreting radiologic reports and image findings. While existing research has focused on GPTs estimating diagnoses from radiologic descriptions, exploring alternative sources of diagnostic information is also crucial. This study introduces the use of GPTs (GPT-3.5 Turbo and GPT-4) for information retrieval and summarization, searching relevant case reports via PubMed, and investigates their potential to aid diagnosis. MATERIALS AND METHODS From October 2021 to December 2023, we selected 115 cases from the "Case of the Week" series on the American Journal of Neuroradiology website. Their Description and Legend sections were presented to the GPTs for 2 tasks. For the Direct Diagnosis task, the models provided 3 differential diagnoses, which were considered correct if they matched the diagnosis in the Diagnosis section. For the Case Report Search task, the models generated 2 keywords per case, creating PubMed search queries to extract up to 3 relevant reports. A response was considered correct if reports containing the disease name stated in the Diagnosis section were extracted. The McNemar test was used to evaluate whether adding Case Report Search to Direct Diagnosis improved overall accuracy. RESULTS In the Direct Diagnosis task, GPT-3.5 Turbo achieved a correct response rate of 26% (30/115 cases), whereas GPT-4 achieved 41% (47/115). For the Case Report Search task, GPT-3.5 Turbo scored 10% (11/115) and GPT-4 scored 7% (8/115). Correct responses totaled 32% (37/115) with 3 overlapping cases for GPT-3.5 Turbo, whereas GPT-4 had 43% (50/115) correct responses with 5 overlapping cases. Adding Case Report Search improved GPT-3.5 Turbo's performance (P = .023) but not that of GPT-4 (P = .248). CONCLUSIONS The benefit of adding Case Report Search was particularly pronounced for GPT-3.5 Turbo, suggesting its potential as an alternative diagnostic approach, particularly in scenarios where direct diagnoses from GPTs are not obtainable. Nevertheless, the overall performance of GPT models in both direct diagnosis and case report retrieval remains suboptimal, and users should be aware of their limitations.
Affiliation(s)
- Tomohiro Kikuchi
- Department of Computational Diagnostic Radiology and Preventive Medicine (T.K., T.N., Y.N., T.Y.), The University of Tokyo Hospital, Tokyo, Japan
- Department of Radiology (T.K., H.M.), School of Medicine, Jichi Medical University, Shimotsuke, Tochigi, Japan
- Takahiro Nakao
- Department of Computational Diagnostic Radiology and Preventive Medicine (T.K., T.N., Y.N., T.Y.), The University of Tokyo Hospital, Tokyo, Japan
- Yuta Nakamura
- Department of Computational Diagnostic Radiology and Preventive Medicine (T.K., T.N., Y.N., T.Y.), The University of Tokyo Hospital, Tokyo, Japan
- Shouhei Hanaoka
- Department of Radiology (S.H.), The University of Tokyo Hospital, Tokyo, Japan
- Harushi Mori
- Department of Radiology (T.K., H.M.), School of Medicine, Jichi Medical University, Shimotsuke, Tochigi, Japan
- Takeharu Yoshikawa
- Department of Computational Diagnostic Radiology and Preventive Medicine (T.K., T.N., Y.N., T.Y.), The University of Tokyo Hospital, Tokyo, Japan
6. Wang Y, Liang L, Li R, Wang Y, Hao C. Comparison of the Performance of ChatGPT, Claude and Bard in Support of Myopia Prevention and Control. J Multidiscip Healthc 2024;17:3917-3929. PMID: 39155977. PMCID: PMC11330241. DOI: 10.2147/jmdh.s473680.
Abstract
Purpose Chatbots based on large language models are increasingly being used in public health. However, the effectiveness of chatbot responses has been debated, and their performance in myopia prevention and control has not been fully explored. This study aimed to evaluate the effectiveness of three well-known chatbots (ChatGPT, Claude, and Bard) in responding to public health questions about myopia. Methods Nineteen public health questions about myopia (covering the three topics of policy, basics, and measures) were answered individually by the three chatbots. After the order was shuffled, each chatbot response was independently rated by four raters for comprehensiveness, accuracy, and relevance. Results The questions were tested for reliability. Response word counts differed significantly among the three chatbots; from most to least, the order was ChatGPT, Bard, and Claude. All three chatbots had composite scores above 4 out of 5, and ChatGPT scored highest in all aspects of the assessment. However, all chatbots exhibited shortcomings, such as giving fabricated responses. Conclusion Chatbots have shown great potential in public health, with ChatGPT performing best. Future use of chatbots as a public health tool will require rapid development of standards for their use and monitoring, as well as continued research, evaluation, and improvement.
Affiliation(s)
- Yan Wang
- Department of Child and Adolescent Health, School of Public Health, Zhengzhou University, Zhengzhou, Henan, People’s Republic of China
- Lihua Liang
- Primary and Secondary School Health Center, Zhengzhou Education Science Planning and Evaluation Center, Zhengzhou Municipal Education Bureau, Zhengzhou, Henan, People’s Republic of China
- Ran Li
- Primary and Secondary School Health Center, Zhengzhou Education Science Planning and Evaluation Center, Zhengzhou Municipal Education Bureau, Zhengzhou, Henan, People’s Republic of China
- Yihua Wang
- Institute of Science and Technology Information, Zhengzhou University, Zhengzhou, Henan, People’s Republic of China
- Changfu Hao
- Department of Child and Adolescent Health, School of Public Health, Zhengzhou University, Zhengzhou, Henan, People’s Republic of China
7. Hieronimus B, Hammann S, Podszun MC. Can the AI tools ChatGPT and Bard generate energy, macro- and micro-nutrient sufficient meal plans for different dietary patterns? Nutr Res 2024;128:105-114. PMID: 39102765. DOI: 10.1016/j.nutres.2024.07.002.
Abstract
Artificial intelligence chatbots based on large language models have recently emerged as an alternative to traditional online searches and are also entering the nutrition space. In this study, we investigated whether the artificial intelligence chatbots ChatGPT and Bard (now Gemini) can create meal plans that meet the dietary reference intake (DRI) for different dietary patterns. We further hypothesized that nutritional adequacy could be improved by modifying the prompts used. Meal plans were generated by 3 accounts for different dietary patterns (omnivorous, vegetarian, and vegan) using 2 distinct prompts, resulting in 108 meal plans in total. The nutrient content of the plans was subsequently analyzed and compared with the DRIs. On average, the meal plans contained less energy and carbohydrates than recommended but mostly exceeded the DRI for protein. Vitamin D and fluoride fell below the DRI in all plans, whereas only the vegan plans contained insufficient vitamin B12. ChatGPT suggested using vitamin B12 supplements in 5 of 18 instances, whereas Bard never recommended supplements. There were no significant differences between the prompts or the tools. Although the meal plans generated by ChatGPT and Bard met most DRIs, there were some exceptions, particularly for vegan diets. These tools may be useful for individuals looking for general dietary inspiration, but they should not be relied on to create nutritionally adequate meal plans, especially for individuals with restrictive dietary needs.
Affiliation(s)
- Bettina Hieronimus
- Max Rubner-Institut, Department of Physiology and Biochemistry of Nutrition, Karlsruhe, Germany
- Simon Hammann
- Department of Chemistry and Pharmacy, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany; Department of Food Chemistry and Analytical Chemistry (170a), Institute of Food Chemistry, University of Hohenheim, Stuttgart, Germany
- Maren C Podszun
- Institute of Nutritional Science, Department of Food Biofunctionality, University of Hohenheim, Stuttgart, Germany.
8. Aljamaan F, Temsah MH, Altamimi I, Al-Eyadhy A, Jamal A, Alhasan K, Mesallam TA, Farahat M, Malki KH. Reference Hallucination Score for Medical Artificial Intelligence Chatbots: Development and Usability Study. JMIR Med Inform 2024;12:e54345. PMID: 39083799. PMCID: PMC11325115. DOI: 10.2196/54345.
Abstract
BACKGROUND Artificial intelligence (AI) chatbots have recently gained use among health care practitioners in medical practice. However, their output has been found to contain varying degrees of hallucination in content and references. Such hallucinations raise doubts about their output and its implementation. OBJECTIVE The aim of our study was to propose a reference hallucination score (RHS) to evaluate the authenticity of AI chatbots' citations. METHODS Six AI chatbots were challenged with the same 10 medical prompts, requesting 10 references per prompt. The RHS is composed of 6 bibliographic items and the reference's relevance to the prompt's keywords. The RHS was calculated for each reference, prompt, and type of prompt (basic vs complex). The average RHS was calculated for each AI chatbot and compared across the different types of prompts and AI chatbots. RESULTS Bard failed to generate any references. ChatGPT 3.5 and Bing generated the highest RHS (score=11), while Elicit and SciSpace generated the lowest RHS (score=1), and Perplexity generated a middle RHS (score=7). The highest degree of hallucination was observed for reference relevance to the prompt keywords (308/500, 61.6%), while the lowest was for reference titles (169/500, 33.8%). ChatGPT and Bing had comparable RHS (β coefficient=-0.069; P=.32), while Perplexity had significantly lower RHS than ChatGPT (β coefficient=-0.345; P<.001). AI chatbots generally had significantly higher RHS when prompted with scenarios or complex-format prompts (β coefficient=0.486; P<.001). CONCLUSIONS The variation in RHS underscores the necessity for a robust reference evaluation tool to improve the authenticity of AI chatbots and highlights the importance of verifying their output and citations. Elicit and SciSpace had negligible hallucination, while ChatGPT and Bing had critical hallucination levels. The proposed RHS could contribute to ongoing efforts to enhance AI's general reliability in medical research.
Affiliation(s)
- Fadi Aljamaan
- College of Medicine, King Saud University, Riyadh, Saudi Arabia
- Ayman Al-Eyadhy
- College of Medicine, King Saud University, Riyadh, Saudi Arabia
- Amr Jamal
- College of Medicine, King Saud University, Riyadh, Saudi Arabia
- Khalid Alhasan
- College of Medicine, King Saud University, Riyadh, Saudi Arabia
- Tamer A Mesallam
- Department of Otolaryngology, College of Medicine, Research Chair of Voice, Swallowing, and Communication Disorders, King Saud University, Riyadh, Saudi Arabia
- Mohamed Farahat
- Department of Otolaryngology, College of Medicine, Research Chair of Voice, Swallowing, and Communication Disorders, King Saud University, Riyadh, Saudi Arabia
- Khalid H Malki
- Department of Otolaryngology, College of Medicine, Research Chair of Voice, Swallowing, and Communication Disorders, King Saud University, Riyadh, Saudi Arabia
9. Anderl C, Klein SH, Sarigül B, Schneider FM, Han J, Fiedler PL, Utz S. Conversational presentation mode increases credibility judgements during information search with ChatGPT. Sci Rep 2024;14:17127. PMID: 39054335. PMCID: PMC11272919. DOI: 10.1038/s41598-024-67829-6.
Abstract
People increasingly use large language model (LLM)-based conversational agents to obtain information. However, the information these models provide is not always factually accurate. Thus, it is critical to understand what helps users adequately assess the credibility of the provided information. Here, we report the results of two preregistered experiments in which participants rated the credibility of accurate versus partially inaccurate information ostensibly provided by a dynamic text-based LLM-powered agent, a voice-based agent, or a static text-based online encyclopedia. We found that people were better at detecting inaccuracies when identical information was provided as static text compared to both types of conversational agents, regardless of whether information search applications were branded (ChatGPT, Alexa, and Wikipedia) or unbranded. Mediation analysis overall corroborated the interpretation that a conversational nature poses a threat to adequate credibility judgments. Our research highlights the importance of presentation mode when dealing with misinformation.
Affiliation(s)
- Christine Anderl
- Leibniz-Institut für Wissensmedien (IWM), Schleichstraße 6, 72076, Tübingen, Germany.
- Stefanie H Klein
- Leibniz-Institut für Wissensmedien (IWM), Schleichstraße 6, 72076, Tübingen, Germany
- Büsra Sarigül
- Leibniz-Institut für Wissensmedien (IWM), Schleichstraße 6, 72076, Tübingen, Germany
- Frank M Schneider
- Leibniz-Institut für Wissensmedien (IWM), Schleichstraße 6, 72076, Tübingen, Germany
- University of Amsterdam, Amsterdam, The Netherlands
- Junyi Han
- Leibniz-Institut für Wissensmedien (IWM), Schleichstraße 6, 72076, Tübingen, Germany
- Paul L Fiedler
- Leibniz-Institut für Wissensmedien (IWM), Schleichstraße 6, 72076, Tübingen, Germany
- Eberhard Karls Universität Tübingen, Tübingen, Germany
- Sonja Utz
- Leibniz-Institut für Wissensmedien (IWM), Schleichstraße 6, 72076, Tübingen, Germany.
- Eberhard Karls Universität Tübingen, Tübingen, Germany.
10. Miao Y, Luo Y, Zhao Y, Li J, Liu M, Wang H, Chen Y, Wu Y. Performance of GPT-4 on Chinese Nursing Examination: Potentials for AI-Assisted Nursing Education Using Large Language Models. Nurse Educ 2024. PMID: 38981035. DOI: 10.1097/nne.0000000000001679.
Abstract
BACKGROUND The performance of GPT-4 in nursing examinations within the Chinese context has not yet been thoroughly evaluated. OBJECTIVE To assess the performance of GPT-4 on multiple-choice and open-ended questions derived from nursing examinations in the Chinese context. METHODS The data sets of the Chinese National Nursing Licensure Examination spanning 2021 to 2023 were used to evaluate the accuracy of GPT-4 in multiple-choice questions. The performance of GPT-4 on open-ended questions was examined using 18 case-based questions. RESULTS For multiple-choice questions, GPT-4 achieved an accuracy of 71.0% (511/720). For open-ended questions, the responses were evaluated for cosine similarity, logical consistency, and information quality, all of which were found to be at a moderate level. CONCLUSION GPT-4 performed well at addressing queries on basic knowledge. However, it has notable limitations in answering open-ended questions. Nursing educators should weigh the benefits and challenges of GPT-4 for integration into nursing education.
Affiliation(s)
- Yiqun Miao
- School of Nursing, Capital Medical University, Beijing, China (Drs Miao, Luo, Zhao, Li, Liu, Wang, and Wu); and School of Nursing, Johns Hopkins University, Baltimore, USA (Dr Chen)
11. Liu Y, Ding X, Peng S, Zhang C. Leveraging ChatGPT to optimize depression intervention through explainable deep learning. Front Psychiatry 2024;15:1383648. PMID: 38903640. PMCID: PMC11188778. DOI: 10.3389/fpsyt.2024.1383648.
Abstract
Introduction Mental health issues impose a heavy burden on individuals and societies around the world. Recently, the large language model ChatGPT has demonstrated potential in depression intervention. The primary objective of this study was to ascertain the viability of ChatGPT as a tool for aiding counselors in their interactions with patients and to evaluate how its output compares with human-generated content (HGC). Methods We propose a novel framework that integrates state-of-the-art AI technologies, including ChatGPT, BERT, and SHAP, to enhance the accuracy and effectiveness of mental health interventions. ChatGPT generates responses to user inquiries, which are then classified using BERT to ensure the reliability of the content. SHAP is subsequently employed to provide insights into the underlying semantic constructs of the AI-generated recommendations, enhancing the interpretability of the intervention. Results Our proposed methodology consistently achieved an accuracy rate of 93.76%. We found that ChatGPT always employs a polite and considerate tone in its responses, refrains from using intricate or unconventional vocabulary, and maintains an impersonal demeanor. These findings underscore the potential significance of AI-generated content (AIGC) as a valuable complementary component of conventional intervention strategies. Discussion This study illuminates the considerable promise of large language models in healthcare. It represents a pivotal step toward the development of sophisticated healthcare systems capable of augmenting patient care and counseling practices.
Affiliation(s)
- Yang Liu
- School of Information Management, Wuhan University, Wuhan, China
- Shenzhen Research Institute, Wuhan University, Shenzhen, China
- Xingchen Ding
- School of Cyber Science and Engineering, Wuhan University, Wuhan, China
- Shun Peng
- School of Education, Jianghan University, Wuhan, China
- Chengzhi Zhang
- Department of Information Management, Nanjing University of Science and Technology, Nanjing, China
12. Daraqel B, Wafaie K, Mohammed H, Cao L, Mheissen S, Liu Y, Zheng L. The performance of artificial intelligence models in generating responses to general orthodontic questions: ChatGPT vs Google Bard. Am J Orthod Dentofacial Orthop 2024;165:652-662. PMID: 38493370. DOI: 10.1016/j.ajodo.2024.01.012.
Abstract
INTRODUCTION This study aimed to evaluate and compare the performance of 2 artificial intelligence (AI) models, Chat Generative Pretrained Transformer-3.5 (ChatGPT-3.5; OpenAI, San Francisco, Calif) and Google Bard (Bard Experiment, Google, Mountain View, Calif), in terms of response accuracy, completeness, generation time, and response length when answering general orthodontic questions. METHODS A team of orthodontic specialists developed a set of 100 questions across 10 orthodontic domains. One author submitted the questions to both ChatGPT and Google Bard. The AI-generated responses from both models were randomly assigned into 2 forms and sent to 5 blinded and independent assessors. The quality of the AI-generated responses was evaluated using a newly developed tool for accuracy of information and completeness. In addition, response generation time and length were recorded. RESULTS The accuracy and completeness of responses were high in both AI models. The median accuracy score was 9 (interquartile range [IQR]: 8-9) for ChatGPT and 8 (IQR: 8-9) for Google Bard (median difference: 1; P < 0.001). The median completeness score was similar in both models, with 8 (IQR: 8-9) for ChatGPT and 8 (IQR: 7-9) for Google Bard. The odds of accuracy and completeness were higher by 31% and 23%, respectively, in ChatGPT than in Google Bard. Google Bard's response generation time was significantly shorter than that of ChatGPT by 10.4 seconds per question. However, both models were similar in terms of response length. CONCLUSIONS Responses generated by both ChatGPT and Google Bard were rated with a high level of accuracy and completeness for the posed general orthodontic questions. However, acquiring answers was generally faster with the Google Bard model.
Affiliation(s)
- Baraa Daraqel
- Department of Orthodontics, Stomatological Hospital of Chongqing Medical University, Chongqing Key Laboratory of Oral Disease and Biomedical Sciences, Chongqing Municipal Key Laboratory of Oral Biomedical Engineering of Higher Education, Chongqing, China; Oral Health Research and Promotion Unit, Al-Quds University, Jerusalem, Palestine.
- Khaled Wafaie
- Department of Orthodontics, Faculty of Dentistry, First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan, China
- Li Cao
- Department of Orthodontics, Stomatological Hospital of Chongqing Medical University, Chongqing Key Laboratory of Oral Disease and Biomedical Sciences, Chongqing Municipal Key Laboratory of Oral Biomedical Engineering of Higher Education, Chongqing, China
- Yang Liu
- Department of Orthodontics, Stomatological Hospital of Chongqing Medical University, Chongqing Key Laboratory of Oral Disease and Biomedical Sciences, Chongqing Municipal Key Laboratory of Oral Biomedical Engineering of Higher Education, Chongqing, China
- Leilei Zheng
- Department of Orthodontics, Stomatological Hospital of Chongqing Medical University, Chongqing Key Laboratory of Oral Disease and Biomedical Sciences, Chongqing Municipal Key Laboratory of Oral Biomedical Engineering of Higher Education, Chongqing, China.
13. Burnette H, Pabani A, von Itzstein MS, Switzer B, Fan R, Ye F, Puzanov I, Naidoo J, Ascierto PA, Gerber DE, Ernstoff MS, Johnson DB. Use of artificial intelligence chatbots in clinical management of immune-related adverse events. J Immunother Cancer 2024;12:e008599. PMID: 38816231. PMCID: PMC11141185. DOI: 10.1136/jitc-2023-008599.
Abstract
BACKGROUND Artificial intelligence (AI) chatbots have become a major source of general and medical information, though their accuracy and completeness are still being assessed. Their utility for answering questions surrounding immune-related adverse events (irAEs), common and potentially dangerous toxicities from cancer immunotherapy, is not well defined. METHODS We developed 50 distinct questions with answers in available guidelines surrounding 10 irAE categories and queried two AI chatbots (ChatGPT and Bard), along with an additional 20 patient-specific scenarios. Experts in irAE management scored answers for accuracy and completeness using a Likert scale ranging from 1 (least accurate/complete) to 4 (most accurate/complete). Answers were compared across categories and across engines. RESULTS Overall, both engines scored highly for accuracy (mean scores for ChatGPT and Bard were 3.87 vs 3.5, p<0.01) and completeness (3.83 vs 3.46, p<0.01). Scores of 1-2 (completely or mostly inaccurate or incomplete) were particularly rare for ChatGPT (6/800 answer-ratings, 0.75%). Of the 50 questions, all eight physician raters gave ChatGPT a rating of 4 (fully accurate or complete) for 22 questions for accuracy and 16 questions for completeness. In the 20 patient scenarios, the average accuracy score was 3.725 (median 4) and the average completeness score was 3.61 (median 4). CONCLUSIONS AI chatbots provided largely accurate and complete information regarding irAEs, and wildly inaccurate information ("hallucinations") was uncommon. However, until accuracy and completeness increase further, appropriate guidelines remain the gold standard to follow.
Affiliation(s)
- Hannah Burnette
- Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Aliyah Pabani
- Department of Oncology, Johns Hopkins University, Baltimore, Maryland, USA
- Mitchell S von Itzstein
- Harold C Simmons Comprehensive Cancer Center, The University of Texas Southwestern Medical Center, Dallas, Texas, USA
- Benjamin Switzer
- Department of Medicine, Roswell Park Comprehensive Cancer Center, Buffalo, New York, USA
- Run Fan
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Fei Ye
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Igor Puzanov
- Department of Medicine, Roswell Park Comprehensive Cancer Center, Buffalo, New York, USA
- Paolo A Ascierto
- Department of Melanoma, Cancer Immunotherapy and Development Therapeutics, Istituto Nazionale Tumori IRCCS Fondazione Pascale, Napoli, Campania, Italy
- David E Gerber
- Harold C Simmons Comprehensive Cancer Center, The University of Texas Southwestern Medical Center, Dallas, Texas, USA
- Marc S Ernstoff
- ImmunoOncology Branch (IOB), Developmental Therapeutics Program, Cancer Therapy and Diagnosis Division, National Cancer Institute (NCI), National Institutes of Health, Bethesda, Maryland, USA
- Douglas B Johnson
- Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
14. Garbarino S, Bragazzi NL. Evaluating the effectiveness of artificial intelligence-based tools in detecting and understanding sleep health misinformation: Comparative analysis using Google Bard and OpenAI ChatGPT-4. J Sleep Res 2024:e14210. PMID: 38577714. DOI: 10.1111/jsr.14210.
Abstract
This study evaluates the performance of two major artificial intelligence-based tools (ChatGPT-4 and Google Bard) in debunking sleep-related myths. Specifically, it assessed 20 sleep misconceptions using a 5-point Likert scale for falseness and public health significance, comparing the responses of the artificial intelligence tools with expert opinions. The results indicated that Google Bard correctly identified 19 out of 20 statements as false (95.0% accuracy), not differing significantly from ChatGPT-4 (85.0% accuracy, Fisher's exact test p = 0.615). Google Bard's ratings of the falseness of the sleep misconceptions averaged 4.25 ± 0.70, showing moderately negative skewness (-0.42) and kurtosis (-0.83) and suggesting a distribution with fewer extreme values than ChatGPT-4. In assessing public health significance, Google Bard's mean score was 2.4 ± 0.80, with skewness and kurtosis of 0.36 and -0.07, respectively, indicating a more normal distribution than ChatGPT-4. The inter-rater agreement between Google Bard and sleep experts had an intra-class correlation coefficient of 0.58 for falseness and 0.69 for public health significance, showing moderate alignment (p = 0.065 and p = 0.014, respectively). Text-mining analysis revealed Google Bard's focus on practical advice, while ChatGPT-4 concentrated on theoretical aspects of sleep. Readability analysis suggested Google Bard's responses were more accessible, aligning with 8th-grade level material, versus ChatGPT-4's 12th-grade level complexity. The study demonstrates the potential of artificial intelligence in public health education, especially in sleep health, and underscores the importance of accurate, reliable artificial intelligence-generated information, calling for further collaboration between artificial intelligence developers, sleep health professionals and educators to enhance the effectiveness of sleep health promotion.
Affiliation(s)
- Sergio Garbarino
- Department of Neuroscience, Rehabilitation, Ophthalmology, Genetics and Maternal, Child Sciences (DINOGMI), University of Genoa, Genoa, Italy
- Post-Graduate School of Occupational Health, Università Cattolica del Sacro Cuore, Rome, Italy
- Nicola Luigi Bragazzi
- Department of Neuroscience, Rehabilitation, Ophthalmology, Genetics and Maternal, Child Sciences (DINOGMI), University of Genoa, Genoa, Italy
- Laboratory for Industrial and Applied Mathematics (LIAM), Department of Mathematics and Statistics, York University, Toronto, Ontario, Canada
- Human Nutrition Unit (HNU), Department of Food and Drugs, University of Parma, Parma, Italy
15. Zangrossi P, Martini M, Guerrini F, De Bonis P, Spena G. Large language model, AI and scientific research: why ChatGPT is only the beginning. J Neurosurg Sci 2024;68:216-224. PMID: 38261307. DOI: 10.23736/s0390-5616.23.06171-4.
Abstract
ChatGPT, a conversational artificial intelligence model based on the generative pre-trained transformer (GPT) architecture, has garnered widespread attention due to its user-friendly nature and diverse capabilities. This technology enables users of all backgrounds to engage effortlessly in human-like conversations and receive coherent, intelligible responses. Beyond casual interactions, ChatGPT offers compelling prospects for scientific research, facilitating tasks like literature review and content summarization and ultimately expediting and enhancing the academic writing process. In medicine and surgery, it has already shown broad potential across many tasks: enhancing decision-making processes, aiding in surgical planning and simulation, providing real-time assistance during surgery, improving postoperative care and rehabilitation, and contributing to training, education, research, and development. However, it is crucial to acknowledge the model's limitations, encompassing knowledge constraints and the potential for erroneous responses, as well as ethical and legal considerations. This paper explores the potential benefits and pitfalls of these innovative technologies in scientific research, shedding light on their transformative impact while addressing concerns surrounding their use.
Affiliation(s)
- Pietro Zangrossi
- Department of Neurosurgery, Sant'Anna University Hospital, Ferrara, Italy
- Department of Translational Medicine, University of Ferrara, Ferrara, Italy
- Massimo Martini
- R&D Department, Gate-away.com, Grottammare, Ascoli Piceno, Italy
- Francesco Guerrini
- Department of Neurosurgery, San Matteo Polyclinic IRCCS Foundation, Pavia, Italy
- Pasquale De Bonis
- Department of Neurosurgery, Sant'Anna University Hospital, Ferrara, Italy
- Department of Translational Medicine, University of Ferrara, Ferrara, Italy
- Unit of Minimally Invasive Neurosurgery, Ferrara University Hospital, Ferrara, Italy
- Giannantonio Spena
- Department of Neurosurgery, San Matteo Polyclinic IRCCS Foundation, Pavia, Italy
16. Raman R, Venugopalan M, Kamal A. Evaluating human resources management literacy: A performance analysis of ChatGPT and Bard. Heliyon 2024;10:e27026. PMID: 38486738. PMCID: PMC10937570. DOI: 10.1016/j.heliyon.2024.e27026.
Abstract
This study presents a comprehensive analysis comparing the literacy levels of two generative artificial intelligence (GAI) tools, ChatGPT and Bard, using a dataset of 134 questions from the human resources (HR) domain. The generated responses are evaluated for accuracy, relevance, and clarity. We find that ChatGPT outperforms Bard in overall accuracy (84.3% vs. 82.8%). This difference suggests that ChatGPT could serve as a robotic advisor in transactional HR roles. In contrast, Bard may possess additional safeguards against misuse in the HR function, making it less capable of generating responses to certain types of questions. Statistical tests reveal that although the two systems differ in mean accuracy, relevance, and clarity of responses, the observed differences are not always statistically significant, implying that the tools may be more complementary than competitive. The Pearson correlation coefficients further support this by showing weak to non-existent relationships in performance metrics between the two tools. Confirmation queries do not improve the response accuracy of either ChatGPT or Bard. The study thus contributes to emerging research on the utility of GAI tools in human resources management and suggests that involving certified HR professionals in the design phase could enhance underlying language model performance.
Affiliation(s)
- Raghu Raman
- Amrita School of Business, Amrita Vishwa Vidyapeetham, Amritapuri, India
- Murale Venugopalan
- Amrita School of Business, Amrita Vishwa Vidyapeetham, Amritapuri, India
- Anju Kamal
- Amrita School of Business, Amrita Vishwa Vidyapeetham, Amritapuri, India
17. Tortora L. Beyond Discrimination: Generative AI Applications and Ethical Challenges in Forensic Psychiatry. Front Psychiatry 2024;15:1346059. PMID: 38525252. PMCID: PMC10958425. DOI: 10.3389/fpsyt.2024.1346059.
Abstract
The advent and growing popularity of generative artificial intelligence (GenAI) hold the potential to revolutionise AI applications in forensic psychiatry and criminal justice, which have traditionally relied on discriminative AI algorithms. Generative AI models mark a significant shift from the previously prevailing paradigm through their ability to generate seemingly new realistic data and to analyse and integrate a vast amount of unstructured content from different data formats. This potential extends beyond reshaping conventional practices, like risk assessment, diagnostic support, and treatment and rehabilitation plans, to creating new opportunities in previously underexplored areas, such as training and education. This paper examines the transformative impact of generative artificial intelligence on AI applications in forensic psychiatry and criminal justice. First, it introduces generative AI and its prevalent models. Following this, it reviews the current applications of discriminative AI in forensic psychiatry. Subsequently, it presents a thorough exploration of the potential of generative AI to transform established practices and introduce novel applications through multimodal generative models, data generation and data augmentation. Finally, it provides a comprehensive overview of ethical and legal issues associated with deploying generative AI models, focusing on their impact on individuals as well as their broader societal implications. In conclusion, this paper aims to contribute to the ongoing discourse concerning the dynamic challenges of generative AI applications in forensic contexts, highlighting potential opportunities, risks, and challenges. It advocates for interdisciplinary collaboration and emphasises the necessity for thorough, responsible evaluations of generative AI models before widespread adoption into domains where decisions with substantial life-altering consequences are routinely made.
Affiliation(s)
- Leda Tortora
- School of Nursing and Midwifery, Trinity College Dublin, Dublin, Ireland
18. Aizenstein H, Moore RC, Vahia I, Ciarleglio A. Deep Learning and Geriatric Mental Health. Am J Geriatr Psychiatry 2024;32:270-279. PMID: 38142162. PMCID: PMC10922602. DOI: 10.1016/j.jagp.2023.11.008.
Abstract
The goal of this overview is to help clinicians develop basic proficiency with the terminology of deep learning and understand its fundamentals and early applications. We describe what machine learning and deep learning represent and explain the underlying data science principles. We also review current promising applications and identify ethical issues that bear consideration. Deep learning is a new type of machine learning that is remarkably good at finding patterns in data and, in some cases, generating realistic new data. We provide insights into how deep learning works and discuss its relevance to geriatric psychiatry.
Affiliation(s)
- Howard Aizenstein
- Department of Psychiatry (HA), University of Pittsburgh School of Medicine, Pittsburgh, PA.
- Raeanne C Moore
- Department of Psychiatry (RCM), University of California San Diego, San Diego, CA
- Ipsit Vahia
- Division of Geriatric Psychiatry (IV), Harvard Medical School, Boston, MA
- Adam Ciarleglio
- Department of Biostatistics and Bioinformatics (AC), George Washington University, Washington, DC
19. Seckel E, Stephens BY, Rodriguez F. Ten simple rules to leverage large language models for getting grants. PLoS Comput Biol 2024;20:e1011863. PMID: 38427611. PMCID: PMC10906892. DOI: 10.1371/journal.pcbi.1011863.
Affiliation(s)
- Elizabeth Seckel
- Division of Cardiovascular Medicine, Stanford University School of Medicine, Stanford, California, United States of America
- Brandi Y. Stephens
- Division of Cardiovascular Medicine, Stanford University School of Medicine, Stanford, California, United States of America
- Fatima Rodriguez
- Division of Cardiovascular Medicine, Stanford University School of Medicine, Stanford, California, United States of America
20. McMahon HV, McMahon BD. Automating untruths: ChatGPT, self-managed medication abortion, and the threat of misinformation in a post-Roe world. Front Digit Health 2024;6:1287186. PMID: 38419805. PMCID: PMC10900507. DOI: 10.3389/fdgth.2024.1287186.
Abstract
Background ChatGPT is a generative artificial intelligence chatbot that uses natural language processing to understand and execute prompts in a human-like manner. While the chatbot has become popular as a source of information among the public, experts have expressed concerns about the number of false and misleading statements made by ChatGPT. Many people search online for information about self-managed medication abortion, which has become even more common following the overturning of Roe v. Wade. It is likely that ChatGPT is also being used as a source of this information; however, little is known about its accuracy. Objective To assess the accuracy of ChatGPT responses to common questions regarding self-managed abortion safety and the process of using abortion pills. Methods We prompted ChatGPT with 65 questions about self-managed medication abortion, which produced approximately 11,000 words of text. We qualitatively coded all data in MAXQDA and performed thematic analysis. Results ChatGPT responses correctly described clinician-managed medication abortion as both safe and effective. In contrast, self-managed medication abortion was inaccurately described as dangerous and associated with an increase in the risk of complications, which was attributed to the lack of clinician supervision. Conclusion ChatGPT repeatedly provided responses that overstated the risk of complications associated with self-managed medication abortion in ways that directly contradict the expansive body of evidence demonstrating that self-managed medication abortion is both safe and effective. The chatbot's tendency to perpetuate health misinformation and associated stigma regarding self-managed medication abortions poses a threat to public health and reproductive autonomy.
Affiliation(s)
- Hayley V. McMahon
- Department of Behavioral, Social, and Health Education Sciences, Emory University Rollins School of Public Health, Atlanta, GA, United States
- The Center for Reproductive Health Research in the Southeast, Emory University Rollins School of Public Health, Atlanta, GA, United States
21
Younis HA, Eisa TAE, Nasser M, Sahib TM, Noor AA, Alyasiri OM, Salisu S, Hayder IM, Younis HA. A Systematic Review and Meta-Analysis of Artificial Intelligence Tools in Medicine and Healthcare: Applications, Considerations, Limitations, Motivation and Challenges. Diagnostics (Basel) 2024; 14:109. [PMID: 38201418 PMCID: PMC10802884 DOI: 10.3390/diagnostics14010109]
Abstract
Artificial intelligence (AI) has emerged as a transformative force in various sectors, including medicine and healthcare. Large language models such as ChatGPT showcase AI's potential by generating human-like text in response to prompts. ChatGPT's adaptability holds promise for reshaping medical practices, improving patient care, and enhancing interactions among healthcare professionals, patients, and data. In pandemic management, ChatGPT can rapidly disseminate vital information; it also serves as a virtual assistant in surgical consultations, aids dental practice, simplifies medical education, and supports disease diagnosis. A systematic literature review using the PRISMA approach explored AI's transformative potential in healthcare, highlighting ChatGPT's versatile applications, limitations, motivations, and challenges. A total of 82 papers were categorised into eight major areas: G1, treatment and medicine; G2, buildings and equipment; G3, parts of the human body and areas of disease; G4, patients; G5, citizens; G6, cellular imaging, radiology, pulse, and medical images; G7, doctors and nurses; and G8, tools, devices, and administration. Balancing AI's role with human judgment remains a challenge. In conclusion, ChatGPT's diverse medical applications demonstrate its potential for innovation, and this review may serve as a guide for students, academics, and researchers in medicine and healthcare.
Affiliation(s)
- Hussain A. Younis
- College of Education for Women, University of Basrah, Basrah 61004, Iraq
- Maged Nasser
- Computer & Information Sciences Department, Universiti Teknologi PETRONAS, Seri Iskandar 32610, Malaysia
- Thaeer Mueen Sahib
- Kufa Technical Institute, Al-Furat Al-Awsat Technical University, Kufa 54001, Iraq
- Ameen A. Noor
- Computer Science Department, College of Education, University of Almustansirya, Baghdad 10045, Iraq
- Sani Salisu
- Department of Information Technology, Federal University Dutse, Dutse 720101, Nigeria
- Israa M. Hayder
- Qurna Technique Institute, Southern Technical University, Basrah 61016, Iraq
- Hameed AbdulKareem Younis
- Department of Cybersecurity, College of Computer Science and Information Technology, University of Basrah, Basrah 61016, Iraq
22
Cohen F, Vallimont J, Gelfand AA. Caution regarding fabricated citations from artificial intelligence. Headache 2024; 64:3-4. [PMID: 37873980 DOI: 10.1111/head.14649]
Affiliation(s)
- Fred Cohen
- Headache Editorial Team
- Department of Neurology, Mount Sinai Hospital, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Department of Medicine, Mount Sinai Hospital, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Amy A Gelfand
- Headache Editorial Team
- Department of Neurology, Child & Adolescent Headache Program, UCSF, San Francisco, California, USA
23
Boonrit N, Chaisawat K, Phueakong C, Nootong N, Ruanglertboon W. Exploring community pharmacists' attitudes in Thailand towards ChatGPT usage: A pilot qualitative investigation. Digit Health 2024; 10:20552076241283256. [PMID: 39314814 PMCID: PMC11418248 DOI: 10.1177/20552076241283256]
Abstract
Background ChatGPT has recently emerged as a disruptive technology, potentially impacting various societal dimensions, including pharmacy practices. In Thailand, community pharmacists are navigating transitions as patients increasingly rely on digital tools for healthcare recommendations. This study explores the attitudes of community pharmacists in Hatyai, one of Thailand's most populated cities, towards the integration of ChatGPT in pharmacy services. Method ChatGPT-3.5 was used to generate responses to three questions concerning the use of medicine in special populations in the Thai language. These responses were then incorporated into a questionnaire and evaluated using a Likert scale from 1 to 5. Participants who consented were asked to rate the responses and participate in an in-depth interview. Results The majority of participants rated the responses favorably, with scores of 4 and 5 accounting for at least 60% of the ratings. Only a small proportion of responses, ranging from 20% to 40%, received doubtful ratings (score of 3) or were rated in disagreement. Moreover, open opinions extracted from the interviews suggested that participants viewed ChatGPT as a capable assistant, as it provided fast yet reasonably accurate information in the Thai language. Conclusion The findings indicate that community pharmacists view ChatGPT as a capable assistant, albeit noting the need for further refinements. The study underscores the importance for pharmacists to proactively adapt to technological advancements, particularly those affecting patient safety, to enhance healthcare delivery and optimize treatment outcomes.
Affiliation(s)
- Nuntapong Boonrit
- Department of Clinical Pharmacy, Faculty of Pharmaceutical Sciences, Prince of Songkla University, Hatyai, Songkhla, Thailand
- Kornchanok Chaisawat
- Department of Clinical Pharmacy, Faculty of Pharmaceutical Sciences, Prince of Songkla University, Hatyai, Songkhla, Thailand
- Chanakarn Phueakong
- Department of Clinical Pharmacy, Faculty of Pharmaceutical Sciences, Prince of Songkla University, Hatyai, Songkhla, Thailand
- Nantitha Nootong
- Department of Clinical Pharmacy, Faculty of Pharmaceutical Sciences, Prince of Songkla University, Hatyai, Songkhla, Thailand
- Warit Ruanglertboon
- Discipline of Pharmacology, Division of Health and Applied Sciences, Faculty of Science, Prince of Songkla University, Hatyai, Songkhla, Thailand
24
Bhayana R. Chatbots and Large Language Models in Radiology: A Practical Primer for Clinical and Research Applications. Radiology 2024; 310:e232756. [PMID: 38226883 DOI: 10.1148/radiol.232756]
Abstract
Although chatbots have existed for decades, the emergence of transformer-based large language models (LLMs) has captivated the world through the most recent wave of artificial intelligence chatbots, including ChatGPT. Transformers are a type of neural network architecture that enables better contextual understanding of language and efficient training on massive amounts of unlabeled data, such as unstructured text from the internet. As LLMs have increased in size, their improved performance and emergent abilities have revolutionized natural language processing. Since language is integral to human thought, applications based on LLMs have transformative potential in many industries. In fact, LLM-based chatbots have demonstrated human-level performance on many professional benchmarks, including in radiology. LLMs offer numerous clinical and research applications in radiology, several of which have been explored in the literature with encouraging results. Multimodal LLMs can simultaneously interpret text and images to generate reports, closely mimicking current diagnostic pathways in radiology. Thus, from requisition to report, LLMs have the opportunity to positively impact nearly every step of the radiology journey. Yet, these impressive models are not without limitations. This article reviews the limitations of LLMs and mitigation strategies, as well as potential uses of LLMs, including multimodal models. Also reviewed are existing LLM-based applications that can enhance efficiency in supervised settings.
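For readers unfamiliar with the transformer architecture this primer describes, the sketch below illustrates scaled dot-product attention, the core operation that lets each token's representation incorporate context from every other token. This is a minimal Python illustration of the standard technique, not code drawn from the article:

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Core transformer operation: each token's output is a weighted
    mix of all value vectors, weighted by query-key similarity."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # pairwise similarity, scaled
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ V                               # contextualized representations

# Toy example: 4 tokens, 8-dimensional embeddings
rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(tokens, tokens, tokens)
print(out.shape)  # (4, 8): one context-aware vector per token

Stacking many such attention layers, trained on massive unlabeled text corpora, is what yields the contextual understanding and emergent abilities the abstract refers to.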
Affiliation(s)
- Rajesh Bhayana
- From University Medical Imaging Toronto, Joint Department of Medical Imaging, University Health Network, Mount Sinai Hospital, and Women's College Hospital, University of Toronto, Toronto General Hospital, 200 Elizabeth St, Peter Munk Bldg, 1st Fl, Toronto, ON, Canada M5G 2C4
25
Razdan S, Siegal AR, Brewer Y, Sljivich M, Valenzuela RJ. Assessing ChatGPT's ability to answer questions pertaining to erectile dysfunction: can our patients trust it? Int J Impot Res 2023. [PMID: 37985815 DOI: 10.1038/s41443-023-00797-z]
Abstract
Erectile dysfunction (ED) is a disorder that can cause distress and shame for men suffering from it. Men with ED will often turn to online support and chat groups to ask intimate questions about their health. ChatGPT is an artificial intelligence (AI)-based software that has been trained to engage in conversation with human input. We sought to assess the accuracy, readability, and reproducibility of ChatGPT's responses to frequently asked questions regarding the diagnosis, management, and care of patients with ED. Questions pertaining to ED were derived from clinic encounters with patients as well as online chat forums. These were entered into the free ChatGPT version 3.5 during the month of August 2023. Questions were asked on two separate days from unique accounts and computers to prevent the software from memorizing responses linked to a specific user. A total of 35 questions were asked. Outcomes measured were accuracy using grading from board-certified urologists, readability with the Gunning Fog Index, and reproducibility by comparing responses between days. For epidemiology of disease, the percentage of responses that were graded as "comprehensive" or "correct but inadequate" was 100% across both days. There was fair reproducibility and median readability of 15.9 (IQR 2.5). For treatment and prevention, the percentage of responses that were graded as "comprehensive" or "correct but inadequate" was 78.9%. There was poor reproducibility of responses with a median readability of 14.5 (IQR 4.0). Risks of treatment and counseling both had 100% of questions graded as "comprehensive" or "correct but inadequate." The readability score for risks of treatment was median 13.9 (IQR 1.1) and for counseling median 13.8 (IQR 0.5), with good reproducibility for both question domains. ChatGPT provides accurate answers to common patient questions pertaining to ED, although its understanding of treatment options is incomplete and responses are at a reading level too advanced for the average patient.
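The Gunning Fog Index cited above estimates the years of formal schooling needed to understand a text on first reading, so the reported medians of roughly 14-16 correspond to college-level material. A minimal sketch of the standard computation follows (illustrative only, not the authors' code; the vowel-group syllable heuristic is a simplifying assumption, since production tools use dictionaries):

import re

def gunning_fog(text):
    """Gunning Fog Index: 0.4 * (average sentence length
    + percentage of words with three or more syllables)."""
    sentences = [s for s in re.split(r'[.!?]+', text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    def syllables(word):
        # Crude vowel-group count; a real implementation would use a dictionary
        return max(1, len(re.findall(r'[aeiouy]+', word.lower())))
    complex_words = [w for w in words if syllables(w) >= 3]
    return 0.4 * (len(words) / len(sentences)
                  + 100 * len(complex_words) / len(words))

sample = ("Erectile dysfunction has several effective treatments. "
          "Phosphodiesterase inhibitors are commonly prescribed medications.")
print(round(gunning_fog(sample), 1))  # higher score = more schooling needed

Patient-education materials are typically targeted at a Fog score near 8 (eighth-grade level), which is why scores in the 14-16 range are considered too advanced for the average patient.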
Affiliation(s)
- Shirin Razdan
- Department of Urology, Icahn School of Medicine at Mount Sinai Hospital, New York, NY, 10029, USA.
- Alexandra R Siegal
- Department of Urology, Icahn School of Medicine at Mount Sinai Hospital, New York, NY, 10029, USA
- Yukiko Brewer
- Department of Internal Medicine, HCA Florida Sarasota Doctors Hospital, Sarasota, FL, 34233, USA
- Michaela Sljivich
- Department of Urology, Icahn School of Medicine at Mount Sinai Hospital, New York, NY, 10029, USA
- Robert J Valenzuela
- Department of Urology, Icahn School of Medicine at Mount Sinai Hospital, New York, NY, 10029, USA
26
Arillotta D, Floresta G, Guirguis A, Corkery JM, Catalani V, Martinotti G, Sensi SL, Schifano F. GLP-1 Receptor Agonists and Related Mental Health Issues; Insights from a Range of Social Media Platforms Using a Mixed-Methods Approach. Brain Sci 2023; 13:1503. [PMID: 38002464 PMCID: PMC10669484 DOI: 10.3390/brainsci13111503]
Abstract
The emergence of glucagon-like peptide-1 receptor agonists (GLP-1 RAs; semaglutide and others) now promises effective, non-invasive treatment of obesity for individuals with and without diabetes. Users of social media platforms began promoting semaglutide/Ozempic as a weight-loss treatment, and the associated increase in demand has contributed to an ongoing worldwide shortage of the drug, as well as to non-prescribed semaglutide intake. Furthermore, recent reports emphasized some GLP-1 RA-associated risks of triggering depression and suicidal thoughts. Consistent with the above, we aimed to assess the possible impact of GLP-1 RAs on mental health as perceived and discussed on popular open platforms, using a mixed-methods approach. Reddit posts yielded 12,136 comments; YouTube videos, 14,515; and TikTok videos, 17,059. Of these posts/entries, most matches related to sleep issues, including insomnia (n = 620 matches); anxiety (n = 353); depression (n = 204); and mental health issues in general (n = 165). After initiation of GLP-1 RAs, weight loss was associated with either a marked improvement or, in some cases, a deterioration in mood; increases or decreases in anxiety and insomnia; and better control of a range of addictive behaviors. The challenges of accessing these medications were a hot topic as well. To the best of our knowledge, this is the first study documenting if and how GLP-1 RAs are perceived as affecting mood, mental health, and behaviors. Establishing a clear cause-and-effect link between metabolic diseases, depression, and medications is difficult because of their possible reciprocal relationship, shared underlying mechanisms, and individual differences. Further research is needed to better understand the safety profile of these molecules and their putative impact on behavioral and non-behavioral addictions.
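The match counts reported above reflect the quantitative arm of the mixed-methods approach: tallying comments that contain theme-related keywords. The sketch below shows one plausible way such counts are produced; the keyword lists and matching rules are hypothetical, as the abstract does not specify the authors' lexicon or tooling:

from collections import Counter

# Hypothetical theme lexicon; the study's actual keyword lists are not given here
THEMES = {
    "insomnia": ["insomnia", "can't sleep", "sleepless"],
    "anxiety": ["anxiety", "anxious", "panic"],
    "depression": ["depression", "depressed", "hopeless"],
}

def count_theme_matches(comments, themes=THEMES):
    """Count comments matching each theme's keywords (one match per comment-theme pair)."""
    counts = Counter()
    for comment in comments:
        text = comment.lower()
        for theme, keywords in themes.items():
            if any(k in text for k in keywords):
                counts[theme] += 1
    return counts

comments = [
    "Since starting Ozempic I have insomnia almost every night.",
    "My anxiety actually improved after losing weight.",
    "Feeling depressed since the shortage, no refills anywhere.",
]
print(count_theme_matches(comments))  # Counter({'insomnia': 1, 'anxiety': 1, 'depression': 1})

In the study itself these quantitative counts were paired with qualitative reading of the matched posts, which is what distinguishes a mixed-methods design from simple keyword frequency analysis.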
Affiliation(s)
- Davide Arillotta
- School of Clinical Pharmacology and Toxicology, University of Florence, 50121 Florence, Italy
- Psychopharmacology, Drug Misuse and Novel Psychoactive Substances Research Unit, School of Life and Medical Sciences, University of Hertfordshire, Hatfield AL10 9AB, UK
- Giuseppe Floresta
- Psychopharmacology, Drug Misuse and Novel Psychoactive Substances Research Unit, School of Life and Medical Sciences, University of Hertfordshire, Hatfield AL10 9AB, UK
- Department of Drug and Health Sciences, University of Catania, 95124 Catania, Italy
- Amira Guirguis
- Psychopharmacology, Drug Misuse and Novel Psychoactive Substances Research Unit, School of Life and Medical Sciences, University of Hertfordshire, Hatfield AL10 9AB, UK
- Pharmacy, Swansea University Medical School, Faculty of Medicine, Health and Life Science, Swansea University, Swansea SA2 8PP, UK
- John Martin Corkery
- Psychopharmacology, Drug Misuse and Novel Psychoactive Substances Research Unit, School of Life and Medical Sciences, University of Hertfordshire, Hatfield AL10 9AB, UK
- Valeria Catalani
- Psychopharmacology, Drug Misuse and Novel Psychoactive Substances Research Unit, School of Life and Medical Sciences, University of Hertfordshire, Hatfield AL10 9AB, UK
- Giovanni Martinotti
- Psychopharmacology, Drug Misuse and Novel Psychoactive Substances Research Unit, School of Life and Medical Sciences, University of Hertfordshire, Hatfield AL10 9AB, UK
- Department of Neurosciences, Imaging and Clinical Sciences, University of Chieti-Pescara, 66100 Chieti, Italy
- Stefano L. Sensi
- Department of Neurosciences, Imaging and Clinical Sciences, University of Chieti-Pescara, 66100 Chieti, Italy
- Center for Advanced Studies and Technology (CAST), Institute of Advanced Biomedical Technology (ITAB), University of Chieti-Pescara, Via dei Vestini 21, 66100 Chieti, Italy
- Fabrizio Schifano
- Psychopharmacology, Drug Misuse and Novel Psychoactive Substances Research Unit, School of Life and Medical Sciences, University of Hertfordshire, Hatfield AL10 9AB, UK
27
Koga S. The Integration of Large Language Models Such as ChatGPT in Scientific Writing: Harnessing Potential and Addressing Pitfalls. Korean J Radiol 2023; 24:924-925. [PMID: 37634646 PMCID: PMC10462902 DOI: 10.3348/kjr.2023.0738]
Affiliation(s)
- Shunsuke Koga
- Department of Pathology and Laboratory Medicine, Hospital of the University of Pennsylvania, Philadelphia, PA, USA.
28
Emsley R. ChatGPT: these are not hallucinations - they're fabrications and falsifications. Schizophrenia (Heidelb) 2023; 9:52. [PMID: 37598184 PMCID: PMC10439949 DOI: 10.1038/s41537-023-00379-4]