1
Suárez A, Jiménez J, Llorente de Pedro M, Andreu-Vázquez C, Díaz-Flores García V, Gómez Sánchez M, Freire Y. Beyond the Scalpel: Assessing ChatGPT's potential as an auxiliary intelligent virtual assistant in oral surgery. Comput Struct Biotechnol J 2024; 24:46-52. PMID: 38162955; PMCID: PMC10755495; DOI: 10.1016/j.csbj.2023.11.058.
Abstract
AI has revolutionized the way we interact with technology. Noteworthy advances in AI algorithms and large language models (LLMs) have led to the development of natural generative language (NGL) systems such as ChatGPT. Although these LLMs can simulate human conversations and generate content in real time, they face challenges related to the topicality and accuracy of the information they generate. This study aimed to assess whether ChatGPT-4 could provide accurate and reliable answers to general dentists in the field of oral surgery, and thus to explore its potential as an intelligent virtual assistant in clinical decision making in oral surgery. Thirty questions related to oral surgery were posed to ChatGPT-4, each repeated 30 times, yielding a total of 900 responses. Two surgeons graded the answers according to the guidelines of the Spanish Society of Oral Surgery, using a three-point Likert scale (correct, partially correct/incomplete, and incorrect). Disagreements were arbitrated by an experienced oral surgeon, who provided the final grade. Accuracy was found to be 71.7%, and the consistency of the experts' grading across iterations ranged from moderate to almost perfect. ChatGPT-4, with its potential capabilities, will inevitably be integrated into dental disciplines, including oral surgery. In the future, it could be considered an auxiliary intelligent virtual assistant, though it would never replace oral surgery experts. Proper training and verified information by experts will remain vital to the implementation of the technology. More comprehensive research is needed to ensure the safe and successful application of AI in oral surgery.
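Interrater reliability in a grading workflow like this one is typically summarized with a chance-corrected statistic such as Cohen's kappa alongside raw accuracy. The sketch below shows one way to compute both in Python; the grade lists are invented placeholders, not the study's data, and the three-point scale is encoded as 2/1/0.

```python
# Hedged sketch: accuracy plus two-rater agreement on a three-point
# grading scale (2 = correct, 1 = partially correct, 0 = incorrect).
# The grade lists are invented placeholders, not the study's data.
from sklearn.metrics import cohen_kappa_score

rater_a = [2, 2, 1, 0, 2, 1, 2, 2, 0, 2]
rater_b = [2, 2, 1, 1, 2, 1, 2, 2, 0, 2]
kappa = cohen_kappa_score(rater_a, rater_b)

final_grades = [2, 2, 1, 1, 2, 1, 2, 2, 0, 2]  # grades after arbitration
accuracy = final_grades.count(2) / len(final_grades)
print(f"kappa={kappa:.2f}, accuracy={accuracy:.1%}")
```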
Affiliation(s)
- Ana Suárez: Department of Pre-Clinic Dentistry, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, 28670 Madrid, Spain
- Jaime Jiménez: Department of Clinic Dentistry, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, 28670 Madrid, Spain
- María Llorente de Pedro: Department of Pre-Clinic Dentistry, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, 28670 Madrid, Spain
- Cristina Andreu-Vázquez: Department of Veterinary Medicine, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, 28670 Madrid, Spain
- Víctor Díaz-Flores García: Department of Pre-Clinic Dentistry, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, 28670 Madrid, Spain
- Margarita Gómez Sánchez: Department of Pre-Clinic Dentistry, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, 28670 Madrid, Spain
- Yolanda Freire: Department of Pre-Clinic Dentistry, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Calle Tajo s/n, Villaviciosa de Odón, 28670 Madrid, Spain
2
Warrier A, Singh R, Haleem A, Zaki H, Eloy JA. The Comparative Diagnostic Capability of Large Language Models in Otolaryngology. Laryngoscope 2024; 134:3997-4002. PMID: 38563415; DOI: 10.1002/lary.31434.
Abstract
OBJECTIVES Evaluate and compare the ability of large language models (LLMs) to diagnose various ailments in otolaryngology. METHODS We collected all 100 clinical vignettes from the second edition of Otolaryngology Cases-The University of Cincinnati Clinical Portfolio by Pensak et al. With the addition of the prompt "Provide a diagnosis given the following history," we prompted ChatGPT-3.5, Google Bard, and Bing-GPT4 to provide a diagnosis for each vignette. These diagnoses were compared to the portfolio for accuracy and recorded. All queries were run in June 2023. RESULTS ChatGPT-3.5 was the most accurate model (89% success rate), followed by Google Bard (82%) and Bing-GPT4 (74%). A chi-squared test revealed a significant difference between the three LLMs in providing correct diagnoses (p = 0.023). Of the 100 vignettes, seven required additional testing results (e.g., biopsy, non-contrast CT) for accurate clinical diagnosis. When these vignettes were omitted, the revised success rates were 95.7% for ChatGPT-3.5, 88.17% for Google Bard, and 78.72% for Bing-GPT4 (p = 0.002). CONCLUSIONS ChatGPT-3.5 offers the most accurate diagnoses when given established clinical vignettes, as compared to Google Bard and Bing-GPT4. LLMs may accurately offer assessments for common otolaryngology conditions but currently require detailed prompt information and critical supervision from clinicians. There is vast potential in the clinical applicability of LLMs; however, practitioners should be wary of possible "hallucinations" and misinformation in responses. LEVEL OF EVIDENCE 3 Laryngoscope, 134:3997-4002, 2024.
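The reported chi-squared comparison can be reconstructed from the published success rates, since each model answered the same 100 vignettes. A minimal sketch, assuming a simple correct/incorrect contingency table:

```python
# Correct vs. incorrect diagnoses per model, 100 vignettes each,
# taken from the success rates reported in the abstract above.
from scipy.stats import chi2_contingency

table = [
    [89, 11],  # ChatGPT-3.5
    [82, 18],  # Google Bard
    [74, 26],  # Bing-GPT4
]
chi2, p, dof, _ = chi2_contingency(table)
print(f"chi2={chi2:.2f}, dof={dof}, p={p:.3f}")  # p comes out near 0.023
```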
Affiliation(s)
- Akshay Warrier: Department of Otolaryngology-Head and Neck Surgery, Rutgers New Jersey Medical School, Newark, New Jersey, USA
- Rohan Singh: Department of Otolaryngology-Head and Neck Surgery, Rutgers New Jersey Medical School, Newark, New Jersey, USA
- Afash Haleem: Department of Otolaryngology-Head and Neck Surgery, Rutgers New Jersey Medical School, Newark, New Jersey, USA
- Haider Zaki: Department of Otolaryngology-Head and Neck Surgery, Rutgers New Jersey Medical School, Newark, New Jersey, USA
- Jean Anderson Eloy: Department of Otolaryngology-Head and Neck Surgery, Rutgers New Jersey Medical School, Newark, New Jersey, USA; Center for Skull Base and Pituitary Surgery, Neurological Institute of New Jersey, Rutgers New Jersey Medical School, Newark, New Jersey, USA
3
Langston E, Charness N, Boot W. Are Virtual Assistants Trustworthy for Medicare Information: An Examination of Accuracy and Reliability. Gerontologist 2024; 64:gnae062. PMID: 38832398; PMCID: PMC11258897; DOI: 10.1093/geront/gnae062.
Abstract
BACKGROUND AND OBJECTIVES Advances in artificial intelligence (AI)-based virtual assistants provide a potential opportunity for older adults to use this technology in the context of health information-seeking. Meta-analysis on trust in AI shows that users are influenced by the accuracy and reliability of the AI trustee. We evaluated these dimensions for responses to Medicare queries. RESEARCH DESIGN AND METHODS During the summer of 2023, we assessed the accuracy and reliability of Alexa, Google Assistant, Bard, and ChatGPT-4 on Medicare terminology and general content from a large, standardized question set. We compared the accuracy of these AI systems to that of a large representative sample of Medicare beneficiaries who were queried twenty years prior. RESULTS Alexa and Google Assistant were found to be highly inaccurate when compared to beneficiaries' mean accuracy of 68.4% on terminology queries and 53.0% on general Medicare content. Bard and ChatGPT-4 answered Medicare terminology queries perfectly and performed much better on general Medicare content queries (Bard = 96.3%, ChatGPT-4 = 92.6%) than the average Medicare beneficiary. About one month to a month-and-a-half later, we found that Bard and Alexa's accuracy stayed the same, whereas ChatGPT-4's performance nominally decreased, and Google Assistant's performance nominally increased. DISCUSSION AND IMPLICATIONS LLM-based assistants generate trustworthy information in response to carefully phrased queries about Medicare, in contrast to Alexa and Google Assistant. Further studies will be needed to determine what factors beyond accuracy and reliability influence the adoption and use of such technology for Medicare decision-making.
Affiliation(s)
- Emily Langston: Department of Psychology, Florida State University, Tallahassee, Florida, USA
- Neil Charness: Department of Psychology, Florida State University, Tallahassee, Florida, USA
- Walter Boot: Department of Psychology, Florida State University, Tallahassee, Florida, USA
4
Liu M, Okuhara T, Chang X, Shirabe R, Nishiie Y, Okada H, Kiuchi T. Performance of ChatGPT Across Different Versions in Medical Licensing Examinations Worldwide: Systematic Review and Meta-Analysis. J Med Internet Res 2024; 26:e60807. PMID: 39052324; PMCID: PMC11310649; DOI: 10.2196/60807.
Abstract
BACKGROUND Over the past 2 years, researchers have used various medical licensing examinations to test whether ChatGPT (OpenAI) possesses accurate medical knowledge. The performance of each version of ChatGPT on medical licensing examinations in multiple environments has shown remarkable differences. At this stage, there is still a lack of comprehensive understanding of the variability in ChatGPT's performance on different medical licensing examinations. OBJECTIVE In this study, we reviewed all studies on ChatGPT performance in medical licensing examinations up to March 2024. This review aims to contribute to the evolving discourse on artificial intelligence (AI) in medical education by providing a comprehensive analysis of the performance of ChatGPT in various environments. The insights gained from this systematic review will guide educators, policymakers, and technical experts to effectively and judiciously use AI in medical education. METHODS We searched the literature published between January 1, 2022, and March 29, 2024, using query strings in Web of Science, PubMed, and Scopus. Two authors screened the literature according to the inclusion and exclusion criteria, extracted data, and independently assessed the quality of the literature using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool. We conducted both qualitative and quantitative analyses. RESULTS A total of 45 studies on the performance of different versions of ChatGPT in medical licensing examinations were included in this study. GPT-4 achieved an overall accuracy rate of 81% (95% CI 78-84; P<.01), significantly surpassing the 58% (95% CI 53-63; P<.01) accuracy rate of GPT-3.5. GPT-4 passed the medical examinations in 26 of 29 cases, outperforming the average scores of medical students in 13 of 17 cases. Translating the examination questions into English improved GPT-3.5's performance but did not affect GPT-4's. GPT-3.5 showed no difference in performance between examinations from English-speaking and non-English-speaking countries (P=.72), but GPT-4 performed significantly better on examinations from English-speaking countries (P=.02). Any type of prompt could significantly improve GPT-3.5's (P=.03) and GPT-4's (P<.01) performance. GPT-3.5 performed better on short-text questions than on long-text questions. The difficulty of the questions affected the performance of GPT-3.5 and GPT-4. In image-based multiple-choice questions (MCQs), ChatGPT's accuracy rate ranged from 13.1% to 100%. ChatGPT performed significantly worse on open-ended questions than on MCQs. CONCLUSIONS GPT-4 demonstrates considerable potential for future use in medical education. However, due to its insufficient accuracy, inconsistent performance, and the challenges posed by differing medical policies and knowledge across countries, GPT-4 is not yet suitable for use in medical education. TRIAL REGISTRATION PROSPERO CRD42024506687; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=506687.
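For readers unfamiliar with how a pooled accuracy such as 81% (95% CI 78-84) is typically derived, the sketch below shows a generic random-effects pooling of per-study accuracy proportions. The study counts are invented and the method (logit transform with DerSimonian-Laird between-study variance) is a standard assumption, not the authors' published code.

```python
# Hedged sketch: random-effects pooling of per-study accuracy rates
# (logit transform, DerSimonian-Laird tau^2). Study counts are invented.
import numpy as np

studies = [(160, 200), (85, 100), (240, 300), (70, 90)]  # (correct, total)

logits = np.array([np.log(k / (n - k)) for k, n in studies])
variances = np.array([1 / k + 1 / (n - k) for k, n in studies])  # var of each logit

# DerSimonian-Laird estimate of between-study variance tau^2
w = 1 / variances
mean_fixed = np.sum(w * logits) / np.sum(w)
q = np.sum(w * (logits - mean_fixed) ** 2)
c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
tau2 = max(0.0, (q - (len(studies) - 1)) / c)

# Random-effects pooled estimate with a 95% CI, back-transformed
w_star = 1 / (variances + tau2)
pooled = np.sum(w_star * logits) / np.sum(w_star)
se = np.sqrt(1 / np.sum(w_star))

def expit(x):
    return 1 / (1 + np.exp(-x))

print(f"pooled accuracy {expit(pooled):.1%} "
      f"(95% CI {expit(pooled - 1.96 * se):.1%}-{expit(pooled + 1.96 * se):.1%})")
```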
Affiliation(s)
- Mingxin Liu: Department of Health Communication, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Tsuyoshi Okuhara: Department of Health Communication, School of Public Health, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- XinYi Chang: Department of Industrial Engineering and Economics, School of Engineering, Tokyo Institute of Technology, Tokyo, Japan
- Ritsuko Shirabe: Department of Health Communication, School of Public Health, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Yuriko Nishiie: Department of Health Communication, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Hiroko Okada: Department of Health Communication, School of Public Health, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Takahiro Kiuchi: Department of Health Communication, School of Public Health, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
5
Keyßer G, Pfeil A, Reuß-Borst M, Frohne I, Schultz O, Sander O. [What is the potential of ChatGPT for qualified patient information? An attempt at a structured analysis based on a survey regarding complementary and alternative medicine (CAM) in rheumatology]. Z Rheumatol 2024. PMID: 38985176; DOI: 10.1007/s00393-024-01535-6.
Abstract
INTRODUCTION The chatbot ChatGPT represents a milestone in the interaction between humans and large databases that are accessible via the internet. It facilitates the answering of complex questions by enabling communication in everyday language, and is therefore a potential source of information for those affected by rheumatic diseases. The aim of our investigation was to find out whether ChatGPT (version 3.5) is capable of giving qualified answers regarding the application of specific methods of complementary and alternative medicine (CAM) in three rheumatic diseases: rheumatoid arthritis (RA), systemic lupus erythematosus (SLE), and granulomatosis with polyangiitis (GPA). In addition, it was investigated how the answers of the chatbot were influenced by the wording of the question. METHODS The questioning of ChatGPT was performed in three parts. Part A consisted of an open question regarding the best way of treating the respective disease. In part B, the questions were directed towards possible indications for the application of CAM in general in one of the three disorders. In part C, the chatbot was asked for specific recommendations regarding one of three CAM methods: homeopathy, ayurvedic medicine, and herbal medicine. The questions in parts B and C were phrased in two variants: the first asked whether the specific CAM was applicable at all in the given rheumatic disease; the second asked which procedure of the respective CAM method worked best in the specific disease. The validity of the answers was checked using the ChatGPT reliability score, a Likert scale ranging from 1 (lowest validity) to 7 (highest validity). RESULTS The answers to the open questions of part A had the highest validity. In parts B and C, ChatGPT suggested a variety of CAM applications that lack scientific evidence. The validity of the answers depended on the wording of the questions. If the question suggested an inclination to apply a certain CAM, the answers often failed to mention the missing evidence and were graded with lower score values. CONCLUSION The answers of ChatGPT (version 3.5) regarding the applicability of CAM in selected rheumatic diseases are not convincingly based on scientific evidence. In addition, the wording of the questions affects the validity of the information. Currently, an uncritical application of ChatGPT as an instrument for patient information cannot be recommended.
Affiliation(s)
- Gernot Keyßer: Klinik und Poliklinik für Innere Medizin II, Universitätsklinikum Halle, Ernst-Grube-Str. 40, 06120 Halle (Saale), Germany
- Alexander Pfeil: Klinik für Innere Medizin III, Universitätsklinikum Jena, Friedrich-Schiller-Universität Jena, Jena, Germany
- Inna Frohne: Privatpraxis für Rheumatologie, Essen, Germany
- Olaf Schultz: Abteilung Rheumatologie, ACURA Kliniken Baden-Baden, Baden-Baden, Germany
- Oliver Sander: Klinik für Rheumatologie, Universitätsklinikum Düsseldorf, Düsseldorf, Germany
6
Huo W, He M, Zeng Z, Bao X, Lu Y, Tian W, Feng J, Feng R. Impact Analysis of COVID-19 Pandemic on Hospital Reviews on Dianping Website in Shanghai, China: Empirical Study. J Med Internet Res 2024; 26:e52992. PMID: 38954461; PMCID: PMC11252617; DOI: 10.2196/52992.
Abstract
BACKGROUND In the era of the internet, individuals have grown increasingly accustomed to gathering the information they need and expressing their opinions on public web-based platforms. The health care sector is no exception, as these comments, to a certain extent, influence people's health care decisions. How the medical experience of Chinese patients and their evaluations of hospitals changed during the onset of the COVID-19 pandemic remained to be studied. We therefore collected patient visit data from the internet to reflect the state of care relationships under these specific circumstances. OBJECTIVE This study aims to explore the differences in patient comments across various stages of the COVID-19 pandemic (before, during, and after), as well as among different types of hospitals (children's hospitals, maternity hospitals, and tumor hospitals). Additionally, by leveraging ChatGPT (OpenAI), the study categorizes the elements of negative hospital evaluations. An analysis is conducted on the acquired data, and potential solutions that could improve patient satisfaction are proposed. This study is intended to assist hospital managers in providing a better experience for patients who are seeking care amid an emergent public health crisis. METHODS Selecting the top 50 comprehensive hospitals nationwide and the top specialized hospitals (children's hospitals, tumor hospitals, and maternity hospitals), we collected patient reviews from these hospitals on the Dianping website. Using ChatGPT, we classified the content of negative reviews. Additionally, we conducted statistical analysis using SPSS (IBM Corp) to examine the scoring and composition of negative evaluations. RESULTS A total of 30,317 pieces of effective comment information were collected from January 1, 2018, to August 15, 2023, including 7696 pieces of negative comment information. Manual inspection indicated that ChatGPT had an accuracy rate of 92.05%, with an F1-score of 0.914. The analysis of these data revealed a significant correlation between the comments and ratings received by hospitals during the pandemic. Overall, there was a significant increase in average comment scores during the outbreak (P<.001). Furthermore, there were notable differences in the composition of negative comments among different types of hospitals (P<.001). Children's hospitals received sensitive feedback regarding waiting times and treatment effectiveness, while patients at maternity hospitals showed greater concern for the attitude of health care providers. Patients at tumor hospitals expressed a desire for timely examinations and treatments, especially during the pandemic period. CONCLUSIONS The COVID-19 pandemic had some association with patient comment scores. There were variations in the scores and content of comments among different types of specialized hospitals. Using ChatGPT to analyze patient comment content represents an innovative approach for statistically assessing factors contributing to patient dissatisfaction. The findings of this study could provide valuable insights for hospital administrators to foster more harmonious physician-patient relationships and enhance hospital performance during public health emergencies.
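The manual-validation step (92.05% accuracy, F1 = 0.914) amounts to comparing ChatGPT's assigned complaint categories against human labels over the same reviews. A minimal sketch with hypothetical category labels, not the study's data:

```python
# Hedged sketch: validate model-assigned review categories against
# human labels. The label values below are invented placeholders.
from sklearn.metrics import accuracy_score, f1_score

human =   ["wait_time", "attitude", "cost", "attitude", "wait_time", "effect"]
chatgpt = ["wait_time", "attitude", "cost", "wait_time", "wait_time", "effect"]

print("accuracy:", accuracy_score(human, chatgpt))
print("macro F1:", f1_score(human, chatgpt, average="macro"))
```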
Affiliation(s)
- Weixue Huo: Department of Vascular Surgery, Shanghai General Hospital, Shanghai Jiaotong University, Shanghai, China
- Mengwei He: Department of Vascular Surgery, Shanghai General Hospital, Shanghai Jiaotong University, Shanghai, China
- Zhaoxiang Zeng: Department of Vascular Surgery, Changhai Hospital, Navy Medical University, Shanghai, China
- Xianhao Bao: Department of Vascular Surgery, Shanghai General Hospital, Shanghai Jiaotong University, Shanghai, China
- Ye Lu: Department of Vascular Surgery, Shanghai General Hospital, Shanghai Jiaotong University, Shanghai, China
- Wen Tian: Department of Vascular Surgery, Shanghai General Hospital, Shanghai Jiaotong University, Shanghai, China
- Jiaxuan Feng: Vascular Surgery Department, Ruijin Hospital, Shanghai Jiaotong University School of Medicine, Shanghai, China
- Rui Feng: Department of Vascular Surgery, Shanghai General Hospital, Shanghai Jiaotong University, Shanghai, China
7
Kumar RP, Sivan V, Bachir H, Sarwar SA, Ruzicka F, O'Malley GR, Lobo P, Morales IC, Cassimatis ND, Hundal JS, Patel NV. Can Artificial Intelligence Mitigate Missed Diagnoses by Generating Differential Diagnoses for Neurosurgeons? World Neurosurg 2024; 187:e1083-e1088. PMID: 38759788; DOI: 10.1016/j.wneu.2024.05.052.
Abstract
BACKGROUND/OBJECTIVE Neurosurgery emphasizes the criticality of accurate differential diagnoses, with diagnostic delays posing significant health and economic challenges. As large language models (LLMs) emerge as transformative tools in healthcare, this study seeks to elucidate their role in assisting neurosurgeons with the differential diagnosis process, especially during preliminary consultations. METHODS This study employed three chat-based LLM platforms, ChatGPT (versions 3.5 and 4.0), Perplexity AI, and Bard AI, to evaluate their diagnostic accuracy. Each LLM was prompted using clinical vignettes, and their responses were recorded to generate differential diagnoses for 20 common and uncommon neurosurgical disorders. Disease-specific prompts were crafted using DynaMed, a clinical reference tool. The accuracy of the LLMs was determined by their ability to correctly identify the target disease within their top differential diagnoses. RESULTS For the initial differential, ChatGPT 3.5 achieved an accuracy of 52.63%, while ChatGPT 4.0 performed slightly better at 53.68%. Perplexity AI and Bard AI demonstrated 40.00% and 29.47% accuracy, respectively. As the number of considered differentials increased from 2 to 5, ChatGPT 3.5 reached its peak accuracy of 77.89% for the top 5 differentials. Bard AI and Perplexity AI had varied performances, with Bard AI improving in the top 5 differentials at 62.11%. On a disease-specific note, the LLMs excelled in diagnosing conditions like epilepsy and cervical spine stenosis but faced challenges with more complex diseases such as moyamoya disease and amyotrophic lateral sclerosis. CONCLUSIONS LLMs showcase the potential to enhance diagnostic accuracy and decrease the incidence of missed diagnoses in neurosurgery.
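The accuracy measure described here is top-k accuracy: a vignette counts as correct if the target disease appears among the model's first k differentials. A small sketch with invented vignettes:

```python
# Hedged sketch of top-k scoring over (target, ranked differentials)
# pairs. The case data are invented for illustration.
def top_k_accuracy(cases, k):
    hits = sum(target in differentials[:k] for target, differentials in cases)
    return hits / len(cases)

cases = [
    ("epilepsy", ["epilepsy", "syncope", "TIA"]),
    ("moyamoya disease", ["vasculitis", "atherosclerosis", "moyamoya disease"]),
    ("cervical spine stenosis", ["cervical spine stenosis", "ALS"]),
]

for k in (1, 3):
    print(f"top-{k} accuracy: {top_k_accuracy(cases, k):.0%}")
```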
Affiliation(s)
- Rohit Prem Kumar: Department of Neurosurgery, Hackensack Meridian School of Medicine, Nutley, New Jersey, USA
- Vijay Sivan: Department of Neurosurgery, Hackensack Meridian School of Medicine, Nutley, New Jersey, USA
- Hanin Bachir: Department of Neurosurgery, Hackensack Meridian School of Medicine, Nutley, New Jersey, USA
- Syed A Sarwar: Department of Neurosurgery, Hackensack Meridian School of Medicine, Nutley, New Jersey, USA
- Francis Ruzicka: Department of Neurosurgery, Hackensack Meridian School of Medicine, Nutley, New Jersey, USA
- Geoffrey R O'Malley: Department of Neurosurgery, Hackensack Meridian School of Medicine, Nutley, New Jersey, USA
- Paulo Lobo: Department of Neurosurgery, Hackensack Meridian School of Medicine, Nutley, New Jersey, USA
- Ilona Cazorla Morales: Department of Neurosurgery, Hackensack Meridian School of Medicine, Nutley, New Jersey, USA
- Nicholas D Cassimatis: Department of Neurosurgery, Hackensack Meridian School of Medicine, Nutley, New Jersey, USA
- Jasdeep S Hundal: Department of Neurology, HMH-Jersey Shore University Medical Center, Neptune, New Jersey, USA
- Nitesh V Patel: Department of Neurosurgery, Hackensack Meridian School of Medicine, Nutley, New Jersey, USA; Department of Neurosurgery, HMH-Jersey Shore University Medical Center, Neptune, New Jersey, USA
8
Rotem R, Zamstein O, Rottenstreich M, O'Sullivan OE, O'Reilly BA, Weintraub AY. The future of patient education: A study on AI-driven responses to urinary incontinence inquiries. Int J Gynaecol Obstet 2024. PMID: 38944693; DOI: 10.1002/ijgo.15751.
Abstract
OBJECTIVE To evaluate the effectiveness of ChatGPT in providing insights into common urinary incontinence concerns within urogynecology. By analyzing the model's responses against established benchmarks of accuracy, completeness, and safety, the study aimed to quantify its usefulness for informing patients and aiding healthcare providers. METHODS An expert-driven questionnaire was developed, inviting urogynecologists worldwide to assess ChatGPT's answers to 10 carefully selected questions on urinary incontinence (UI). These assessments focused on the accuracy of the responses, their comprehensiveness, and whether they raised any safety issues. Subsequent statistical analyses determined the average consensus among experts and identified the proportion of responses receiving favorable evaluations (a score of 4 or higher). RESULTS Of the 50 urogynecologists approached worldwide, 37 responded, offering insights into ChatGPT's responses on UI. The overall feedback averaged a score of 4.0, indicating positive acceptance. Accuracy scores averaged 3.9, with 71% rated favorably, whereas comprehensiveness scored slightly higher at 4.0, with 74% favorable ratings. Safety assessments also averaged 4.0, with 74% favorable responses. CONCLUSION This investigation underlines ChatGPT's favorable performance across the evaluated domains of accuracy, comprehensiveness, and safety within the context of UI queries. However, despite this broadly positive reception, the study also signals a clear avenue for improvement, particularly in the precision of the provided information. Refining ChatGPT's accuracy and ensuring the delivery of more pinpointed responses are essential steps forward, aiming to bolster its utility as a comprehensive educational resource for patients and a supportive tool for healthcare practitioners.
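The summary statistics reported here reduce to a mean rating and the share of ratings of 4 or higher. A trivial sketch on invented expert scores:

```python
# Hedged sketch: mean rating and "favorable" share (>= 4 on a 1-5
# scale). The scores are invented placeholders, not the survey data.
import statistics

scores = [5, 4, 4, 3, 5, 4, 2, 5, 4, 4]
mean = statistics.mean(scores)
favorable = sum(s >= 4 for s in scores) / len(scores)
print(f"mean {mean:.1f}, favorable {favorable:.0%}")
```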
Affiliation(s)
- Reut Rotem: Department of Urogynaecology, Cork University Maternity Hospital, Cork, Ireland; Department of Obstetrics and Gynecology, Shaare Zedek Medical Center, Affiliated with the Hebrew University School of Medicine, Jerusalem, Israel
- Omri Zamstein: Department of Obstetrics and Gynecology, Soroka University Medical Center, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- Misgav Rottenstreich: Department of Obstetrics and Gynecology, Shaare Zedek Medical Center, Affiliated with the Hebrew University School of Medicine, Jerusalem, Israel
- Barry A O'Reilly: Department of Urogynaecology, Cork University Maternity Hospital, Cork, Ireland
- Adi Y Weintraub: Department of Obstetrics and Gynecology, Soroka University Medical Center, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel
9
Yao JJ, Aggarwal M, Lopez RD, Namdari S. Current Concepts Review: Large Language Models in Orthopaedics: Definitions, Uses, and Limitations. J Bone Joint Surg Am 2024. PMID: 38896652; DOI: 10.2106/jbjs.23.01417.
Abstract
➤ Large language models are a subset of artificial intelligence. Large language models are powerful tools that excel in natural language text processing and generation.
➤ There are many potential clinical, research, and educational applications of large language models in orthopaedics, but the development of these applications needs to be focused on patient safety and the maintenance of high standards.
➤ There are numerous methodological, ethical, and regulatory concerns with regard to the use of large language models. Orthopaedic surgeons need to be aware of the controversies and advocate for an alignment of these models with patient and caregiver priorities.
Affiliation(s)
- Jie J Yao: Rothman Orthopaedic Institute, Thomas Jefferson University, Philadelphia, Pennsylvania
- Ryan D Lopez: Rothman Orthopaedic Institute, Thomas Jefferson University, Philadelphia, Pennsylvania
- Surena Namdari: Rothman Orthopaedic Institute, Thomas Jefferson University, Philadelphia, Pennsylvania
10
Lim B, Seth I, Cuomo R, Kenney PS, Ross RJ, Sofiadellis F, Pentangelo P, Ceccaroni A, Alfano C, Rozen WM. Can AI Answer My Questions? Utilizing Artificial Intelligence in the Perioperative Assessment for Abdominoplasty Patients. Aesthetic Plast Surg 2024. PMID: 38898239; DOI: 10.1007/s00266-024-04157-0.
Abstract
BACKGROUND Abdominoplasty is a common operation, used for a range of cosmetic and functional issues, often in the context of divarication of recti, significant weight loss, and after pregnancy. Despite this, patient-surgeon communication gaps can hinder informed decision-making. The integration of large language models (LLMs) in healthcare offers potential for enhancing patient information. This study evaluated the feasibility of using LLMs for answering perioperative queries. METHODS This study assessed the efficacy of four leading LLMs, OpenAI's ChatGPT-3.5, Anthropic's Claude, Google's Gemini, and Bing's CoPilot, using fifteen unique prompts. All outputs were evaluated using the Flesch-Kincaid grade, Flesch Reading Ease score, and Coleman-Liau index for readability assessment. The DISCERN score and a Likert scale were utilized to evaluate quality. Scores were assigned by two plastic surgery residents and then reviewed and discussed until a consensus was reached by five plastic surgeon specialists. RESULTS ChatGPT-3.5 required the highest reading level for comprehension, followed by Gemini, Claude, then CoPilot. Claude provided the most appropriate and actionable advice. In terms of patient-friendliness, CoPilot outperformed the rest, enhancing engagement and information comprehensiveness. ChatGPT-3.5 and Gemini offered adequate, though unremarkable, advice, employing more professional language. CoPilot uniquely included visual aids and was the only model to use hyperlinks, although these were not particularly helpful or appropriate, and it faced limitations in responding to certain queries. CONCLUSION ChatGPT-3.5, Gemini, Claude, and Bing's CoPilot showcased differences in readability and reliability. LLMs offer unique advantages for patient care but require careful selection. Future research should integrate LLM strengths and address weaknesses for optimal patient education. LEVEL OF EVIDENCE V This journal requires that authors assign a level of evidence to each article. For a full description of these Evidence-Based Medicine ratings, please refer to the Table of Contents or the online Instructions to Authors at www.springer.com/00266.
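Readability indices like those used here are mechanical formulas over sentence, word, and character counts, and the third-party textstat package implements all three. The snippet below is an assumed tooling choice with a made-up sample answer, shown only to illustrate the calls:

```python
# Hedged sketch using the `textstat` package (pip install textstat).
# The sample answer is invented, not a model output from the study.
import textstat

answer = "After an abdominoplasty you should avoid heavy lifting for six weeks."

print("Flesch Reading Ease:  ", textstat.flesch_reading_ease(answer))
print("Flesch-Kincaid grade: ", textstat.flesch_kincaid_grade(answer))
print("Coleman-Liau index:   ", textstat.coleman_liau_index(answer))
```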
Affiliation(s)
- Bryan Lim: Department of Plastic Surgery, Peninsula Health, Melbourne, Victoria, 3199, Australia
- Ishith Seth: Department of Plastic Surgery, Peninsula Health, Melbourne, Victoria, 3199, Australia
- Roberto Cuomo: Plastic Surgery Unit, Department of Medicine, Surgery and Neuroscience, University of Siena, Siena, Italy
- Peter Sinkjær Kenney: Department of Plastic Surgery, Velje Hospital, Beriderbakken 4, 7100, Vejle, Denmark; Department of Plastic and Breast Surgery, Aarhus University Hospital, Aarhus, Denmark
- Richard J Ross: Department of Plastic Surgery, Peninsula Health, Melbourne, Victoria, 3199, Australia
- Foti Sofiadellis: Department of Plastic Surgery, Peninsula Health, Melbourne, Victoria, 3199, Australia
- Warren Matthew Rozen: Department of Plastic Surgery, Peninsula Health, Melbourne, Victoria, 3199, Australia
11
MohanaSundaram A, Patil B, Praticò D. ChatGPT's Inconsistency in the Diagnosis of Alzheimer's Disease. J Alzheimers Dis Rep 2024; 8:923-925. PMID: 38910941; PMCID: PMC11191643; DOI: 10.3233/adr-240069.
Abstract
A recent article by El Haj et al. provided evidence that ChatGPT could be a potential tool to complement the clinical diagnosis of the various stages of Alzheimer's disease (AD) as well as mild cognitive impairment (MCI). To reassess the accuracy and reproducibility of ChatGPT in the diagnosis of AD and MCI, we used the same prompt used by the authors. Surprisingly, we found that some of ChatGPT's responses in diagnosing the various stages of AD and MCI differed from those originally reported. In this commentary, we discuss possible reasons for these divergent results and propose strategies for future studies.
Affiliation(s)
- Bhushan Patil: MannSparsh Neuropsychiatric Hospital, Kalyan, India; Manasa Rehabilitation and De-Addiction Center, Titwala, India
- Domenico Praticò: Alzheimer's Center at Temple, Lewis Katz School of Medicine, Temple University, Philadelphia, PA, USA
12
Croxford E, Gao Y, Patterson B, To D, Tesch S, Dligach D, Mayampurath A, Churpek MM, Afshar M. Development of a Human Evaluation Framework and Correlation with Automated Metrics for Natural Language Generation of Medical Diagnoses. medRxiv [preprint] 2024:2024.03.20.24304620. PMID: 38562730; PMCID: PMC10984060; DOI: 10.1101/2024.03.20.24304620.
Abstract
In the evolving landscape of clinical Natural Language Generation (NLG), assessing abstractive text quality remains challenging, as existing methods often overlook the complexities of generative tasks. This work aimed to examine the current state of automated evaluation metrics for NLG in healthcare. To have a robust and well-validated baseline against which to examine the alignment of these metrics, we created a comprehensive human evaluation framework. Employing ChatGPT-3.5-turbo generative output, we correlated human judgments with each metric. None of the metrics demonstrated high alignment; however, the SapBERT score, a metric grounded in the Unified Medical Language System (UMLS), showed the best results. This underscores the importance of incorporating domain-specific knowledge into evaluation efforts. Our work reveals the deficiency in quality evaluations for generated text and introduces our comprehensive human evaluation framework as a baseline. Future efforts should prioritize integrating medical knowledge databases to enhance the alignment of automated metrics, particularly focusing on refining the SapBERT score for improved assessments.
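Alignment between an automated metric and human judgment is commonly quantified with a rank correlation over the same set of generated texts. A sketch with invented scores, where the metric column merely stands in for something like a SapBERT-based score:

```python
# Hedged sketch: rank correlation between human quality judgments and
# an automated metric over the same outputs. All values are invented.
from scipy.stats import spearmanr

human_scores  = [4, 2, 5, 3, 1, 4, 2, 5]                           # human framework
metric_scores = [0.71, 0.40, 0.88, 0.55, 0.20, 0.64, 0.52, 0.90]   # automated metric

rho, p = spearmanr(human_scores, metric_scores)
print(f"Spearman rho={rho:.2f} (p={p:.3f})")
```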
Affiliation(s)
- Emma Croxford: Department of Medicine, School of Medicine and Public Health, University of Wisconsin Madison
- Yanjun Gao: Department of Medicine, School of Medicine and Public Health, University of Wisconsin Madison
- Brian Patterson: Department of Emergency Medicine, School of Medicine and Public Health, University of Wisconsin Madison
- Daniel To: Department of Medicine, School of Medicine and Public Health, University of Wisconsin Madison
- Samuel Tesch: Department of Medicine, School of Medicine and Public Health, University of Wisconsin Madison
- Anoop Mayampurath: Biostatistics and Medical Informatics, School of Medicine and Public Health, University of Wisconsin Madison
- Matthew M Churpek: Department of Medicine, School of Medicine and Public Health, University of Wisconsin Madison
- Majid Afshar: Department of Medicine, School of Medicine and Public Health, University of Wisconsin Madison
13
Koga S. The double-edged nature of ChatGPT in self-diagnosis. Wien Klin Wochenschr 2024; 136:243-244. PMID: 38504058; DOI: 10.1007/s00508-024-02343-3.
Affiliation(s)
- Shunsuke Koga: Department of Pathology and Laboratory Medicine, Hospital of the University of Pennsylvania, 3400 Spruce Street, 19104, Philadelphia, PA, USA
14
Parekh AS, McCahon JAS, Nghe A, Pedowitz DI, Daniel JN, Parekh SG. Foot and Ankle Patient Education Materials and Artificial Intelligence Chatbots: A Comparative Analysis. Foot Ankle Spec 2024. PMID: 38504411; DOI: 10.1177/19386400241235834.
Abstract
BACKGROUND The purpose of this study was to perform a comparative analysis of foot and ankle patient education material generated by AI chatbots, as it compares to the American Orthopaedic Foot and Ankle Society (AOFAS)-recommended patient education website, FootCareMD.org. METHODS ChatGPT, Google Bard, and Bing AI were used to generate patient education materials on 10 of the most common foot and ankle conditions. The content from these AI language model platforms was analyzed and compared with that on FootCareMD.org for accuracy of included information. Accuracy was determined for each of the 10 conditions on the basis of included information regarding background, symptoms, causes, diagnosis, treatments, surgical options, recovery procedures, and risks or prevention. RESULTS When compared to the reference standard of the AOFAS website FootCareMD.org, the AI language model platforms consistently scored below 60% accuracy across all categories of the articles analyzed. ChatGPT was found to contain an average of 46.2% of key content across all included conditions when compared to FootCareMD.org. Comparatively, Google Bard and Bing AI contained 36.5% and 28.0% of the information included on FootCareMD.org, respectively (P < .005). CONCLUSION Patient education regarding common foot and ankle conditions generated by AI language models provides limited content accuracy across all three AI chatbot platforms. LEVEL OF EVIDENCE Level IV.
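The accuracy scoring described here is essentially checklist coverage: the fraction of the reference site's key content elements that a chatbot answer reproduces. A toy sketch, with hypothetical element names rather than the study's actual checklist:

```python
# Hedged sketch: checklist coverage of reference key-content elements.
# Element names are hypothetical stand-ins for the study's criteria.
reference_elements = {"background", "symptoms", "causes", "diagnosis",
                      "treatments", "surgery", "recovery", "risks"}
chatbot_elements = {"background", "symptoms", "treatments", "recovery"}

coverage = len(chatbot_elements & reference_elements) / len(reference_elements)
print(f"content coverage: {coverage:.1%}")  # 50.0% for this toy input
```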
Affiliation(s)
- Aarav S Parekh: Rothman Orthopaedic Institute, Philadelphia, Pennsylvania
- Amy Nghe: Rothman Orthopaedic Institute, Philadelphia, Pennsylvania
15
Li J, Dada A, Puladi B, Kleesiek J, Egger J. ChatGPT in healthcare: A taxonomy and systematic review. Comput Methods Programs Biomed 2024; 245:108013. PMID: 38262126; DOI: 10.1016/j.cmpb.2024.108013.
Abstract
The recent release of ChatGPT, a chatbot research project/product of natural language processing (NLP) by OpenAI, has stirred up a sensation among both the general public and medical professionals, amassing a phenomenally large user base in a short time. This is a typical example of the 'productization' of cutting-edge technologies, which allows the general public without a technical background to gain firsthand experience in artificial intelligence (AI), similar to the AI hype created by AlphaGo (DeepMind Technologies, UK) and self-driving cars (Google, Tesla, etc.). However, it is crucial, especially for healthcare researchers, to remain prudent amidst the hype. This work provides a systematic review of existing publications on the use of ChatGPT in healthcare, elucidating the 'status quo' of ChatGPT in medical applications, for general readers, healthcare professionals, and NLP scientists. The large biomedical literature database PubMed is used to retrieve published works on this topic using the keyword 'ChatGPT'. An inclusion criterion and a taxonomy are further proposed to filter the search results and categorize the selected publications, respectively. It is found through the review that the current release of ChatGPT has achieved only moderate or 'passing' performance in a variety of tests and is unreliable for actual clinical deployment, since it is not intended for clinical applications by design. We conclude that specialized NLP models trained on (bio)medical datasets still represent the right direction to pursue for critical clinical applications.
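The retrieval step described here (querying PubMed with the keyword 'ChatGPT') can be scripted; the sketch below uses Biopython's Entrez utilities as one assumed way to do it, not the authors' actual pipeline. The email address is a placeholder required by NCBI.

```python
# Hedged sketch: keyword search of PubMed via Biopython's Entrez
# utilities (pip install biopython). Requires network access.
from Bio import Entrez

Entrez.email = "you@example.org"  # placeholder; NCBI requires a contact address

handle = Entrez.esearch(db="pubmed", term="ChatGPT", retmax=1000)
record = Entrez.read(handle)
handle.close()

print("hits:", record["Count"])
print("first PMIDs:", record["IdList"][:5])
```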
Affiliation(s)
- Jianning Li: Institute for Artificial Intelligence in Medicine, University Hospital Essen (AöR), Girardetstraße 2, 45131 Essen, Germany
- Amin Dada: Institute for Artificial Intelligence in Medicine, University Hospital Essen (AöR), Girardetstraße 2, 45131 Essen, Germany
- Behrus Puladi: Institute of Medical Informatics, University Hospital RWTH Aachen, Pauwelsstraße 30, 52074 Aachen, Germany; Department of Oral and Maxillofacial Surgery, University Hospital RWTH Aachen, Pauwelsstraße 30, 52074 Aachen, Germany
- Jens Kleesiek: Institute for Artificial Intelligence in Medicine, University Hospital Essen (AöR), Girardetstraße 2, 45131 Essen, Germany; Department of Physics, TU Dortmund University, Otto-Hahn-Straße 4, 44227 Dortmund, Germany
- Jan Egger: Institute for Artificial Intelligence in Medicine, University Hospital Essen (AöR), Girardetstraße 2, 45131 Essen, Germany; Center for Virtual and Extended Reality in Medicine (ZvRM), University Hospital Essen, University Medicine Essen, Hufelandstraße 55, 45147 Essen, Germany
16
Nacher M, Françoise U, Adenis A. ChatGPT neglects a neglected disease. Lancet Infect Dis 2024; 24:e76. PMID: 38211603; DOI: 10.1016/s1473-3099(23)00750-8.
Affiliation(s)
- Mathieu Nacher: CIC Inserm 1424, Amazonian Institute of Population Health, Centre Hospitalier de Cayenne, 97300 Cayenne, French Guiana
- Ugo Françoise: CIC Inserm 1424, Amazonian Institute of Population Health, Centre Hospitalier de Cayenne, 97300 Cayenne, French Guiana
- Antoine Adenis: CIC Inserm 1424, Amazonian Institute of Population Health, Centre Hospitalier de Cayenne, 97300 Cayenne, French Guiana
17
Thirunavukarasu AJ. How Can the Clinical Aptitude of AI Assistants Be Assayed? J Med Internet Res 2023; 25:e51603. PMID: 38051572; PMCID: PMC10731545; DOI: 10.2196/51603.
Abstract
Large language models (LLMs) are exhibiting remarkable performance in clinical contexts, with exemplar results ranging from expert-level attainment in medical examination questions to superior accuracy and relevance when responding to patient queries compared to real doctors replying to queries on social media. The deployment of LLMs in conventional health care settings is yet to be reported, and there remains an open question as to what evidence should be required before such deployment is warranted. Early validation studies use unvalidated surrogate variables to represent clinical aptitude, and it may be necessary to conduct prospective randomized controlled trials to justify the use of an LLM for clinical advice or assistance, as potential pitfalls and pain points cannot be exhaustively predicted. This viewpoint states that as LLMs continue to revolutionize the field, there is an opportunity to improve the rigor of artificial intelligence (AI) research to reward innovation, conferring real benefits to real patients.
Affiliation(s)
- Arun James Thirunavukarasu: Oxford University Clinical Academic Graduate School, University of Oxford, Oxford, United Kingdom; School of Clinical Medicine, University of Cambridge, Cambridge, United Kingdom
18
Sallam M, Barakat M, Sallam M. Pilot Testing of a Tool to Standardize the Assessment of the Quality of Health Information Generated by Artificial Intelligence-Based Models. Cureus 2023; 15:e49373. PMID: 38024074; PMCID: PMC10674084; DOI: 10.7759/cureus.49373.
Abstract
Background Artificial intelligence (AI)-based conversational models, such as Chat Generative Pre-trained Transformer (ChatGPT), Microsoft Bing, and Google Bard, have emerged as valuable sources of health information for lay individuals. However, the accuracy of the information provided by these AI models remains a significant concern. This pilot study aimed to test a new tool with key themes for inclusion as follows: Completeness of content, Lack of false information in the content, Evidence supporting the content, Appropriateness of the content, and Relevance, referred to as "CLEAR", designed to assess the quality of health information delivered by AI-based models. Methods Tool development involved a literature review on health information quality, followed by the initial establishment of the CLEAR tool, which comprised five items that aimed to assess the following: completeness, lack of false information, evidence support, appropriateness, and relevance. Each item was scored on a five-point Likert scale from excellent to poor. Content validity was checked by expert review. Pilot testing involved 32 healthcare professionals using the CLEAR tool to assess content on eight different health topics deliberately designed with varying qualities. The internal consistency was checked with Cronbach's alpha (α). Feedback from the pilot test resulted in language modifications to improve the clarity of the items. The final CLEAR tool was used to assess the quality of health information generated by four distinct AI models on five health topics. The AI models were ChatGPT 3.5, ChatGPT 4, Microsoft Bing, and Google Bard, and the content generated was scored by two independent raters, with Cohen's kappa (κ) used for inter-rater agreement. Results The final five CLEAR items were: (1) Is the content sufficient?; (2) Is the content accurate?; (3) Is the content evidence-based?; (4) Is the content clear, concise, and easy to understand?; and (5) Is the content free from irrelevant information? Pilot testing on the eight health topics revealed acceptable internal consistency with a Cronbach's α range of 0.669-0.981. The use of the final CLEAR tool yielded the following average scores: Microsoft Bing (mean=24.4±0.42), ChatGPT-4 (mean=23.6±0.96), Google Bard (mean=21.2±1.79), and ChatGPT-3.5 (mean=20.6±5.20). The inter-rater agreement revealed the following Cohen κ values: ChatGPT-3.5 (κ=0.875, P<.001), ChatGPT-4 (κ=0.780, P<.001), Microsoft Bing (κ=0.348, P=.037), and Google Bard (κ=0.749, P<.001). Conclusions The CLEAR tool is a brief yet helpful tool that can aid in standardizing testing of the quality of health information generated by AI-based models. Future studies are recommended to validate the utility of the CLEAR tool in the quality assessment of AI-generated health-related content using a larger sample across various complex health topics.
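Cronbach's alpha, used here for internal consistency, follows directly from item and total-score variances: α = k/(k-1) × (1 - Σσ²_item/σ²_total). A sketch with an invented raters-by-items matrix for the five CLEAR items:

```python
# Hedged sketch: Cronbach's alpha from its standard formula. The
# rating matrix (rows = raters, columns = the 5 CLEAR items) is invented.
import numpy as np

ratings = np.array([
    [5, 4, 4, 5, 4],
    [4, 4, 3, 4, 4],
    [5, 5, 4, 5, 5],
    [3, 3, 3, 4, 3],
    [4, 4, 4, 4, 5],
])

k = ratings.shape[1]
item_vars = ratings.var(axis=0, ddof=1)          # variance of each item
total_var = ratings.sum(axis=1).var(ddof=1)      # variance of total scores
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(f"Cronbach's alpha: {alpha:.3f}")
```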
Affiliation(s)
- Malik Sallam: Department of Pathology, Microbiology, and Forensic Medicine, School of Medicine, University of Jordan, Amman, Jordan; Department of Clinical Laboratories and Forensic Medicine, Jordan University Hospital, Amman, Jordan
- Muna Barakat: Department of Clinical Pharmacy and Therapeutics, School of Pharmacy, Applied Science Private University, Amman, Jordan; Department of Research, Middle East University, Amman, Jordan
- Mohammed Sallam: Department of Pharmacy, Mediclinic Parkview Hospital, Mediclinic Middle East, Dubai, United Arab Emirates