1. Tam TYC, Sivarajkumar S, Kapoor S, Stolyar AV, Polanska K, McCarthy KR, Osterhoudt H, Wu X, Visweswaran S, Fu S, Mathur P, Cacciamani GE, Sun C, Peng Y, Wang Y. A framework for human evaluation of large language models in healthcare derived from literature review. NPJ Digit Med 2024; 7:258. [PMID: 39333376; PMCID: PMC11437138; DOI: 10.1038/s41746-024-01258-7]
Abstract
With generative artificial intelligence (GenAI), particularly large language models (LLMs), continuing to make inroads in healthcare, assessing LLMs with human evaluations is essential to ensuring their safety and effectiveness. This study reviews the existing literature on human evaluation methodologies for LLMs in healthcare across various medical specialties, addressing factors such as evaluation dimensions, sample types and sizes, selection and recruitment of evaluators, frameworks and metrics, the evaluation process, and the type of statistical analysis. Our review of 142 studies shows gaps in the reliability, generalizability, and applicability of current human evaluation practices. To overcome these significant obstacles to LLM development and deployment in healthcare, we propose QUEST, a comprehensive and practical framework for human evaluation of LLMs covering three workflow phases: Planning; Implementation and Adjudication; and Scoring and Review. QUEST is designed around five evaluation principles: Quality of Information, Understanding and Reasoning, Expression Style and Persona, Safety and Harm, and Trust and Confidence.
Affiliation(s)
- Thomas Yu Chow Tam
- Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA, USA
- Sumit Kapoor
- Department of Critical Care Medicine, University of Pittsburgh Medical Center, Pittsburgh, PA, USA
- Alisa V Stolyar
- Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA, USA
- Katelyn Polanska
- Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA, USA
- Karleigh R McCarthy
- Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA, USA
- Hunter Osterhoudt
- Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA, USA
- Xizhi Wu
- Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA, USA
- Shyam Visweswaran
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
- Clinical and Translational Science Institute, University of Pittsburgh, Pittsburgh, PA, USA
- Sunyang Fu
- Department of Clinical and Health Informatics, Center for Translational AI Excellence and Applications in Medicine, University of Texas Health Science Center at Houston, Houston, TX, USA
- Piyush Mathur
- Department of Anesthesiology, Cleveland Clinic, Cleveland, OH, USA
- BrainX AI ReSearch, BrainX LLC, Cleveland, OH, USA
- Giovanni E Cacciamani
- Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Cong Sun
- Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
- Yifan Peng
- Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
- Yanshan Wang
- Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA, USA
- Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
- Clinical and Translational Science Institute, University of Pittsburgh, Pittsburgh, PA, USA
- Hillman Cancer Center, University of Pittsburgh Medical Center, Pittsburgh, PA, USA
2. Sallam M, Al-Salahat K, Eid H, Egger J, Puladi B. Human versus Artificial Intelligence: ChatGPT-4 Outperforming Bing, Bard, ChatGPT-3.5 and Humans in Clinical Chemistry Multiple-Choice Questions. Adv Med Educ Pract 2024; 15:857-871. [PMID: 39319062; PMCID: PMC11421444; DOI: 10.2147/amep.s479801]
Abstract
Introduction Artificial intelligence (AI) chatbots excel in language understanding and generation, and these models could transform healthcare education and practice. However, it is important to assess the performance of such AI models across various topics to highlight their strengths and possible limitations. This study aimed to evaluate the performance of ChatGPT (GPT-3.5 and GPT-4), Bing, and Bard compared with human students at a postgraduate master's level in Medical Laboratory Sciences. Methods The study design was based on the METRICS checklist for the design and reporting of AI-based studies in healthcare. The study utilized a dataset of 60 Clinical Chemistry multiple-choice questions (MCQs) initially conceived for assessing 20 MSc students. The revised Bloom's taxonomy was used as the framework for classifying the MCQs into four cognitive categories: Remember, Understand, Analyze, and Apply. A modified version of the CLEAR tool was used to assess the quality of AI-generated content, with Cohen's κ for inter-rater agreement. Results Compared with the students' mean score of 0.68±0.23, GPT-4 scored 0.90±0.30, followed by Bing (0.77±0.43), GPT-3.5 (0.73±0.45), and Bard (0.67±0.48). Significantly better performance was noted in the lower cognitive domains (Remember and Understand) for GPT-3.5 (P=0.041), GPT-4 (P=0.003), and Bard (P=0.017) compared with the higher cognitive domains (Apply and Analyze). The CLEAR scores indicated that ChatGPT-4's performance was "Excellent" compared with the "Above average" performance of ChatGPT-3.5, Bing, and Bard. Discussion The findings indicated that ChatGPT-4 excelled in the Clinical Chemistry exam, while ChatGPT-3.5, Bing, and Bard were above average. Given that the MCQs were directed at postgraduate students with a high degree of specialization, the performance of these AI chatbots was remarkable. Given the risk of academic dishonesty and possible dependence on these AI models, the appropriateness of MCQs as an assessment tool in higher education should be re-evaluated.
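The inter-rater agreement statistic used in this study, Cohen's κ, compares the observed agreement between two raters with the agreement expected by chance from each rater's label frequencies. A minimal pure-Python sketch (the ratings below are hypothetical illustrations, not data from the study):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items (nominal labels)."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # observed agreement: fraction of items both raters labeled identically
    observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # chance agreement: sum over classes of p_a(class) * p_b(class)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical 1-5 quality ratings from two raters on ten items
a = [5, 4, 4, 3, 5, 2, 4, 5, 3, 4]
b = [5, 4, 3, 3, 5, 2, 4, 4, 3, 4]
print(round(cohens_kappa(a, b), 3))  # → 0.718
```

κ of 1 means perfect agreement, 0 means chance-level agreement; values above roughly 0.6 are commonly read as substantial agreement.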
Affiliation(s)
- Malik Sallam
- Department of Pathology, Microbiology and Forensic Medicine, School of Medicine, The University of Jordan, Amman, Jordan
- Department of Clinical Laboratories and Forensic Medicine, Jordan University Hospital, Amman, Jordan
- Scientific Approaches to Fight Epidemics of Infectious Diseases (SAFE-ID) Research Group, The University of Jordan, Amman, Jordan
- Khaled Al-Salahat
- Department of Pathology, Microbiology and Forensic Medicine, School of Medicine, The University of Jordan, Amman, Jordan
- Scientific Approaches to Fight Epidemics of Infectious Diseases (SAFE-ID) Research Group, The University of Jordan, Amman, Jordan
- Huda Eid
- Scientific Approaches to Fight Epidemics of Infectious Diseases (SAFE-ID) Research Group, The University of Jordan, Amman, Jordan
- Jan Egger
- Institute for AI in Medicine (IKIM), University Medicine Essen (AöR), Essen, Germany
- Behrus Puladi
- Institute of Medical Informatics, University Hospital RWTH Aachen, Aachen, Germany
3. Sallam M, Al-Mahzoum K, Almutawaa RA, Alhashash JA, Dashti RA, AlSafy DR, Almutairi RA, Barakat M. The performance of OpenAI ChatGPT-4 and Google Gemini in virology multiple-choice questions: a comparative analysis of English and Arabic responses. BMC Res Notes 2024; 17:247. [PMID: 39228001; PMCID: PMC11373487; DOI: 10.1186/s13104-024-06920-7]
Abstract
OBJECTIVE The integration of artificial intelligence (AI) in healthcare education is inevitable, and understanding the proficiency of generative AI in different languages when answering complex questions is crucial for educational purposes. The study objective was to compare the performance of ChatGPT-4 and Gemini in answering Virology multiple-choice questions (MCQs) in English and Arabic, while assessing the quality of the generated content. Both AI models' responses to 40 Virology MCQs were assessed for correctness and quality based on the CLEAR tool designed for the evaluation of AI-generated content. The MCQs were classified into lower and higher cognitive categories based on the revised Bloom's taxonomy. The study design considered the METRICS checklist for the design and reporting of generative AI-based studies in healthcare. RESULTS ChatGPT-4 and Gemini performed better in English than in Arabic, with ChatGPT-4 consistently surpassing Gemini in correctness and CLEAR scores. ChatGPT-4 led Gemini with 80% vs. 62.5% correctness in English, compared with 65% vs. 55% in Arabic. For both AI models, superior performance in the lower cognitive domains was reported. Both ChatGPT-4 and Gemini exhibited potential in educational applications; nevertheless, their performance varied across languages, highlighting the importance of continued development to ensure effective AI integration in healthcare education globally.
Affiliation(s)
- Malik Sallam
- Department of Pathology, Microbiology and Forensic Medicine, School of Medicine, The University of Jordan, Amman, 11942, Jordan
- Department of Clinical Laboratories and Forensic Medicine, Jordan University Hospital, Queen Rania Al-Abdullah Street-Aljubeiha, P.O. Box: 13046, Amman, 11942, Jordan
- Muna Barakat
- Department of Clinical Pharmacy and Therapeutics, Faculty of Pharmacy, Applied Science Private University, Amman, 11931, Jordan
4. Lonsdale H, O'Reilly-Shah VN, Padiyath A, Simpao AF. Supercharge Your Academic Productivity with Generative Artificial Intelligence. J Med Syst 2024; 48:73. [PMID: 39115560; PMCID: PMC11457929; DOI: 10.1007/s10916-024-02093-9]
Affiliation(s)
- Hannah Lonsdale
- Department of Anesthesiology, Vanderbilt University School of Medicine, Monroe Carell Jr. Children's Hospital at Vanderbilt, Nashville, TN, 37232, USA
- Vikas N O'Reilly-Shah
- Department of Anesthesiology & Pain Medicine, University of Washington School of Medicine, Seattle, WA, USA
- Asif Padiyath
- Department of Anesthesiology and Critical Care, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Department of Anesthesiology and Critical Care Medicine, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Allan F Simpao
- Department of Anesthesiology and Critical Care, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Department of Anesthesiology and Critical Care Medicine, Children's Hospital of Philadelphia, Philadelphia, PA, USA
5. Sallam M, Al-Mahzoum K, Alshuaib O, Alhajri H, Alotaibi F, Alkhurainej D, Al-Balwah MY, Barakat M, Egger J. Language discrepancies in the performance of generative artificial intelligence models: an examination of infectious disease queries in English and Arabic. BMC Infect Dis 2024; 24:799. [PMID: 39118057; PMCID: PMC11308449; DOI: 10.1186/s12879-024-09725-y]
Abstract
BACKGROUND Assessment of artificial intelligence (AI)-based models across languages is crucial to ensure equitable access to, and accuracy of, information in multilingual contexts. This study aimed to compare AI model performance in English and Arabic for infectious disease queries. METHODS The study employed the METRICS checklist for the design and reporting of AI-based studies in healthcare. The AI models tested included ChatGPT-3.5, ChatGPT-4, Bing, and Bard. The queries comprised 15 questions on HIV/AIDS, tuberculosis, malaria, COVID-19, and influenza. The AI-generated content was assessed by two bilingual experts using the validated CLEAR tool. RESULTS Performance varied across the two languages. English queries showed consistently superior performance, with Bard leading, followed by Bing, ChatGPT-4, and ChatGPT-3.5 (P = .012). The same trend was observed in Arabic, albeit without statistical significance (P = .082). Stratified analysis revealed higher scores for English in most CLEAR components, notably completeness, accuracy, appropriateness, and relevance, especially with ChatGPT-3.5 and Bard. Across the five infectious disease topics, English outperformed Arabic, except for influenza queries in Bing and Bard. The four AI models' performance in English was rated as "excellent", significantly outperforming their "above-average" Arabic counterparts (P = .002). CONCLUSIONS A disparity in AI model performance between English and Arabic was noted in response to infectious disease queries. This language variation can negatively affect the quality of health content delivered by AI models to native speakers of Arabic, and AI developers should address it, with the ultimate goal of enhancing health outcomes.
Affiliation(s)
- Malik Sallam
- Department of Pathology, Microbiology and Forensic Medicine, School of Medicine, The University of Jordan, Amman, 11942, Jordan
- Department of Translational Medicine, Faculty of Medicine, Lund University, Malmö, 22184, Sweden
- Department of Clinical Laboratories and Forensic Medicine, Jordan University Hospital, Queen Rania Al-Abdullah Street-Aljubeiha, P.O. Box: 13046, Amman, Jordan
- Omaima Alshuaib
- School of Medicine, The University of Jordan, Amman, 11942, Jordan
- Hawajer Alhajri
- School of Medicine, The University of Jordan, Amman, 11942, Jordan
- Fatmah Alotaibi
- School of Medicine, The University of Jordan, Amman, 11942, Jordan
- Muna Barakat
- Department of Clinical Pharmacy and Therapeutics, Faculty of Pharmacy, Applied Science Private University, Amman, 11931, Jordan
- MEU Research Unit, Middle East University, Amman, 11831, Jordan
- Jan Egger
- Institute for AI in Medicine (IKIM), University Medicine Essen (AöR), Essen, Germany
6. Sallam M. Bibliometric top ten healthcare-related ChatGPT publications in the first ChatGPT anniversary. Narra J 2024; 4:e917. [PMID: 39280327; PMCID: PMC11391998; DOI: 10.52225/narra.v4i2.917]
Abstract
Since its public release on November 30, 2022, ChatGPT has shown promising potential in diverse healthcare applications despite ethical challenges, privacy issues, and possible biases. The aim of this study was to identify and assess the most influential publications on ChatGPT utility in healthcare using bibliometric analysis. The study employed an advanced search of three databases, Scopus, Web of Science, and Google Scholar, to identify ChatGPT-related records in healthcare education, research, and practice between November 27 and 30, 2023. The ranking was based on the retrieved citation count in each database. Additional alternative metrics evaluated included (1) Semantic Scholar highly influential citations, (2) PlumX captures, (3) PlumX mentions, (4) PlumX social media, and (5) Altmetric Attention Scores (AASs). A total of 22 unique records, published in 17 different scientific journals from 14 different publishers, were identified across the three databases. Only two publications appeared in the top-ten list of all three databases. Variable publication types were identified, the most common being editorial/commentary publications (n=8/22, 36.4%). Nine of the 22 records had corresponding authors affiliated with institutions in the United States (40.9%). Citation counts varied per database, with the widest range in Google Scholar (121-1019), followed by Scopus (88-242) and Web of Science (23-171). Google Scholar citations correlated significantly with Semantic Scholar highly influential citations (Spearman's correlation coefficient ρ=0.840, p<0.001), PlumX captures (ρ=0.831, p<0.001), PlumX mentions (ρ=0.609, p=0.004), and AASs (ρ=0.542, p=0.009). In conclusion, despite several acknowledged limitations, this study illustrates the evolving landscape of ChatGPT utility in healthcare. There is an urgent need for collaborative initiatives by all stakeholders to establish guidelines for the ethical, transparent, and responsible use of ChatGPT in healthcare. The study also revealed the correlation between citations and alternative metrics, highlighting their usefulness as a supplement for gauging publication impact, even in a rapidly growing research field.
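The correlation analysis reported here uses Spearman's rank correlation, which is simply the Pearson correlation computed on the ranks of the two variables. A minimal pure-Python sketch with tie-aware ranking (the citation and capture counts below are hypothetical, for illustration only):

```python
def average_ranks(values):
    """1-based ranks, averaging positions for tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho = Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

# Hypothetical example: citation counts vs. PlumX captures for six papers
citations = [1019, 540, 310, 242, 150, 121]
captures = [800, 300, 350, 200, 90, 100]
print(round(spearman_rho(citations, captures), 3))  # → 0.886
```

Because only ranks matter, the statistic is robust to the heavily skewed count distributions typical of citation data.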
Affiliation(s)
- Malik Sallam
- Department of Pathology, Microbiology and Forensic Medicine, School of Medicine, The University of Jordan, Amman, Jordan
- Department of Clinical Laboratories and Forensic Medicine, Jordan University Hospital, Amman, Jordan
- Department of Translational Medicine, Faculty of Medicine, Lund University, Malmö, Sweden
7. Yilmaz Muluk S, Olcucu N. The Role of Artificial Intelligence in the Primary Prevention of Common Musculoskeletal Diseases. Cureus 2024; 16:e65372. [PMID: 39184635; PMCID: PMC11344583; DOI: 10.7759/cureus.65372]
Abstract
BACKGROUND Musculoskeletal disorders (MSDs) are a leading cause of disability worldwide, with a growing burden across all demographics. With advancements in technology, conversational artificial intelligence (AI) platforms such as ChatGPT (OpenAI, San Francisco, CA) have become instrumental in disseminating health information. This study evaluated the effectiveness of ChatGPT versions 3.5 and 4 in delivering primary prevention information for common MSDs; the focus was on prevention, not diagnosis. METHODS This mixed-methods study employed the CLEAR tool to assess the quality of responses from both ChatGPT versions in terms of completeness, lack of false information, evidence support, appropriateness, and relevance. Responses were evaluated independently by two expert raters in a blinded manner. Statistical analyses included Wilcoxon signed-rank tests and paired-samples t-tests to compare performance across versions. RESULTS ChatGPT-3.5 and ChatGPT-4 effectively provided primary prevention information, with overall performance ranging from satisfactory to excellent. Responses for low back pain, fractures, knee osteoarthritis, neck pain, and gout received excellent scores from both versions. ChatGPT-4 outperformed ChatGPT-3.5 in completeness (p = 0.015), appropriateness (p = 0.007), and relevance (p = 0.036), and performed better across most medical conditions (p = 0.010). CONCLUSIONS ChatGPT versions 3.5 and 4 are effective tools for disseminating primary prevention information for common MSDs, with ChatGPT-4 showing superior performance. This study underscores the potential of AI in enhancing public health strategies through reliable and accessible health communication. Advanced models such as ChatGPT-4 can effectively contribute to the primary prevention of MSDs by delivering high-quality health information, highlighting the role of AI in addressing the global burden of chronic diseases. It is important to note that these AI tools are intended for preventive education only, not diagnostic use. Continuous improvement is necessary to fully harness the potential of AI in preventive medicine. Future studies should explore other AI platforms, other languages, and secondary and tertiary prevention measures to maximize the utility of AI in global health contexts.
Affiliation(s)
- Nazli Olcucu
- Physical Medicine and Rehabilitation, Antalya Atatürk State Hospital, Antalya, TUR
8. Almutairi R, Alsarraf A, Alkandari D, Ashkanani H, Albazali A. Dissecting Through the Literature: A Review of the Critical Appraisal Process. Cureus 2024; 16:e59658. [PMID: 38836144; PMCID: PMC11148477; DOI: 10.7759/cureus.59658]
Abstract
Critical appraisal is a crucial step in evidence-based practice, enabling researchers to evaluate the credibility and applicability of research findings. Healthcare professionals are encouraged to cultivate critical appraisal skills to assess the trustworthiness and value of available evidence. This process involves scrutinizing key components of a research publication, understanding the strengths and weaknesses of the study, and assessing its relevance to a specific context. It is essential for researchers to become familiar with the core elements of a research article and utilize key questions and guidelines to rigorously assess a study. This paper aims to provide an overview of the critical appraisal process. By understanding the main points of critical appraisal, researchers can assess the quality, relevance, and reliability of articles, thereby enhancing the validity of their findings and decision-making processes.
9. Rentiya ZS, Mandal S, Inban P, Vempalli H, Dabbara R, Ali S, Kaur K, Adegbite A, Intsiful TA, Jayan M, Odoma VA, Khan A. Revolutionizing Breast Cancer Detection With Artificial Intelligence (AI) in Radiology and Radiation Oncology: A Systematic Review. Cureus 2024; 16:e57619. [PMID: 38711711; PMCID: PMC11073588; DOI: 10.7759/cureus.57619]
Abstract
Breast cancer is the most common cancer in women worldwide. Over the last three decades, the use of traditional screen-film mammography has increased, but in recent years, digital mammography and 3D tomosynthesis have become standard procedures for breast cancer screening. With the advancement of technology, the interpretation of images using automated algorithms has become a subject of interest. Initially, computer-aided detection (CAD) was introduced; however, it did not show any long-term benefit in clinical practice. With recent advances in artificial intelligence (AI) methods, these technologies are showing promising potential for more accurate and efficient automated breast cancer detection and treatment. While AI promises widespread integration in breast cancer detection and treatment, challenges such as data quality, regulatory and ethical implications, and algorithm validation are crucial. Addressing these is essential for fully realizing AI's potential in enhancing early diagnosis and improving patient outcomes in breast cancer management. In this review article, we aim to provide an overview of the latest developments and applications of AI in breast cancer screening and treatment. While the existing literature primarily consists of retrospective studies, ongoing and future prospective research is poised to offer deeper insights.
Affiliation(s)
- Zubir S Rentiya
- Radiation Oncology & Radiology, University of Virginia School of Medicine, Charlottesville, USA
- Shobha Mandal
- Neurology, Regional Neurological Associates, New York, USA
- Internal Medicine, Salem Internal Medicine, Primary Care (PC), Pennsville, USA
- Rishika Dabbara
- Internal Medicine, Kamineni Institute of Medical Sciences, Hyderabad, IND
- Sofia Ali
- Medicine, Peninsula Medical School, Plymouth, GBR
- Kirpa Kaur
- Medicine, Howard Community College, Ellicott City, USA
- Tarsha A Intsiful
- Radiology, College of Medicine, University of Ghana Medical Center, Accra, GHA
- Malavika Jayan
- Internal Medicine, Bangalore Medical College and Research Institute, Bangalore, IND
- Victor A Odoma
- Research, California Institute of Behavioral Neurosciences & Psychology, Fairfield, USA
- Cardiovascular Medicine/Oncology (Acuity Adaptable Unit), Indiana University Health, Bloomington, USA
- Aadil Khan
- Trauma Surgery, Order of St. Francis (OSF) St Francis Medical Centre, University of Illinois Chicago, Peoria, USA
- Cardiology, University of Illinois at Chicago, Chicago, USA
- Internal Medicine, Lala Lajpat Rai (LLR) Hospital, Kanpur, IND
10. Joseph J, Jose AS, Ettaniyil GG, S J, Jose J. Mapping the Landscape of Electronic Health Records and Health Information Exchange Through Bibliometric Analysis and Visualization. Cureus 2024; 16:e59128. [PMID: 38803769; PMCID: PMC11129286; DOI: 10.7759/cureus.59128]
Abstract
The adoption of Electronic Health Records (EHRs) and the establishment of Health Information Exchange (HIE) systems have significantly transformed healthcare delivery and management. This study presents a comprehensive bibliometric analysis and visualization of the landscape surrounding EHRs and HIE to provide insights into the current state and emerging trends in this field. Leveraging advanced bibliometric methodologies, including co-citation analysis, keyword co-occurrence analysis, and network visualization techniques, we systematically map the scholarly literature spanning several decades. Our analysis reveals key thematic clusters, influential publications, prolific authors, and collaborative networks within the domain of EHRs and HIE. Furthermore, we identify significant research gaps and areas for future exploration, including interoperability challenges, privacy concerns, and the integration of emerging technologies such as artificial intelligence and blockchain. The findings of this study offer valuable insights for researchers, policymakers, and healthcare practitioners seeking to navigate and contribute to the ongoing evolution of EHRs and HIE systems, ultimately enhancing the quality, efficiency, and accessibility of healthcare services.
Collapse
Affiliation(s)
- Jeena Joseph
- Department of Computer Applications, Marian College Kuttikkanam (Autonomous), Kuttikkanam, IND
- Anat Suman Jose
- Department of Library, St. Peter's College Kolenchery, Kolenchery, IND
- Gilu G Ettaniyil
- Department of Library, St. Thomas College of Teacher Education, Pala, IND
- Jasimudeen S
- Department of Library, St. Stephen's College Uzhavoor, Uzhavoor, IND
- Jobin Jose
- Department of Library, Marian College Kuttikkanam (Autonomous), Kuttikkanam, IND
11. Elhaddad M, Hamam S. AI-Driven Clinical Decision Support Systems: An Ongoing Pursuit of Potential. Cureus 2024; 16:e57728. [PMID: 38711724; PMCID: PMC11073764; DOI: 10.7759/cureus.57728]
Abstract
Clinical Decision Support Systems (CDSS) are essential tools in contemporary healthcare, enhancing clinicians' decisions and patient outcomes. The integration of artificial intelligence (AI) is now revolutionizing CDSS even further. This review delves into AI technologies transforming CDSS, their applications in healthcare decision-making, associated challenges, and the potential trajectory toward fully realizing AI-CDSS's potential. The review begins by laying the groundwork with a definition of CDSS and its function within the healthcare field. It then highlights the increasingly significant role that AI is playing in enhancing CDSS effectiveness and efficiency, underlining its evolving prominence in shaping healthcare practices. It examines the integration of AI technologies into CDSS, including machine learning algorithms like neural networks and decision trees, natural language processing, and deep learning. It also addresses the challenges associated with AI integration, such as interpretability and bias. We then shift to AI applications within CDSS, with real-life examples of AI-driven diagnostics, personalized treatment recommendations, risk prediction, early intervention, and AI-assisted clinical documentation. The review emphasizes user-centered design in AI-CDSS integration, addressing usability, trust, workflow, and ethical and legal considerations. It acknowledges prevailing obstacles and suggests strategies for successful AI-CDSS adoption, highlighting the need for workflow alignment and interdisciplinary collaboration. The review concludes by summarizing key findings, underscoring AI's transformative potential in CDSS, and advocating for continued research and innovation. It emphasizes the need for collaborative efforts to realize a future where AI-powered CDSS optimizes healthcare delivery and improves patient outcomes.
Affiliation(s)
- Malek Elhaddad
- Medicine, The Hospital for Sick Children, Toronto, CAN
- Medicine, Upper Canada College, Toronto, CAN
- Sara Hamam
- Ophthalmology, Queen Elizabeth University Hospital, Glasgow, GBR
12. Skrzypczak T, Skrzypczak A, Szepietowski JC. Readability of Patient Electronic Materials for Atopic Dermatitis in 23 Languages: Analysis and Implications for Dermatologists. Dermatol Ther (Heidelb) 2024; 14:671-684. [PMID: 38402338; PMCID: PMC10965833; DOI: 10.1007/s13555-024-01115-1]
Abstract
INTRODUCTION Patients search the Internet for information about various medical procedures and conditions. The main aim of this study was to evaluate the readability of online health information related to atopic dermatitis (AD). Online resources are becoming a standard aid in shared decision-making, and with a pipeline of new therapeutic options such as immunomodulators, patients' understanding of the complexity of AD is crucial. METHODS The term "atopic dermatitis", translated into the 23 official European Union languages, was searched using the Google search engine. The first 50 records in each language were evaluated for suitability. Included materials were barrier-free (freely accessible), focused on patient education, and not categorized as advertisements. Article sources were classified into four categories: non-profit, online shop, pharmaceutical company, and dermatology clinic. Readability was assessed with the Lix score. RESULTS A total of 615 articles in Swedish, Spanish, Slovenian, Slovak, Romanian, Portuguese, Polish, Lithuanian, Latvian, Irish, Italian, Hungarian, Greek, German, French, Finnish, Estonian, English, Dutch, Danish, Czech, Croatian, and Bulgarian were evaluated. The overall mean Lix score was 56 ± 8, which classified the articles as very hard to comprehend. Significant differences in mean Lix scores were observed across all included languages (all P < 0.001). Articles released by non-profit organizations and pharmaceutical companies had the highest readability (P < 0.001). A low readability level was correlated with high article prevalence (R2 = 0.189, P = 0.031). CONCLUSIONS Although online articles related to AD were abundant, the readability of the available information was low. As online health information has become essential in shared decision-making between patients and physicians, an improvement in AD-related materials is needed.
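The Lix readability index used here is the average sentence length plus the percentage of long words (more than six letters), which makes it applicable across languages. A minimal sketch under that standard definition (the tokenization rules below are simplified assumptions, not the study's exact procedure):

```python
import re

def lix(text):
    """Lix = words per sentence + percentage of words longer than 6 letters."""
    words = re.findall(r"[^\W\d_]+", text)  # runs of letters, Unicode-aware
    sentences = [s for s in re.split(r"[.!?:]+", text) if s.strip()]
    long_words = [w for w in words if len(w) > 6]
    return len(words) / len(sentences) + 100 * len(long_words) / len(words)

print(round(lix("The cat sat on the mat. The dog ran away."), 1))                       # → 5.0
print(round(lix("Dermatological comprehension requires specialized terminology."), 1))  # → 105.0
```

Scores above roughly 55 are conventionally read as very difficult text, which matches the mean of 56 ± 8 reported above.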
Affiliation(s)
- Tomasz Skrzypczak: University Hospital in Wroclaw, Borowska 213, 50-556, Wroclaw, Poland
- Anna Skrzypczak: Faculty of Dentistry, Wroclaw Medical University, Krakowska 26, 50-425, Wroclaw, Poland
- Jacek C Szepietowski: Chair of the Department of Dermatology, Venerology and Allergology, Wroclaw Medical University, Chalubinskiego 1, 50-368, Wroclaw, Poland
13
Reddy S, Shaheed A, Seo Y, Patel R. Development of an Artificial Intelligence Model for the Classification of Gastric Carcinoma Stages Using Pathology Slides. Cureus 2024; 16:e56740. [PMID: 38650818] [PMCID: PMC11033212] [DOI: 10.7759/cureus.56740] [Accepted: 03/22/2024] [Indexed: 04/25/2024] Open Access
Abstract
This study showcases a novel AI-driven approach to differentiating between stage one and stage two gastric carcinoma based on pathology slide analysis. Gastric carcinoma, a significant contributor to cancer-related mortality globally, necessitates precise staging for optimal treatment planning and patient management. Leveraging a dataset of 3540 high-resolution pathology images sourced from Kaggle.com, with an equal distribution of stage one and stage two tumors, the developed AI model demonstrates strong performance in tumor staging. Applying deep learning techniques on the Google Colab platform, the model achieves reported accuracy and precision of 100%, sensitivity of 97.09%, specificity of 100%, and an F1-score of 98.31%. The model also exhibits an area under the receiver operating characteristic curve (AUC) of 0.999, indicating strong discriminatory power. By providing clinicians with an efficient and reliable tool for gastric carcinoma staging, this AI-driven approach has the potential to enhance diagnostic accuracy, inform treatment decisions, and ultimately improve patient outcomes in the management of gastric carcinoma. This research contributes to the ongoing advancement of cancer diagnosis and underscores the transformative potential of artificial intelligence in clinical practice.
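The reported sensitivity, specificity, precision, and F1-score follow the standard confusion-matrix definitions; a minimal sketch, with illustrative counts rather than the paper's data:

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)    # recall / true positive rate
    specificity = tn / (tn + fp)    # true negative rate
    precision = tp / (tp + fp)      # positive predictive value
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "f1": f1, "accuracy": accuracy}
```

For example, illustrative counts of tp=97, fn=3, fp=0, tn=100 give sensitivity 0.97, specificity and precision 1.0, and F1 ≈ 0.985.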
Affiliation(s)
- Shreya Reddy: Biomedical Sciences, Creighton University, Omaha, USA
- Avneet Shaheed: Pathology, University of Illinois at Chicago, Chicago, USA
- Yui Seo: Medicine, California Northstate University College of Medicine, Elk Grove, USA
- Rakesh Patel: Internal Medicine, East Tennessee State University Quillen College of Medicine, Johnson City, USA
14
Pradhan M, Waghmare KT, Alghabshi R, Almahdouri F, Al Sawafi KM, M I, Alhadhramy AM, AlYaqoubi ER. Exploring the Economic Aspects of Hospitals: A Comprehensive Examination of Relevant Factors. Cureus 2024; 16:e54867. [PMID: 38533171] [PMCID: PMC10964728] [DOI: 10.7759/cureus.54867] [Accepted: 02/24/2024] [Indexed: 03/28/2024] Open Access
Abstract
Financial limitations in the hospital industry can exacerbate healthcare disparities, impede investment in cutting-edge medical treatments, and worsen patient outcomes. The interdependent connection between a hospital's economy and the general well-being of the community it serves highlights the need for careful financial oversight and innovative healthcare policies. Effective collaboration among policymakers, healthcare administrators, and stakeholders is imperative for developing sustainable economic models that give equal weight to fiscal prudence and optimal patient outcomes. This article underscores the pivotal importance of strategic fund allocation guided by hospital administrators, highlighting several key initiatives capable of transforming healthcare delivery and elevating an institution's stature within the medical community. Other important aspects discussed include fund allocation in hospitals, the growth of online consultations, and the use of sustainable, cost-effective energy sources.
Affiliation(s)
- Madhur Pradhan: Obstetrics and Gynaecology, Khoula Hospital, Muscat, OMN
- Iman M: Obstetrics and Gynaecology, Khoula Hospital, Muscat, OMN
15
Sallam M, Al-Salahat K, Al-Ajlouni E. ChatGPT Performance in Diagnostic Clinical Microbiology Laboratory-Oriented Case Scenarios. Cureus 2023; 15:e50629. [PMID: 38107211] [PMCID: PMC10725273] [DOI: 10.7759/cureus.50629] [Accepted: 12/16/2023] [Indexed: 12/19/2023] Open Access
Abstract
BACKGROUND Artificial intelligence (AI)-based tools can reshape healthcare practice. This includes ChatGPT, which is among the most popular AI-based conversational models. Nevertheless, the performance of different versions of ChatGPT needs further evaluation in different settings to assess its reliability and credibility in various healthcare-related tasks. Therefore, the current study aimed to assess the performance of the freely available ChatGPT-3.5 and the paid version ChatGPT-4 in 10 diagnostic clinical microbiology case scenarios. METHODS The study followed the METRICS (Model, Evaluation, Timing/Transparency, Range/Randomization, Individual factors, Count, Specificity of the prompts/language) checklist for standardizing the design and reporting of AI-based studies in healthcare. The models, tested on December 3, 2023, were ChatGPT-3.5 and ChatGPT-4, and the ChatGPT-generated content was evaluated with the CLEAR tool (Completeness, Lack of false information, Evidence support, Appropriateness, and Relevance), with each item assessed on a 5-point Likert scale (CLEAR scores of 1-5). ChatGPT output was evaluated by two raters independently, and inter-rater agreement was assessed with Cohen's κ statistic. Ten diagnostic clinical microbiology laboratory case scenarios were created in English by three microbiologists at diverse levels of expertise, following an internal discussion of common cases observed in Jordan. Topics included bacteriology, mycology, parasitology, and virology cases. Specific prompts were tailored based on the CLEAR tool, and a new session was started for each case scenario. RESULTS Cohen's κ values for the five CLEAR items were 0.351-0.737 for ChatGPT-3.5 and 0.294-0.701 for ChatGPT-4, indicating fair to good agreement and suitability for analysis. Based on the average CLEAR scores, ChatGPT-4 outperformed ChatGPT-3.5 (mean: 3.21±1.05 vs. 2.64±1.06, P=0.012, t-test). The performance of each model varied across the CLEAR items, with the lowest performance on the "Relevance" item (2.15±0.71 for ChatGPT-3.5 and 2.65±1.16 for ChatGPT-4). A statistically significant difference in performance across CLEAR items was seen only for ChatGPT-4, with the best performance in "Completeness", "Lack of false information", and "Evidence support" (P=0.043). Both models performed worst on antimicrobial susceptibility testing (AST) queries and best on bacterial and mycologic identification. CONCLUSIONS Assessment of ChatGPT performance across diagnostic clinical microbiology case scenarios showed that ChatGPT-4 outperformed ChatGPT-3.5. Performance varied noticeably depending on the topic evaluated. A primary shortcoming of both models was a tendency to generate irrelevant content lacking the needed focus. Although overall ChatGPT performance in these scenarios might be described as "above average" at best, there remains significant room for improvement given the identified limitations and the unsatisfactory results in a few cases.
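The Cohen's κ used above for inter-rater agreement is the chance-corrected proportion of agreement, (p_o − p_e)/(1 − p_e); a minimal sketch, with illustrative labels rather than the study's actual ratings:

```python
from collections import Counter

def cohens_kappa(rater_a: list, rater_b: list) -> float:
    """Cohen's kappa for two raters: (p_o - p_e) / (1 - p_e),
    where p_o is the observed agreement and p_e is the agreement
    expected by chance from each rater's marginal label frequencies."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)
```

On the conventional interpretation scale, the reported κ range of roughly 0.3-0.7 spans fair to good agreement.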
Affiliation(s)
- Malik Sallam: Department of Pathology, Microbiology and Forensic Medicine, The University of Jordan, School of Medicine, Amman, JOR; Department of Clinical Laboratories and Forensic Medicine, Jordan University Hospital, Amman, JOR
- Khaled Al-Salahat: Department of Pathology, Microbiology and Forensic Medicine, The University of Jordan, School of Medicine, Amman, JOR
- Eyad Al-Ajlouni: Department of Pathology, Microbiology and Forensic Medicine, The University of Jordan, School of Medicine, Amman, JOR