1
Yang HS, Li J, Yi X, Wang F. Performance evaluation of large language models with chain-of-thought reasoning ability in clinical laboratory case interpretation. Clin Chem Lab Med 2025:cclm-2025-0055. PMID: 40023838. DOI: 10.1515/cclm-2025-0055.
Affiliation(s)
- He S Yang
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
- Jieli Li
- Department of Pathology, Wexner Medical Center, The Ohio State University, Columbus, OH, USA
- Xin Yi
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
- Department of Pathology and Genomic Medicine, Houston Methodist Hospital, Houston, TX, USA
- Fei Wang
- Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
2
Choo S, Yoo S, Endo K, Truong B, Son MH. Advancing Clinical Chatbot Validation Using AI-Powered Evaluation With a New 3-Bot Evaluation System: Instrument Validation Study. JMIR Nurs 2025; 8:e63058. PMID: 40014000. PMCID: PMC11884306. DOI: 10.2196/63058.
Abstract
Background: The health care sector faces a projected shortfall of 10 million workers by 2030. Artificial intelligence (AI) automation in areas such as patient education and initial therapy screening presents a strategic response to mitigate this shortage and reallocate medical staff to higher-priority tasks. However, current methods of evaluating early-stage health care AI chatbots are highly limited by safety concerns and by the time and effort evaluation requires. Objective: This study introduces a novel 3-bot method for efficiently testing and validating early-stage AI health care provider chatbots. To test AI provider chatbots extensively without involving real patients or researchers, various AI patient bots and an evaluator bot were developed. Methods: Provider bots interacted with AI patient bots embodying frustrated, anxious, or depressed personas. An evaluator bot reviewed the interaction transcripts against specific criteria. Human experts then reviewed each transcript, and the evaluator bot's results were compared with the human evaluations to verify accuracy. Results: The patient-education bot's evaluations by the AI evaluator and the human evaluator were nearly identical, with minimal variance, limiting the opportunity for further analysis. The screening bot's evaluations likewise yielded similar results from the AI and human evaluators. Statistical analysis confirmed the reliability and accuracy of the AI evaluations. Conclusions: This evaluation method offers a safe, adaptable, and effective means of testing and refining early versions of health care provider chatbots without risking patient safety or investing excessive researcher time and effort. The patient-education evaluator bot might have benefited from a larger set of evaluation criteria; the near-identical AI and human results may reflect the small number of criteria used. The amount of prompting that could be given to each bot was limited by the practical consideration that response time increases with prompt length. In the future, techniques such as retrieval-augmented generation will allow the system to receive more information and evaluate chatbots more specifically and accurately. This evaluation method will enable rapid testing and validation of health care chatbots that automate basic medical tasks, freeing providers to address more complex ones.
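The 3-bot workflow summarized in this abstract can be illustrated with a brief sketch. This is not the authors' implementation: the persona prompts, the scoring criteria, and the ask_llm/provider/patient callables below are hypothetical placeholders for whichever chat back end is under test.

```python
# Minimal sketch of a 3-bot evaluation loop: a provider chatbot talks to
# simulated patient personas, and an evaluator bot scores each transcript.
# ask_llm(), provider(), and patient() are hypothetical stand-ins for any
# chat-completion call; they are not part of the published system.

from typing import Callable, Dict, List

PERSONAS = {
    "frustrated": "You are a frustrated patient who doubts the advice given.",
    "anxious": "You are an anxious patient who asks many follow-up questions.",
    "depressed": "You are a withdrawn patient who gives short, flat answers.",
}

CRITERIA = ["empathy", "clinical accuracy", "clarity", "appropriate escalation"]


def simulate_dialogue(provider: Callable[[str], str],
                      patient: Callable[[str], str],
                      turns: int = 5) -> List[Dict[str, str]]:
    """Alternate provider and simulated-patient turns and keep a transcript."""
    transcript, patient_msg = [], "Hello, I have a question about my health."
    for _ in range(turns):
        provider_msg = provider(patient_msg)
        transcript.append({"patient": patient_msg, "provider": provider_msg})
        patient_msg = patient(provider_msg)
    return transcript


def evaluate_transcript(ask_llm: Callable[[str], str],
                        transcript: List[Dict[str, str]]) -> Dict[str, int]:
    """Ask an evaluator bot to score the transcript (1-5) on each criterion."""
    text = "\n".join(f"Patient: {t['patient']}\nProvider: {t['provider']}"
                     for t in transcript)
    scores = {}
    for criterion in CRITERIA:
        reply = ask_llm(f"Rate the provider's {criterion} in this transcript "
                        f"from 1 (poor) to 5 (excellent). Reply with a number.\n{text}")
        scores[criterion] = int(reply.strip()[0])  # naive parse, sufficient for a sketch
    return scores
```

In practice the AI evaluator's scores would then be compared with human ratings of the same transcripts, for example with an agreement statistic, before the automated evaluation is trusted on its own.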
Affiliation(s)
- Seungheon Choo
- Research Institute for Future Medicine, Samsung Medical Center, Seoul, Republic of Korea
- Suyoung Yoo
- Department of Digital Health, Samsung Advanced Institute for Health Sciences and Technology (SAIHST), Sungkyunkwan University, Seoul, Republic of Korea
- Kumiko Endo
- Med2Lab Inc, San Francisco, CA, United States
- Bao Truong
- Med2Lab Inc, San Francisco, CA, United States
- Meong Hi Son
- Department of Digital Health, Samsung Advanced Institute for Health Sciences and Technology (SAIHST), Sungkyunkwan University, Seoul, Republic of Korea
- Department of Emergency Medicine, Samsung Medical Center, Seoul, Republic of Korea
3
You J, Seok HS, Kim S, Shin H. Advancing Laboratory Medicine Practice With Machine Learning: Swift yet Exact. Ann Lab Med 2025; 45:22-35. PMID: 39587856. PMCID: PMC11609717. DOI: 10.3343/alm.2024.0354.
Abstract
Machine learning (ML) is being widely studied and applied for data analysis and prediction in many fields, including laboratory medicine. To comprehensively evaluate the application of ML in laboratory medicine, we reviewed the literature on ML applications in laboratory medicine published between February 2014 and March 2024. A PubMed search using a defined search string yielded 779 articles, of which 144 were selected for this review. These articles were analyzed to extract and categorize the related fields within laboratory medicine, research objectives, specimen types, data types, ML models, evaluation metrics, and sample sizes. Sankey diagrams and pie charts were used to illustrate the relationships between categories and the proportions within each category. We found that most studies applying ML in laboratory medicine were designed to improve efficiency through automation or to expand the roles of clinical laboratories. The most commonly used ML models were convolutional neural networks, multilayer perceptrons, and tree-based models, which were primarily selected based on the type of input data. Our findings suggest that, as the technology evolves, ML will become more prominent in laboratory medicine as a tool for expanding research activities. Nonetheless, expertise in ML applications must be strengthened to use this technology effectively.
Affiliation(s)
- Jiwon You
- Department of Digital Medicine, Brain Korea 21 Project, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Korea
- Hyeon Seok Seok
- Department of Biomedical Engineering, Graduate School, Chonnam National University, Yeosu, Korea
- Sollip Kim
- Department of Laboratory Medicine, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Korea
- Hangsik Shin
- Department of Digital Medicine, Brain Korea 21 Project, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Korea
4
Zhu Q, Cheong-Iao Pang P, Chen C, Zheng Q, Zhang C, Li J, Guo J, Mao C, He Y. Automatic kidney stone identification: an adaptive feature-weighted LSTM model based on urine and blood routine analysis. Urolithiasis 2024; 52:145. PMID: 39402276. DOI: 10.1007/s00240-024-01644-6.
Abstract
Kidney stones are among the most common urinary system diseases, and early identification is of great significance. The purpose of this study was to use routine urine and blood test indices to build a deep learning (DL) model that identifies the presence of kidney stones at an early stage. A retrospective analysis was conducted on patients with kidney stones treated at West China Hospital of Sichuan University from January 2020 to June 2023. A total of 1130 individuals with kidney stones and 1230 healthy subjects were enrolled. Each participant's first blood and urine laboratory results at our hospital were collected, and the data were divided into a training dataset (80%) and a validation dataset (20%). A long short-term memory (LSTM)-based adaptive feature-weighting model was trained for early identification of kidney stones, and the results were compared with those of other models. Model performance was evaluated by the area under the receiver operating characteristic curve (AUC), and important predictors were identified by ranking feature importance. A total of 17 variables were screened; the top 4 features by weight coefficient (urine WBC, urine occult blood, qualitative urinary protein, and microcyte percentage) had high predictive value for kidney stones. The kidney stone LSTM (KS-LSTM) model achieved an accuracy of 89.5% and an AUC of 0.95, outperforming the other models. The results show that the KS-LSTM model based on routine urine and blood tests can accurately identify the presence of kidney stones and can provide valuable assistance to clinicians for early identification.
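To make the modeling workflow concrete, the sketch below shows one way an adaptive feature-weighting layer can sit in front of an LSTM classifier for tabular laboratory indices, with an 80/20 split and ROC AUC evaluation. It is a minimal illustration on synthetic data, assuming PyTorch and scikit-learn, and is not the published KS-LSTM architecture.

```python
# Illustrative sketch (not the published KS-LSTM): a learnable feature-weighting
# layer feeds the 17 routine lab values, treated as a length-17 sequence, into an
# LSTM, with an 80/20 split and ROC AUC evaluation mirroring the workflow above.
import numpy as np
import torch
import torch.nn as nn
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score


class FeatureWeightedLSTM(nn.Module):
    def __init__(self, n_features: int = 17, hidden: int = 64):
        super().__init__()
        self.feature_logits = nn.Parameter(torch.zeros(n_features))  # adaptive weights
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                       # x: (batch, n_features)
        w = torch.softmax(self.feature_logits, dim=0)
        x = (x * w).unsqueeze(-1)               # weighted features as a "sequence"
        _, (h, _) = self.lstm(x)
        return self.head(h[-1]).squeeze(-1)     # raw logits


# Synthetic stand-in for the routine urine/blood indices (2360 subjects, 17 features).
X = np.random.randn(2360, 17).astype("float32")
y = np.random.randint(0, 2, 2360).astype("float32")
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
X_tr, X_te = torch.from_numpy(X_tr), torch.from_numpy(X_te)
y_tr, y_te = torch.from_numpy(y_tr), torch.from_numpy(y_te)

model = FeatureWeightedLSTM()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()
for _ in range(20):                             # short demonstration training loop
    optimizer.zero_grad()
    loss = loss_fn(model(X_tr), y_tr)
    loss.backward()
    optimizer.step()

with torch.no_grad():
    auc = roc_auc_score(y_te.numpy(), torch.sigmoid(model(X_te)).numpy())
print(f"demo AUC on the held-out 20%: {auc:.2f}")
```

The softmax over feature logits is only one way to express "adaptive feature weighting"; the mechanism used in the paper may differ.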
Affiliation(s)
- Quanjing Zhu
- Department of Laboratory Medicine, West China Hospital, Sichuan University, Guoxue Lane, Wuhou District, Chengdu, 610041, China
- Canhui Chen
- Beijing Four-Faith Digital Technology, Fengxiu Middle Road, Haidian District, Beijing, 100094, China
- Qingyuan Zheng
- Department of Laboratory Medicine, West China Hospital, Sichuan University, Guoxue Lane, Wuhou District, Chengdu, 610041, China
- Chongwei Zhang
- Department of Laboratory Medicine, West China Hospital, Sichuan University, Guoxue Lane, Wuhou District, Chengdu, 610041, China
- Jiaxuan Li
- Faculty of Applied Sciences, Macao Polytechnic University, Macao, 999078, China
- Jielong Guo
- Faculty of Applied Sciences, Macao Polytechnic University, Macao, 999078, China
- Chao Mao
- Faculty of Applied Sciences, Macao Polytechnic University, Macao, 999078, China.
- Yong He
- Department of Laboratory Medicine, West China Hospital, Sichuan University, Guoxue Lane, Wuhou District, Chengdu, 610041, China.
5
Alyasiri OM, Salman AM, Akhtom D, Salisu S. ChatGPT revisited: Using ChatGPT-4 for finding references and editing language in medical scientific articles. J Stomatol Oral Maxillofac Surg 2024; 125:101842. PMID: 38521243. DOI: 10.1016/j.jormas.2024.101842.
Abstract
The attainment of academic excellence relies heavily upon the accessibility of scholarly resources and the expression of research findings in faultless language. Although modern tools, such as the Publish or Perish software program, are proficient at sourcing academic papers based on specific keywords, they often fall short of extracting comprehensive content, including crucial references. Linguistic precision remains a prominent challenge, particularly for research papers written by non-native English speakers, who may make word usage errors. This manuscript serves a twofold purpose: first, it reassesses the effectiveness of ChatGPT-4 for retrieving pertinent references tailored to specific research topics; second, it introduces a suite of language-editing services skilled at rectifying word usage errors, ensuring the refined presentation of research outcomes. The article also provides practical guidelines for formulating precise queries to mitigate the risks of erroneous language usage and the inclusion of spurious references. In the ever-evolving realm of academic discourse, leveraging the potential of advanced AI, such as ChatGPT-4, can significantly enhance the quality and impact of scientific publications.
Affiliation(s)
- Osamah Mohammed Alyasiri
- Karbala Technical Institute, Al-Furat Al-Awsat Technical University, Karbala 56001, Iraq; School of Computer Sciences, Universiti Sains Malaysia, Penang 11800, Malaysia.
- Amer M Salman
- School of Mathematical Sciences, Universiti Sains Malaysia, Penang 11800, Malaysia
- Dua'a Akhtom
- School of Computer Sciences, Universiti Sains Malaysia, Penang 11800, Malaysia
- Sani Salisu
- School of Computer Sciences, Universiti Sains Malaysia, Penang 11800, Malaysia; Department of Information Technology, Federal University Dutse, Dutse 720101, Nigeria
6
Laymouna M, Ma Y, Lessard D, Engler K, Therrien R, Schuster T, Vicente S, Achiche S, El Haj MN, Lemire B, Kawaiah A, Lebouché B. Needs-Assessment for an Artificial Intelligence-Based Chatbot for Pharmacists in HIV Care: Results from a Knowledge-Attitudes-Practices Survey. Healthcare (Basel) 2024; 12:1661. PMID: 39201222. PMCID: PMC11353819. DOI: 10.3390/healthcare12161661.
Abstract
BACKGROUND: Pharmacists need up-to-date knowledge and decision-making support in HIV care. We aim to develop MARVIN-Pharma, an artificial intelligence-based chatbot adapted from one initially developed for people with HIV, to assist pharmacists in consideration of evidence-based needs. METHODS: From December 2022 to December 2023, an online needs-assessment survey evaluated Québec pharmacists' knowledge, attitudes, involvement, and barriers relative to HIV care, alongside perceptions relevant to the usability of MARVIN-Pharma. Recruitment involved convenience and snowball sampling, targeting affiliates of the National HIV and Hepatitis Mentoring Program. RESULTS: Forty-one pharmacists (28 community, 13 hospital-based) across 15 Québec municipalities participated. Participants perceived their HIV knowledge as moderate (M = 3.74/6) and held largely favorable attitudes towards providing HIV care (M = 4.02/6). They reported "little" involvement in the delivery of HIV care services (M = 2.08/5), most often ART adherence counseling, refilling, and monitoring. The most commonly reported barriers to HIV care delivery were a lack of time, staff resources, clinical tools, and HIV information/training, with pharmacists at least somewhat agreeing that they experienced each (M ≥ 4.00/6). On average, MARVIN-Pharma's acceptability and compatibility were in the 'undecided' range (M = 4.34 and M = 4.13/7, respectively), while pharmacists agreed that they had the self-efficacy to use online health services (M = 5.6/7). CONCLUSION: MARVIN-Pharma might help address pharmacists' knowledge gaps and barriers to HIV treatment and care, but pharmacist engagement in the chatbot's development seems vital for its future uptake and usability.
Affiliation(s)
- Moustafa Laymouna
- Department of Family Medicine, Faculty of Medicine and Health Sciences, McGill University, Montreal, QC H3S 1Z1, Canada
- Centre for Outcomes Research and Evaluation, Research Institute of the McGill University Health Centre, Montreal, QC H4A 3S5, Canada
- Infectious Diseases and Immunity in Global Health Program, Research Institute of McGill University Health Centre, Montreal, QC H4A 3S5, Canada
- Yuanchao Ma
- Centre for Outcomes Research and Evaluation, Research Institute of the McGill University Health Centre, Montreal, QC H4A 3S5, Canada
- Infectious Diseases and Immunity in Global Health Program, Research Institute of McGill University Health Centre, Montreal, QC H4A 3S5, Canada
- Chronic Viral Illness Service, Division of Infectious Diseases, Department of Medicine, McGill University Health Centre, Montreal, QC H4A 3J1, Canada
- Department of Biomedical Engineering, Polytechnique Montréal, Montreal, QC H3T 1J4, Canada
- David Lessard
- Centre for Outcomes Research and Evaluation, Research Institute of the McGill University Health Centre, Montreal, QC H4A 3S5, Canada
- Infectious Diseases and Immunity in Global Health Program, Research Institute of McGill University Health Centre, Montreal, QC H4A 3S5, Canada
- Chronic Viral Illness Service, Division of Infectious Diseases, Department of Medicine, McGill University Health Centre, Montreal, QC H4A 3J1, Canada
- Kim Engler
- Centre for Outcomes Research and Evaluation, Research Institute of the McGill University Health Centre, Montreal, QC H4A 3S5, Canada
- Infectious Diseases and Immunity in Global Health Program, Research Institute of McGill University Health Centre, Montreal, QC H4A 3S5, Canada
- Rachel Therrien
- Department of Pharmacy and Chronic Viral Illness Service, Research Centre of the University of Montreal Hospital Centre, Montreal, QC H2X 0A9, Canada
- Tibor Schuster
- Department of Family Medicine, Faculty of Medicine and Health Sciences, McGill University, Montreal, QC H3S 1Z1, Canada
- Serge Vicente
- Department of Family Medicine, Faculty of Medicine and Health Sciences, McGill University, Montreal, QC H3S 1Z1, Canada
- Centre for Outcomes Research and Evaluation, Research Institute of the McGill University Health Centre, Montreal, QC H4A 3S5, Canada
- Infectious Diseases and Immunity in Global Health Program, Research Institute of McGill University Health Centre, Montreal, QC H4A 3S5, Canada
- Department of Mathematics and Statistics, University of Montreal, Montreal, QC H3T 1J4, Canada
- Sofiane Achiche
- Department of Biomedical Engineering, Polytechnique Montréal, Montreal, QC H3T 1J4, Canada
- Maria Nait El Haj
- Faculty of Pharmacy, University of Montreal, Montreal, QC H3C 3J7, Canada
- Benoît Lemire
- Chronic Viral Illness Service, Division of Infectious Diseases, Department of Medicine, McGill University Health Centre, Montreal, QC H4A 3J1, Canada
- Abdalwahab Kawaiah
- Centre for Outcomes Research and Evaluation, Research Institute of the McGill University Health Centre, Montreal, QC H4A 3S5, Canada
- Infectious Diseases and Immunity in Global Health Program, Research Institute of McGill University Health Centre, Montreal, QC H4A 3S5, Canada
- Chronic Viral Illness Service, Division of Infectious Diseases, Department of Medicine, McGill University Health Centre, Montreal, QC H4A 3J1, Canada
- Bertrand Lebouché
- Department of Family Medicine, Faculty of Medicine and Health Sciences, McGill University, Montreal, QC H3S 1Z1, Canada
- Centre for Outcomes Research and Evaluation, Research Institute of the McGill University Health Centre, Montreal, QC H4A 3S5, Canada
- Infectious Diseases and Immunity in Global Health Program, Research Institute of McGill University Health Centre, Montreal, QC H4A 3S5, Canada
- Chronic Viral Illness Service, Division of Infectious Diseases, Department of Medicine, McGill University Health Centre, Montreal, QC H4A 3J1, Canada
7
Zhang F, Liu X, Wu W, Zhu S. Evolution of Chatbots in Nursing Education: Narrative Review. JMIR Med Educ 2024; 10:e54987. PMID: 38889074. PMCID: PMC11186796. DOI: 10.2196/54987.
Abstract
Background: The integration of chatbots in nursing education is a rapidly evolving area with potentially transformative impacts. This narrative review aims to synthesize and analyze the existing literature on chatbots in nursing education. Objective: This study aims to comprehensively examine the temporal trends, international distribution, study designs, and implications of chatbots in nursing education. Methods: A comprehensive search was conducted across 3 databases (PubMed, Web of Science, and Embase) following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flow diagram. Results: A total of 40 articles met the eligibility criteria, with a notable increase in publications in 2023 (n=28, 70%). Temporal analysis revealed a surge in publications from 2021 to 2023, underscoring growing scholarly interest. Geographically, Taiwan province made substantial contributions (n=8, 20%), followed by the United States (n=6, 15%) and South Korea (n=4, 10%). Study designs varied, with reviews (n=8, 20%) and editorials (n=7, 18%) predominating, showcasing the richness of research in this domain. Conclusions: Integrating chatbots into nursing education presents a promising yet relatively unexplored avenue. This review highlights the urgent need for original research and the importance of ethical considerations.
Affiliation(s)
- Fang Zhang
- Department of Science and Education, Shenzhen Baoan Women's and Children's Hospital, Shenzhen, China
- Xiaoliu Liu
- Medical Laboratory of Shenzhen Luohu People’s Hospital, Shenzhen, China
- Wenyan Wu
- Medical Laboratory of Shenzhen Luohu People’s Hospital, Shenzhen, China
- Shiben Zhu
- School of Nursing and Health Studies, Hong Kong Metropolitan University, Hong Kong, China
8
Lucas F, Mackie I, d'Onofrio G, Frater JL. Responsible use of chatbots to advance the laboratory hematology scientific literature: Challenges and opportunities. Int J Lab Hematol 2024; 46 Suppl 1:9-11. PMID: 38639069. DOI: 10.1111/ijlh.14285.
Affiliation(s)
- Fabienne Lucas
- Department of Pathology, University of Washington, Seattle, Washington, USA
- Ian Mackie
- Haemostasis Research Unit, University College London, London, UK
- John L Frater
- Department of Pathology and Immunology, Washington University, St Louis, Missouri, USA
9
Cheng J. Applications of Large Language Models in Pathology. Bioengineering (Basel) 2024; 11:342. PMID: 38671764. PMCID: PMC11047860. DOI: 10.3390/bioengineering11040342.
Abstract
Large language models (LLMs) are transformer-based neural networks that can provide human-like responses to questions and instructions. LLMs can generate educational material, summarize text, extract structured data from free text, create reports, write programs, and potentially assist in case sign-out. LLMs combined with vision models can assist in interpreting histopathology images. LLMs have immense potential for transforming pathology practice and education, but these models are not infallible, so any artificial intelligence-generated content must be verified against reputable sources. Caution must be exercised in how these models are integrated into clinical practice, as they can produce hallucinations and incorrect results, and over-reliance on artificial intelligence may lead to de-skilling and automation bias. This review provides a brief history of LLMs and highlights several use cases for LLMs in the field of pathology.
Affiliation(s)
- Jerome Cheng
- Department of Pathology, University of Michigan, Ann Arbor, MI 48105, USA
10
Cung M, Sosa B, Yang HS, McDonald MM, Matthews BG, Vlug AG, Imel EA, Wein MN, Stein EM, Greenblatt MB. The performance of artificial intelligence chatbot large language models to address skeletal biology and bone health queries. J Bone Miner Res 2024; 39:106-115. PMID: 38477743. PMCID: PMC11184616. DOI: 10.1093/jbmr/zjad007.
Abstract
Artificial intelligence (AI) chatbots utilizing large language models (LLMs) have recently garnered significant interest due to their ability to generate humanlike responses to user inquiries in an interactive dialog format. While these models are increasingly used by patients, scientific and medical providers, and trainees to obtain medical information and address biomedical questions, their performance may vary from field to field. The opportunities and risks these chatbots pose to the widespread understanding of skeletal health and science are unknown. Here we assess the accuracy and quality of responses from 3 high-profile LLM chatbots, Chat Generative Pre-Trained Transformer (ChatGPT) 4.0, BingAI, and Bard, across 3 categories: basic and translational skeletal biology, clinical practitioner management of skeletal disorders, and patient queries. Thirty questions in each category were posed, and responses were independently graded for their degree of accuracy by four reviewers. While each of the chatbots was often able to provide relevant information about skeletal disorders, the quality and relevance of the responses varied widely, and ChatGPT 4.0 had the highest overall median score in each category. Each chatbot displayed distinct limitations, including inconsistent, incomplete, or irrelevant responses; inappropriate use of lay sources in a professional context; failure to take patient demographics or clinical context into account when providing recommendations; and an inability to consistently identify areas of uncertainty in the relevant literature. Careful consideration of both the opportunities and risks of current AI chatbots is needed to formulate guidelines for best practices for their use as a source of information about skeletal health and biology.
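As an aside on the grading workflow described here, the aggregation step (independent reviewer grades reduced to a median score per chatbot and category) is simple to reproduce. The sketch below assumes a hypothetical long-format table of grades rather than the authors' actual data.

```python
# Hypothetical long-format grading table: one row per (chatbot, category,
# question, reviewer) with a numeric accuracy grade. Median aggregation per
# chatbot and category mirrors the kind of summary reported above.
import pandas as pd

grades = pd.DataFrame({
    "chatbot":  ["ChatGPT 4.0", "ChatGPT 4.0", "BingAI", "BingAI", "Bard", "Bard"],
    "category": ["biology", "clinical", "biology", "clinical", "biology", "clinical"],
    "question": [1, 1, 1, 1, 1, 1],
    "reviewer": ["R1", "R2", "R1", "R2", "R1", "R2"],
    "grade":    [4, 4, 3, 2, 3, 3],
})

median_scores = (grades
                 .groupby(["chatbot", "category"])["grade"]
                 .median()
                 .unstack("category"))
print(median_scores)
```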
Affiliation(s)
- Michelle Cung
- Department of Pathology and Laboratory Medicine, Weill Cornell Medical College, New York, NY
- Branden Sosa
- Department of Pathology and Laboratory Medicine, Weill Cornell Medical College, New York, NY
- He S Yang
- Department of Pathology and Laboratory Medicine, Weill Cornell Medical College, New York, NY
- Michelle M. McDonald
- Skeletal Diseases Program, The Garvan Institute of Medical Research, Darlinghurst, Australia
- St Vincent’s Clinical Campus School of Clinical Medicine, University of New South Wales, Kensington, Australia
- School of Medicine Science, Faculty of Medicine and Health, The University of Sydney, Sydney, Australia
- Brya G. Matthews
- Department of Molecular Medicine and Pathology, University of Auckland, Auckland, New Zealand
- Center for Regenerative Medicine and Skeletal Development, School of Dental Medicine, UConn Health, Farmington, CT, United States
- Annegreet G. Vlug
- Center for Bone Quality, Department of Internal Medicine, Leiden University Medical Center, Leiden, The Netherlands
- Erik A. Imel
- Indiana Center for Musculoskeletal Health, Department of Medicine, Indiana University School of Medicine, Indianapolis, IN, USA
- Marc N. Wein
- Endocrine Unit, Massachusetts General Hospital, Boston, MA
- Emily Margaret Stein
- Division of Endocrinology, Hospital for Special Surgery, New York, NY; Metabolic Bone Service, Hospital for Special Surgery
- Research Division, Hospital for Special Surgery, New York, NY
- Matthew B. Greenblatt
- Department of Pathology and Laboratory Medicine, Weill Cornell Medical College, New York, NY
- Research Division, Hospital for Special Surgery, New York, NY
11
Rudroff T. Revealing the Complexity of Fatigue: A Review of the Persistent Challenges and Promises of Artificial Intelligence. Brain Sci 2024; 14:186. PMID: 38391760. PMCID: PMC10886506. DOI: 10.3390/brainsci14020186.
Abstract
Part I reviews persistent challenges obstructing progress in understanding complex fatigue's biology. Difficulties quantifying subjective symptoms, mapping multi-factorial mechanisms, accounting for individual variation, enabling invasive sensing, overcoming research/funding insularity, and more are discussed. Part II explores how emerging artificial intelligence and machine and deep learning techniques can help address limitations through pattern recognition of complex physiological signatures as more objective biomarkers, predictive modeling to capture individual differences, consolidation of disjointed findings via data mining, and simulation to explore interventions. Conversational agents like Claude and ChatGPT also have potential to accelerate human fatigue research, but they currently lack capacities for robust autonomous contributions. Envisioned is an innovation timeline where synergistic application of enhanced neuroimaging, biosensors, closed-loop systems, and other advances combined with AI analytics could catalyze transformative progress in elucidating fatigue neural circuitry and treating associated conditions over the coming decades.
Affiliation(s)
- Thorsten Rudroff
- Department of Health and Human Physiology, University of Iowa, Iowa City, IA 52242, USA
- Department of Neurology, University of Iowa Hospitals and Clinics, Iowa City, IA 52242, USA