1
De Vlieger G, Koyner JL, Ostermann M. Can we use artificial intelligence to better treat acute kidney injury? Intensive Care Med 2025; 51:160-162. [PMID: 39661141] [DOI: 10.1007/s00134-024-07743-7]
Affiliation(s)
- Greet De Vlieger
- Laboratory of Intensive Care Medicine, Academic Department of Cellular and Molecular Medicine, KU Leuven, Leuven, Belgium.
- Department of Intensive Care Medicine, University Hospitals Leuven, Leuven, Belgium.
- Jay L Koyner
- Section of Nephrology, Department of Medicine, University of Chicago, Chicago, IL, USA
- Marlies Ostermann
- Department of Intensive Care, King's College London, Guy's & St Thomas' Hospital, London, UK
2
Liang Q, Xu X, Ding S, Wu J, Huang M. Prediction of successful weaning from renal replacement therapy in critically ill patients based on machine learning. Ren Fail 2024; 46:2319329. [PMID: 38416516] [PMCID: PMC10903749] [DOI: 10.1080/0886022x.2024.2319329]
Abstract
BACKGROUND Predicting the successful weaning of acute kidney injury (AKI) patients from renal replacement therapy (RRT) has emerged as a research focus, and we successfully built predictive models for RRT withdrawal in patients with severe AKI using machine learning. METHODS This retrospective single-center study utilized data from our general intensive care unit (ICU) database, focusing on patients diagnosed with severe AKI who underwent RRT. We defined RRT weaning success as patients being free of RRT in the subsequent week and surviving. Multiple logistic regression (MLR) and machine learning algorithms were adopted to construct the prediction models. RESULTS A total of 976 patients were included, of whom 349 were successfully weaned off RRT. Longer RRT duration (7.0 vs. 9.6 d, p = 0.002, OR = 0.94), higher serum cystatin C levels (1.2 vs. 3.2 mg/L, p < 0.001, OR = 0.46), and the presence of septic shock (28.1% vs. 41.5%, p < 0.001, OR = 0.63) were associated with a reduced likelihood of RRT weaning. Conversely, a positive furosemide stress test (FST) (60.2% vs. 40.7%, p < 0.001, OR = 2.75) and higher total urine volume 3 d before RRT withdrawal (755 vs. 125 mL/d, p < 0.001, OR = 2.12) were associated with an increased likelihood of successful weaning from RRT. We then demonstrated that machine learning models, especially Random Forest and XGBoost, achieved an AUROC of 0.95; the XGBoost model exhibited superior accuracy, yielding an AUROC of 0.849. CONCLUSION High-risk factors for unsuccessful RRT weaning in severe AKI patients include prolonged RRT duration. Machine learning prediction models, when compared to models based on multivariate logistic regression using these indicators, offer distinct advantages in predictive accuracy.
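The weaning-prediction models above are compared by AUROC. As a minimal sketch of how that discrimination metric is computed (invented toy data, not the study's cohort), the rank-based formulation can be written directly:

```python
# Illustrative sketch (not the authors' code): computing AUROC from model
# probabilities, as used to compare logistic regression and tree-based
# weaning-prediction models. All data here are invented.

def auroc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) formulation:
    the probability that a randomly chosen positive case is scored higher
    than a randomly chosen negative case, counting ties as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need both positive and negative cases")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical cohort: 1 = successfully weaned from RRT, 0 = not weaned.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.7, 0.2, 0.85]  # model probabilities

print(auroc(labels, scores))  # 0.9375
```

A model that ranks every weaned patient above every non-weaned patient scores 1.0; random scoring gives 0.5.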
Affiliation(s)
- Qiqiang Liang
- General Intensive Care Unit, Second Affiliated Hospital of Zhejiang University School of Medicine, Hangzhou, PR China
- Xin Xu
- General Intensive Care Unit, Second Affiliated Hospital of Zhejiang University School of Medicine, Hangzhou, PR China
- Shuo Ding
- General Intensive Care Unit, Second Affiliated Hospital of Zhejiang University School of Medicine, Hangzhou, PR China
- Jin Wu
- General Intensive Care Unit, Second Affiliated Hospital of Zhejiang University School of Medicine, Hangzhou, PR China
- Man Huang
- General Intensive Care Unit, Second Affiliated Hospital of Zhejiang University School of Medicine, Hangzhou, PR China
- Key Laboratory of Multiple Organ Failure, China National Ministry of Education, Hangzhou, PR China
3
Sheikh MS, Thongprayoon C, Suppadungsuk S, Miao J, Qureshi F, Kashani K, Cheungpasitporn W. Evaluating ChatGPT's Accuracy in Responding to Patient Education Questions on Acute Kidney Injury and Continuous Renal Replacement Therapy. Blood Purif 2024; 53:725-731. [PMID: 38679000] [DOI: 10.1159/000539065]
Abstract
INTRODUCTION Acute kidney injury (AKI) and continuous renal replacement therapy (CRRT) are critical areas in nephrology. The effectiveness of ChatGPT in simpler, patient education-oriented questions has not been thoroughly assessed. This study evaluates the proficiency of ChatGPT 4.0 in responding to such questions, subjected to various linguistic alterations. METHODS Eighty-nine questions were sourced from the Mayo Clinic Handbook for educating patients on AKI and CRRT. These questions were categorized as original, paraphrased with different interrogative adverbs, paraphrased resulting in incomplete sentences, and paraphrased containing misspelled words. Two nephrologists verified the questions for medical accuracy. A χ2 test was conducted to ascertain notable discrepancies in ChatGPT 4.0's performance across these formats. RESULTS ChatGPT provided notable accuracy in handling a variety of question formats for patient education in AKI and CRRT. Across all question types, ChatGPT demonstrated an accuracy of 97% for both original and adverb-altered questions and 98% for questions with incomplete sentences or misspellings. Specifically for AKI-related questions, the accuracy was consistently maintained at 97% for all versions. In the subset of CRRT-related questions, the tool achieved a 96% accuracy for original and adverb-altered questions, and this increased to 98% for questions with incomplete sentences or misspellings. The statistical analysis revealed no significant difference in performance across these varied question types (p value: 1.00 for AKI and 1.00 for CRRT), and there was no notable disparity between the artificial intelligence (AI)'s responses to AKI and CRRT questions (p value: 0.71). CONCLUSION ChatGPT 4.0 demonstrates consistent and high accuracy in interpreting and responding to queries related to AKI and CRRT, irrespective of linguistic modifications. 
These findings suggest that ChatGPT 4.0 has the potential to be a reliable support tool in the delivery of patient education, by accurately providing information across a range of question formats. Further research is needed to explore the direct impact of AI-generated responses on patient understanding and education outcomes.
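The study's χ² comparison of accuracy across question formats can be sketched as follows; the counts are hypothetical stand-ins consistent with the ~97-98% accuracies reported, not the study's actual contingency table:

```python
# Illustrative sketch (assumed, not the study's code): a 2x2 chi-square test of
# whether accuracy differs between original and reworded questions.
# The counts below are invented to mirror the ~97-98% accuracies reported.

def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    num = n * (a * d - b * c) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    return num / den

# rows: question format (original / misspelled); cols: correct / incorrect
stat = chi2_2x2(86, 3, 87, 2)   # e.g. 86/89 vs. 87/89 questions correct
CRITICAL_05_DF1 = 3.841         # chi-square critical value, df = 1, alpha = 0.05

print(stat > CRITICAL_05_DF1)   # False: no significant accuracy difference
```

With counts this close, the statistic stays far below the critical value, matching the study's finding of no significant difference across formats.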
Affiliation(s)
- Mohammad Salman Sheikh
- Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, Minnesota, USA
- Charat Thongprayoon
- Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, Minnesota, USA
- Supawadee Suppadungsuk
- Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, Minnesota, USA
- Chakri Naruebodindra Medical Institute, Faculty of Medicine Ramathibodi Hospital, Mahidol University, Salaya, Thailand
- Jing Miao
- Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, Minnesota, USA
- Fawad Qureshi
- Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, Minnesota, USA
- Kianoush Kashani
- Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, Minnesota, USA
- Division of Pulmonary and Critical Care Medicine, Department of Medicine, Mayo Clinic, Rochester, Minnesota, USA
- Wisit Cheungpasitporn
- Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, Minnesota, USA
4
Beaulieu-Jones BR, Berrigan MT, Shah S, Marwaha JS, Lai SL, Brat GA. Evaluating capabilities of large language models: Performance of GPT-4 on surgical knowledge assessments. Surgery 2024; 175:936-942. [PMID: 38246839] [PMCID: PMC10947829] [DOI: 10.1016/j.surg.2023.12.014]
Abstract
BACKGROUND Artificial intelligence has the potential to dramatically alter health care by enhancing how we diagnose and treat disease. One promising artificial intelligence model is ChatGPT, a general-purpose large language model trained by OpenAI. ChatGPT has shown human-level performance on several professional and academic benchmarks. We sought to evaluate its performance on surgical knowledge questions and assess the stability of this performance on repeat queries. METHODS We evaluated the performance of ChatGPT-4 on questions from the Surgical Council on Resident Education question bank and a second commonly used surgical knowledge assessment, referred to as Data-B. Questions were entered in 2 formats: open-ended and multiple-choice. ChatGPT outputs were assessed for accuracy and insights by surgeon evaluators. We categorized reasons for model errors and the stability of performance on repeat queries. RESULTS A total of 167 Surgical Council on Resident Education and 112 Data-B questions were presented to the ChatGPT interface. ChatGPT correctly answered 71.3% and 67.9% of multiple-choice and 47.9% and 66.1% of open-ended questions for Surgical Council on Resident Education and Data-B, respectively. For both open-ended and multiple-choice questions, approximately two-thirds of ChatGPT responses contained nonobvious insights. Common reasons for incorrect responses included inaccurate information in a complex question (n = 16, 36.4%), inaccurate information in a fact-based question (n = 11, 25.0%), and accurate information with circumstantial discrepancy (n = 6, 13.6%). Upon repeat query, the answer selected by ChatGPT varied for 36.4% of questions answered incorrectly on the first query; the response accuracy changed for 6/16 (37.5%) questions. CONCLUSION Consistent with findings in other academic and professional domains, we demonstrate near or above human-level performance of ChatGPT on surgical knowledge questions from 2 widely used question banks. ChatGPT performed better on multiple-choice than open-ended questions, prompting questions regarding its potential for clinical application. Unique to this study, we demonstrate inconsistency in ChatGPT responses on repeat queries. This finding warrants future consideration, including efforts to train large language models to provide the safe and consistent responses required for clinical application. Despite near or above human-level performance on question banks, given these observations it remains unclear whether large language models such as ChatGPT can safely assist clinicians in providing care.
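The repeat-query stability analysis amounts to re-asking each question and counting how often the selected answer changes. A minimal sketch, with invented answer strings rather than the study's data:

```python
# Illustrative sketch (assumed workflow, not the authors' code): quantifying
# response stability by re-asking each question and measuring how often the
# selected answer changes, as the study did for initially incorrect items.

def answer_change_rate(first_pass, second_pass):
    """Fraction of questions whose selected answer differs between two passes."""
    if len(first_pass) != len(second_pass):
        raise ValueError("passes must cover the same questions")
    changed = sum(a != b for a, b in zip(first_pass, second_pass))
    return changed / len(first_pass)

# Hypothetical answer choices (A-E) for questions answered incorrectly on pass 1.
pass1 = ["A", "C", "B", "D", "E", "A", "B", "C", "D", "E", "A"]
pass2 = ["A", "B", "B", "D", "C", "A", "B", "C", "A", "E", "D"]

print(round(answer_change_rate(pass1, pass2), 3))  # 0.364, i.e. ~36.4% changed
```

A rate well above zero on incorrectly answered items is the instability the authors flag as a barrier to clinical use.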
Affiliation(s)
- Brendin R Beaulieu-Jones
- Department of Surgery, Beth Israel Deaconess Medical Center, Boston, MA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA.
- Sahaj Shah
- Geisinger Commonwealth School of Medicine, Scranton, PA
| | - Jayson S Marwaha
- Division of Colorectal Surgery, National Taiwan University Hospital, Taipei, Taiwan
| | - Shuo-Lun Lai
- Division of Colorectal Surgery, National Taiwan University Hospital, Taipei, Taiwan
| | - Gabriel A Brat
- Department of Surgery, Beth Israel Deaconess Medical Center, Boston, MA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA.
5
Baldwin IC, McKaige A. Fluid Balance in Continuous Renal Replacement Therapy: Prescribing, Delivering, and Review. Blood Purif 2024; 53:533-540. [PMID: 38377974] [DOI: 10.1159/000537928]
Abstract
BACKGROUND Historically, IV and enteral fluids given during acute kidney injury (AKI) were restricted; the introduction of continuous renal replacement therapy (CRRT) allowed more liberal fluids, which improved nutrition for the critically ill. However, fluid accumulation can occur when these higher daily volumes are not considered in fluid balance prescribing and in the NET ultrafiltration (NUF) volume target. KEY MESSAGES The hours of CRRT delivered each day are vital for achieving fluid balance, and time off therapy makes the task more challenging. Clinicians inexperienced with CRRT should make this aspect of AKI management a focus of rounding, with senior oversight, clear communication, and "precision" as a clinical target. Patients with sepsis-associated AKI can be complex: resuscitation and the admission days bring a positive fluid load and a replacement mindset, while subsequent days in the ICU require fluid regulation and removal, with a comprehensive multilayered assessment before prescribing the daily fluid balance target and the required hourly NET plasma water removal rate (NUF rate). Future machines may include advanced software; new alarms, display metrics, and messages; and links to machine learning and "AKI models" for setting, monitoring, and guaranteeing fluid removal. This could also connect to current hardware such as on-line blood volume assessment with continuous haematocrit measurement. SUMMARY Fluid balance in the acutely ill is a challenge in which forecasting and prediction are necessary. The NUF rate and volume each hour should be tracked and adjusted to achieve the daily target. This requires human and machine connections.
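The prescribing logic described here — working back from a daily fluid balance target to an hourly NUF rate, discounting hours off therapy — can be sketched with hypothetical numbers (an arithmetic illustration, not a clinical protocol):

```python
# Illustrative sketch (hypothetical numbers, not a clinical protocol): deriving
# the hourly NET ultrafiltration (NUF) rate needed to hit a daily fluid balance
# target, given anticipated intake and the hours of CRRT actually delivered.

def required_nuf_rate_ml_per_hr(intake_ml, other_output_ml,
                                balance_target_ml, crrt_hours):
    """Hourly net plasma-water removal needed so that
    intake - other_output - NUF_total = balance_target over the day."""
    if crrt_hours <= 0:
        raise ValueError("CRRT must be running to remove fluid")
    nuf_total = intake_ml - other_output_ml - balance_target_ml
    return nuf_total / crrt_hours

# e.g. 3,000 mL anticipated intake, 500 mL urine/drain output, a target of
# -1,000 mL (net negative 1 L), and only 20 h of delivered CRRT in the day:
print(required_nuf_rate_ml_per_hr(3000, 500, -1000, 20))  # 175.0 mL/h
```

The example makes the paper's point concrete: every hour of therapy downtime raises the removal rate required during the remaining hours, which is why delivered hours and hourly tracking matter.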
Affiliation(s)
- Ian Charles Baldwin
- Department of Intensive Care, Austin Hospital, Melbourne, Victoria, Australia
- Amy McKaige
- Department of Intensive Care, Austin Hospital, Melbourne, Victoria, Australia
6
Suppadungsuk S, Thongprayoon C, Miao J, Krisanapan P, Qureshi F, Kashani K, Cheungpasitporn W. Exploring the Potential of Chatbots in Critical Care Nephrology. Medicines (Basel) 2023; 10:58. [PMID: 37887265] [PMCID: PMC10608511] [DOI: 10.3390/medicines10100058]
Abstract
The exponential growth of artificial intelligence (AI) has allowed for its integration into multiple sectors, including, notably, healthcare. Chatbots have emerged as a pivotal resource for improving patient outcomes and assisting healthcare practitioners through various AI-based technologies. In critical care, kidney-related conditions play a significant role in determining patient outcomes. This article examines the potential for integrating chatbots into the workflows of critical care nephrology to optimize patient care. We detail their specific applications in critical care nephrology, such as managing acute kidney injury, alert systems, and continuous renal replacement therapy (CRRT); facilitating discussions around palliative care; and bolstering collaboration within a multidisciplinary team. Chatbots have the potential to augment real-time data availability, evaluate renal health, identify potential risk factors, build predictive models, and monitor patient progress. Moreover, they provide a platform for enhancing communication and education for both patients and healthcare providers, paving the way for enriched knowledge and honed professional skills. However, it is vital to recognize the inherent challenges and limitations when using chatbots in this domain. Here, we provide an in-depth exploration of the concerns tied to chatbots' accuracy, dependability, data protection and security, transparency, potential algorithmic biases, and ethical implications in critical care nephrology. While human discernment and intervention are indispensable, especially in complex medical scenarios or intricate situations, the sustained advancements in AI signal that the integration of precision-engineered chatbot algorithms within critical care nephrology has considerable potential to elevate patient care and pivotal outcome metrics in the future.
Affiliation(s)
- Supawadee Suppadungsuk
- Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA
- Chakri Naruebodindra Medical Institute, Faculty of Medicine Ramathibodi Hospital, Mahidol University, Samut Prakan 10540, Thailand
- Charat Thongprayoon
- Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA
- Jing Miao
- Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA
- Pajaree Krisanapan
- Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA
- Division of Nephrology and Hypertension, Thammasat University Hospital, Pathum Thani 12120, Thailand
- Fawad Qureshi
- Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA
- Kianoush Kashani
- Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA
- Wisit Cheungpasitporn
- Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA
7
Beaulieu-Jones BR, Shah S, Berrigan MT, Marwaha JS, Lai SL, Brat GA. Evaluating Capabilities of Large Language Models: Performance of GPT4 on Surgical Knowledge Assessments. medRxiv [Preprint] 2023:2023.07.16.23292743. [PMID: 37502981] [PMCID: PMC10371188] [DOI: 10.1101/2023.07.16.23292743]
Abstract
Background Artificial intelligence (AI) has the potential to dramatically alter healthcare by enhancing how we diagnose and treat disease. One promising AI model is ChatGPT, a large general-purpose language model trained by OpenAI. The chat interface has shown robust, human-level performance on several professional and academic benchmarks. We sought to probe its performance and stability over time on surgical case questions. Methods We evaluated the performance of ChatGPT-4 on two surgical knowledge assessments: the Surgical Council on Resident Education (SCORE) and a second commonly used knowledge assessment, referred to as Data-B. Questions were entered in two formats: open-ended and multiple choice. ChatGPT outputs were assessed for accuracy and insights by surgeon evaluators. We categorized reasons for model errors and the stability of performance on repeat encounters. Results A total of 167 SCORE and 112 Data-B questions were presented to the ChatGPT interface. ChatGPT correctly answered 71% and 68% of multiple-choice SCORE and Data-B questions, respectively. For both open-ended and multiple-choice questions, approximately two-thirds of ChatGPT responses contained non-obvious insights. Common reasons for inaccurate responses included: inaccurate information in a complex question (n=16, 36.4%); inaccurate information in a fact-based question (n=11, 25.0%); and accurate information with circumstantial discrepancy (n=6, 13.6%). Upon repeat query, the answer selected by ChatGPT varied for 36.4% of questions answered inaccurately; the response accuracy changed for 6/16 questions. Conclusion Consistent with prior findings, we demonstrate robust near or above human-level performance of ChatGPT within the surgical domain. Unique to this study, we demonstrate a substantial inconsistency in ChatGPT responses with repeat query. This finding warrants future consideration and presents an opportunity to further train these models to provide safe and consistent responses. Without mental and/or conceptual models, it is unclear whether language models such as ChatGPT would be able to safely assist clinicians in providing care.
Affiliation(s)
- Brendin R Beaulieu-Jones
- Department of Surgery, Beth Israel Deaconess Medical Center, Boston, MA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA
- Sahaj Shah
- Geisinger Commonwealth School of Medicine, Scranton, PA
- Jayson S Marwaha
- Division of Colorectal Surgery, National Taiwan University Hospital, Taipei, Taiwan
- Shuo-Lun Lai
- Division of Colorectal Surgery, National Taiwan University Hospital, Taipei, Taiwan
- Gabriel A Brat
- Department of Surgery, Beth Israel Deaconess Medical Center, Boston, MA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA
8
Liu LJ, Takeuchi T, Chen J, Neyra JA. Artificial Intelligence in Continuous Kidney Replacement Therapy. Clin J Am Soc Nephrol 2023; 18:671-674. [PMID: 36735382] [PMCID: PMC10278853] [DOI: 10.2215/cjn.0000000000000099]
Affiliation(s)
- Lucas J. Liu
- Department of Computer Science, University of Kentucky, Lexington, Kentucky
- Division of Biomedical Informatics, Department of Internal Medicine, University of Kentucky, Lexington, Kentucky
- Tomonori Takeuchi
- Department of Health Policy and Informatics, Tokyo Medical and Dental University, Tokyo, Japan
- Jin Chen
- Department of Computer Science, University of Kentucky, Lexington, Kentucky
- Division of Biomedical Informatics, Department of Internal Medicine, University of Kentucky, Lexington, Kentucky
- Javier A. Neyra
- Division of Nephrology, Bone and Mineral Metabolism, Department of Internal Medicine, University of Kentucky, Lexington, Kentucky
- Division of Nephrology, Department of Internal Medicine, University of Alabama at Birmingham, Birmingham, Alabama