1. Anisha SA, Sen A, Bain C. Evaluating the Potential and Pitfalls of AI-Powered Conversational Agents as Humanlike Virtual Health Carers in the Remote Management of Noncommunicable Diseases: Scoping Review. J Med Internet Res 2024; 26:e56114. [PMID: 39012688] [DOI: 10.2196/56114]
Abstract
BACKGROUND The rising prevalence of noncommunicable diseases (NCDs) worldwide and the high mortality rates recently associated with them (74.4%), especially in low- and middle-income countries, are creating a substantial global burden of disease, necessitating innovative and sustainable long-term care solutions. OBJECTIVE This scoping review aims to investigate the impact of artificial intelligence (AI)-based conversational agents (CAs), including chatbots, voicebots, and anthropomorphic digital avatars, as human-like health caregivers in the remote management of NCDs, as well as to identify critical areas for future research and provide insights into how these technologies might be used effectively in health care to personalize NCD management strategies. METHODS A broad literature search was conducted in July 2023 in 6 electronic databases (Ovid MEDLINE, Embase, PsycINFO, PubMed, CINAHL, and Web of Science) using the search terms "conversational agents," "artificial intelligence," and "noncommunicable diseases," including their associated synonyms. We also manually searched gray literature using sources such as ProQuest Central, ResearchGate, ACM Digital Library, and Google Scholar. We included empirical studies published in English from January 2010 to July 2023 focusing solely on health care-oriented applications of CAs used for remote management of NCDs. The narrative synthesis approach was used to collate and summarize the relevant information extracted from the included studies. RESULTS The literature search yielded a total of 43 studies that matched the inclusion criteria.
Our review unveiled four significant findings: (1) higher user acceptance and compliance with anthropomorphic and avatar-based CAs for remote care; (2) an existing gap in the development of personalized, empathetic, and contextually aware CAs for effective emotional and social interaction with users, along with limited consideration of ethical concerns such as data privacy and patient safety; (3) inadequate evidence of the efficacy of CAs in NCD self-management despite a moderate to high level of optimism among health care professionals regarding CAs' potential in remote health care; and (4) CAs primarily being used for supporting nonpharmacological interventions such as behavioral or lifestyle modifications and patient education for the self-management of NCDs. CONCLUSIONS This review makes a unique contribution to the field by not only providing a quantifiable impact analysis but also identifying the areas requiring imminent scholarly attention for the ethical, empathetic, and efficacious implementation of AI in NCD care. This serves as an academic cornerstone for future research in AI-assisted health care for NCD management. TRIAL REGISTRATION Open Science Framework; https://doi.org/10.17605/OSF.IO/GU5PX.
Affiliation(s)
- Sadia Azmin Anisha: Jeffrey Cheah School of Medicine & Health Sciences, Monash University Malaysia, Bandar Sunway, Malaysia
- Arkendu Sen: Jeffrey Cheah School of Medicine & Health Sciences, Monash University Malaysia, Bandar Sunway, Malaysia
- Chris Bain: Faculty of Information Technology, Data Future Institutes, Monash University, Clayton, Australia
2. Reddy AT, Patel A, Leiman DA. Automated software-derived supine baseline impedance is highly correlated with manual nocturnal baseline impedance for the diagnosis of GERD. Neurogastroenterol Motil 2024:e14861. [PMID: 38988098] [DOI: 10.1111/nmo.14861]
Abstract
BACKGROUND Mean nocturnal baseline impedance (MNBI) can improve diagnostic accuracy for gastroesophageal reflux disease (GERD), but must be manually calculated and is not routinely reported. We aimed to determine how automated software-derived mean supine baseline impedance (MSBI), a potential novel GERD metric, is related to MNBI. METHODS Consecutively obtained pH-impedance studies were assessed. Manually extracted MNBI was compared to MSBI using paired t-tests and Spearman correlation. KEY RESULTS The correlation between MNBI and MSBI was very high (ρ = 0.966, p < 0.01). CONCLUSIONS & INFERENCES The ease of acquisition and correlation with MNBI warrant the routine clinical use and reporting of MSBI with pH-impedance studies.
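The headline result here is a Spearman rank correlation (ρ = 0.966), which compares the rank orderings of the paired impedance measurements rather than their raw values. A minimal pure-Python sketch of that computation, using invented MNBI/MSBI values rather than the study's data:

```python
def ranks(values):
    """Assign 1-based ranks, averaging over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i..j
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman_rho(x, y):
    """Spearman's rho = Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den

# Hypothetical baseline impedance values (ohms) for six patients
mnbi = [820, 1450, 2210, 2980, 1120, 2600]  # manual nocturnal
msbi = [870, 1500, 2150, 3050, 1050, 2700]  # software-derived supine
print(round(spearman_rho(mnbi, msbi), 3))  # → 1.0 (identical rank order)
```

With real pH-impedance exports, the two lists would hold each patient's manually averaged nocturnal values and the software's supine values.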
Affiliation(s)
- Alexander T Reddy: Division of Gastroenterology, Duke University Medical Center, Durham, North Carolina, USA
- Amit Patel: Division of Gastroenterology, Duke University Medical Center, Durham, North Carolina, USA; Durham VA Medical Center, Durham, North Carolina, USA
- David A Leiman: Division of Gastroenterology, Duke University Medical Center, Durham, North Carolina, USA; Duke Clinical Research Institute, Durham, North Carolina, USA
3. Giuffrè M, Kresevic S, You K, Dupont J, Huebner J, Grimshaw AA, Shung DL. Systematic review: The use of large language models as medical chatbots in digestive diseases. Aliment Pharmacol Ther 2024; 60:144-166. [PMID: 38798194] [DOI: 10.1111/apt.18058]
Abstract
BACKGROUND Interest in large language models (LLMs), such as OpenAI's ChatGPT, across multiple specialties has grown as a source of patient-facing medical advice and provider-facing clinical decision support. The accuracy of LLM responses for gastroenterology and hepatology-related questions is unknown. AIMS To evaluate the accuracy, and potential safety implications, of LLM responses to questions related to diagnosis, management, and treatment in gastroenterology and hepatology. METHODS We conducted a systematic literature search including Cochrane Library, Google Scholar, Ovid Embase, Ovid MEDLINE, PubMed, Scopus and the Web of Science Core Collection to identify relevant articles published from inception until January 28, 2024, using a combination of keywords and controlled vocabulary for LLMs and gastroenterology or hepatology. Accuracy was defined as the percentage of entirely correct answers. RESULTS Among the 1671 reports screened, we identified 33 full-text articles on using LLMs in gastroenterology and hepatology and included 18 in the final analysis. The accuracy of question-responding varied across different model versions. For example, accuracy ranged from 6.4% to 45.5% with ChatGPT-3.5 and was between 40% and 91.4% with ChatGPT-4. In addition, the absence of standardised methodology and reporting metrics for studies involving LLMs places all the studies at a high risk of bias and does not allow for the generalisation of single-study results. CONCLUSIONS Current general-purpose LLMs have unacceptably low accuracy on clinical gastroenterology and hepatology tasks; incorrect information or triage recommendations may lead to adverse patient safety events, overburden healthcare systems, or delay necessary care.
Affiliation(s)
- Mauro Giuffrè: Department of Internal Medicine (Digestive Diseases), Yale School of Medicine, New Haven, Connecticut, USA; Department of Medical, Surgical and Health Sciences, University of Trieste, Trieste, Italy
- Simone Kresevic: Department of Engineering and Architecture, University of Trieste, Trieste, Italy
- Kisung You: Department of Mathematics, Baruch College, City University of New York, New York, New York, USA
- Johannes Dupont: Department of Internal Medicine (Digestive Diseases), Yale School of Medicine, New Haven, Connecticut, USA
- Jack Huebner: Department of Internal Medicine, Yale School of Medicine, New Haven, Connecticut, USA
- Alyssa Ann Grimshaw: Research & Education Librarian (Clinical), Harvey Cushing/John Hay Whitney Medical Library, Yale University, New Haven, Connecticut, USA
- Dennis L Shung: Department of Internal Medicine (Digestive Diseases), Yale School of Medicine, New Haven, Connecticut, USA
4. Aburumman R, Al Annan K, Mrad R, Brunaldi VO, Gala K, Abu Dayyeh BK. Assessing ChatGPT vs. Standard Medical Resources for Endoscopic Sleeve Gastroplasty Education: A Medical Professional Evaluation Study. Obes Surg 2024; 34:2718-2724. [PMID: 38758515] [DOI: 10.1007/s11695-024-07283-5]
Abstract
BACKGROUND AND AIMS The Chat Generative Pre-Trained Transformer (ChatGPT) represents a significant advancement in artificial intelligence (AI) chatbot technology. While ChatGPT offers promising capabilities, concerns remain about its reliability and accuracy. This study aims to evaluate ChatGPT's responses to patients' frequently asked questions about Endoscopic Sleeve Gastroplasty (ESG). METHODS Expert Gastroenterologists and Bariatric Surgeons, with experience in ESG, were invited to evaluate ChatGPT-generated answers to eight ESG-related questions, as well as answers sourced from hospital websites. The evaluation criteria included ease of understanding, scientific accuracy, and overall answer satisfaction. They were also tasked with discerning whether each response was AI-generated or not. RESULTS Twelve medical professionals with expertise in ESG participated, 83.3% of whom had experience performing the procedure independently. The entire cohort possessed substantial knowledge about ESG. ChatGPT's utility among participants, rated on a scale of one to five, averaged 2.75. The raters demonstrated a 54% accuracy rate in distinguishing AI-generated responses, with a sensitivity of 39% and specificity of 60%, resulting in an average of 17.6 correct identifications out of a possible 31. Overall, there were no significant differences between AI-generated and non-AI responses in terms of scientific accuracy, understandability, and satisfaction, with one notable exception. For the question defining ESG, the AI-generated definition scored higher in scientific accuracy (4.33 vs. 3.61, p = 0.007) and satisfaction (4.33 vs. 3.58, p = 0.009) compared to the non-AI versions. CONCLUSIONS This study underscores ChatGPT's efficacy in providing medical information on ESG, demonstrating its comparability to traditional sources in scientific accuracy.
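The raters' ability to spot AI-generated text is summarized with standard confusion-matrix statistics. The sketch below shows how accuracy, sensitivity, and specificity are derived from raw counts; the counts are invented for illustration (chosen to match the quoted 39% sensitivity and 60% specificity), not taken from the study's data:

```python
def confusion_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,   # all correct calls over all calls
        "sensitivity": tp / (tp + fn),   # AI responses correctly flagged as AI
        "specificity": tn / (tn + fp),   # human-written responses correctly identified
    }

# Hypothetical tallies, treating "AI-generated" as the positive class
m = confusion_metrics(tp=39, fp=40, tn=60, fn=61)
for name, value in m.items():
    print(f"{name}: {value:.2f}")
```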
Affiliation(s)
- Razan Aburumman: Division of Gastroenterology and Hepatology, Mayo Clinic, 200 First Street SW, Rochester, MN, 55905, USA
- Karim Al Annan: Division of Gastroenterology and Hepatology, Mayo Clinic, 200 First Street SW, Rochester, MN, 55905, USA
- Rudy Mrad: Division of Gastroenterology and Hepatology, Mayo Clinic, 200 First Street SW, Rochester, MN, 55905, USA
- Vitor O Brunaldi: Division of Gastroenterology and Hepatology, Mayo Clinic, 200 First Street SW, Rochester, MN, 55905, USA
- Khushboo Gala: Division of Gastroenterology and Hepatology, Mayo Clinic, 200 First Street SW, Rochester, MN, 55905, USA
- Barham K Abu Dayyeh: Division of Gastroenterology and Hepatology, Mayo Clinic, 200 First Street SW, Rochester, MN, 55905, USA
5. Giuffrè M, Kresevic S, Pugliese N, You K, Shung DL. Optimizing large language models in digestive disease: strategies and challenges to improve clinical outcomes. Liver Int 2024. [PMID: 38819632] [DOI: 10.1111/liv.15974]
Abstract
Large Language Models (LLMs) are transformer-based neural networks with billions of parameters trained on very large text corpora from diverse sources. LLMs have the potential to improve healthcare due to their capability to parse complex concepts and generate context-based responses. The interest in LLMs has not spared digestive disease academics, who have mainly investigated foundational LLM accuracy, which ranges from 25% to 90% and is influenced by the lack of standardized rules to report methodologies and results for LLM-oriented research. In addition, a critical issue is the absence of a universally accepted definition of accuracy, varying from binary to scalar interpretations, often tied to grader expertise without reference to clinical guidelines. We address strategies and challenges to increase accuracy. In particular, LLMs can be infused with domain knowledge using Retrieval Augmented Generation (RAG) or Supervised Fine-Tuning (SFT) with reinforcement learning from human feedback (RLHF). RAG faces challenges with context-window limits and accurate information retrieval from the provided context. SFT, a deeper adaptation method, is computationally demanding and requires specialized knowledge. LLMs may increase patient quality of care across the field of digestive diseases, where physicians are often engaged in screening, treatment and surveillance for a broad range of pathologies for which in-context learning or SFT with RLHF could improve clinical decision-making and patient outcomes. However, despite their potential, the safe deployment of LLMs in healthcare still needs to overcome hurdles in accuracy, suggesting a need for strategies that integrate human feedback with advanced model training.
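Of the two adaptation strategies discussed, RAG is the lighter-weight one: at query time the model is handed the most relevant guideline passages instead of being retrained. A minimal sketch of the retrieval step, using bag-of-words cosine similarity as a stand-in for learned embeddings and invented guideline snippets (not real recommendations):

```python
import math
from collections import Counter

def bow(text):
    """Bag-of-words vector; real systems use learned embeddings instead."""
    return Counter(text.lower().split())

def cosine(a, b):
    num = sum(a[w] * b[w] for w in set(a) & set(b))
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def retrieve(query, passages, k=1):
    """Return the k passages most similar to the query."""
    q = bow(query)
    return sorted(passages, key=lambda p: cosine(q, bow(p)), reverse=True)[:k]

# Invented placeholder snippets standing in for a guideline corpus
passages = [
    "hepatitis c: treat chronic infection with direct-acting antivirals",
    "gerd: trial proton pump inhibitors before surgical referral",
    "ibd: schedule surveillance colonoscopy based on dysplasia risk",
]
context = retrieve("how should chronic hepatitis c infection be treated", passages)[0]
prompt = f"Using only this guideline excerpt, answer the question.\nExcerpt: {context}\nQuestion: ..."
```

The retrieved excerpt is prepended to the prompt, which is exactly why RAG is bounded by the model's context window, as the abstract notes.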
Affiliation(s)
- Mauro Giuffrè: Department of Internal Medicine (Digestive Diseases), Yale School of Medicine, New Haven, Connecticut, USA; Department of Medical, Surgical, and Health Sciences, University of Trieste, Trieste, Italy
- Simone Kresevic: Department of Engineering and Architecture, University of Trieste, Trieste, Italy
- Nicola Pugliese: Division of Internal Medicine and Hepatology, Department of Gastroenterology, IRCCS Humanitas Research Hospital, Rozzano, Italy; Department of Biomedical Sciences, Humanitas University, Pieve Emanuele, Italy
- Kisung You: Department of Mathematics, Baruch College, City University of New York, New York, New York, USA
- Dennis L Shung: Department of Internal Medicine (Digestive Diseases), Yale School of Medicine, New Haven, Connecticut, USA
6. Huo B, Calabrese E, Sylla P, Kumar S, Ignacio RC, Oviedo R, Hassan I, Slater BJ, Kaiser A, Walsh DS, Vosburg W. The performance of artificial intelligence large language model-linked chatbots in surgical decision-making for gastroesophageal reflux disease. Surg Endosc 2024; 38:2320-2330. [PMID: 38630178] [DOI: 10.1007/s00464-024-10807-w]
Abstract
BACKGROUND Large language model (LLM)-linked chatbots may be an efficient source of clinical recommendations for healthcare providers and patients. This study evaluated the performance of LLM-linked chatbots in providing recommendations for the surgical management of gastroesophageal reflux disease (GERD). METHODS Nine patient cases were created based on key questions (KQs) addressed by the Society of American Gastrointestinal and Endoscopic Surgeons (SAGES) guidelines for the surgical treatment of GERD. ChatGPT-3.5, ChatGPT-4, Copilot, Google Bard, and Perplexity AI were queried on November 16th, 2023, for recommendations regarding the surgical management of GERD. Accurate chatbot performance was defined as the number of responses aligning with SAGES guideline recommendations. Outcomes were reported with counts and percentages. RESULTS Surgeons were given accurate recommendations for the surgical management of GERD in an adult patient for 5/7 (71.4%) KQs by ChatGPT-4, 3/7 (42.9%) KQs by Copilot, 6/7 (85.7%) KQs by Google Bard, and 3/7 (42.9%) KQs by Perplexity according to the SAGES guidelines. Patients were given accurate recommendations for 3/5 (60.0%) KQs by ChatGPT-4, 2/5 (40.0%) KQs by Copilot, 4/5 (80.0%) KQs by Google Bard, and 1/5 (20.0%) KQs by Perplexity. In a pediatric patient, surgeons were given accurate recommendations for 2/3 (66.7%) KQs by ChatGPT-4, 3/3 (100.0%) KQs by Copilot, 3/3 (100.0%) KQs by Google Bard, and 2/3 (66.7%) KQs by Perplexity. Patients were given appropriate guidance for 2/2 (100.0%) KQs by ChatGPT-4, 2/2 (100.0%) KQs by Copilot, 1/2 (50.0%) KQs by Google Bard, and 1/2 (50.0%) KQs by Perplexity. CONCLUSIONS Gastrointestinal surgeons, gastroenterologists, and patients should recognize both the promise and pitfalls of LLMs when utilized for advice on surgical management of GERD. Additional training of LLMs using evidence-based health information is needed.
Affiliation(s)
- Bright Huo: Division of General Surgery, Department of Surgery, McMaster University, Hamilton, ON, Canada
- Elisa Calabrese: University of California South California, East Bay, Oakland, CA, USA
- Patricia Sylla: Division of Colon and Rectal Surgery, Department of Surgery, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Sunjay Kumar: Department of General Surgery, Thomas Jefferson University Hospital, Philadelphia, PA, USA
- Romeo C Ignacio: Division of Pediatric Surgery/Department of Surgery, San Diego School of Medicine, University of California, California, CA, USA
- Rodolfo Oviedo: Nacogdoches Center for Metabolic and Weight Loss Surgery, Nacogdoches, TX, USA; University of Houston Tilman J. Fertitta Family College of Medicine, Houston, TX, USA; Sam Houston State University College of Osteopathic Medicine, Conroe, TX, USA
- Andreas Kaiser: Division of Colorectal Surgery, Department of Surgery, City of Hope National Medical Center, Duarte, CA, USA
- Danielle S Walsh: Department of Surgery, University of Kentucky, Lexington, KY, USA
- Wesley Vosburg: Department of Surgery, Harvard Medical School, Mount Auburn Hospital, Cambridge, MA, USA
7. Kresevic S, Giuffrè M, Ajcevic M, Accardo A, Crocè LS, Shung DL. Optimization of hepatological clinical guidelines interpretation by large language models: a retrieval augmented generation-based framework. NPJ Digit Med 2024; 7:102. [PMID: 38654102] [DOI: 10.1038/s41746-024-01091-y]
Abstract
Large language models (LLMs) can potentially transform healthcare, particularly in providing the right information to the right provider at the right time in the hospital workflow. This study investigates the integration of LLMs into healthcare, specifically focusing on improving clinical decision support systems (CDSSs) through accurate interpretation of medical guidelines for chronic Hepatitis C Virus infection management. Utilizing OpenAI's GPT-4 Turbo model, we developed a customized LLM framework that incorporates retrieval augmented generation (RAG) and prompt engineering. Our framework involved converting the guidelines into a structured format that LLMs can process efficiently to produce the most accurate output. An ablation study was conducted to evaluate the impact of different formatting and learning strategies on the LLM's answer generation accuracy. The baseline GPT-4 Turbo model's performance was compared against five experimental setups with increasing levels of complexity: inclusion of in-context guidelines, guideline reformatting, and implementation of few-shot learning. Our primary outcome was the qualitative assessment of accuracy based on expert review, while secondary outcomes included the quantitative measurement of similarity of LLM-generated responses to expert-provided answers using text-similarity scores. The results showed a significant improvement in accuracy from 43% to 99% (p < 0.001) when guidelines were provided as context in a coherent corpus of text and non-text sources were converted into text. In addition, few-shot learning did not seem to improve overall accuracy. The study highlights that structured guideline reformatting and advanced prompt engineering (data quality vs. data quantity) can enhance the efficacy of LLM integrations to CDSSs for guideline delivery.
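The secondary outcome, text similarity between LLM-generated and expert-provided answers, can be illustrated with any normalized string-similarity score. The specific metric the study used is not named in this abstract, so the sketch below uses Python's standard-library SequenceMatcher with invented answer strings:

```python
from difflib import SequenceMatcher

def text_similarity(a, b):
    """Normalized matching-subsequence ratio in [0, 1], case-insensitive."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Invented expert and model answers, for illustration only
expert = "Pan-genotypic direct-acting antivirals for 12 weeks."
llm_good = "pan-genotypic direct-acting antivirals for 12 weeks."
llm_poor = "Interferon monotherapy is recommended."

print(text_similarity(expert, llm_good))  # high (identical after lowercasing)
print(text_similarity(expert, llm_poor))  # low
```

A score like this only captures surface overlap, which is why the study kept expert review of accuracy as the primary outcome.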
Affiliation(s)
- Simone Kresevic: Department of Engineering and Architecture, University of Trieste, Trieste, Italy; Department of Medicine (Digestive Diseases), Yale School of Medicine, Yale University, New Haven, CT, USA
- Mauro Giuffrè: Department of Medicine (Digestive Diseases), Yale School of Medicine, Yale University, New Haven, CT, USA
- Milos Ajcevic: Department of Engineering and Architecture, University of Trieste, Trieste, Italy
- Agostino Accardo: Department of Engineering and Architecture, University of Trieste, Trieste, Italy
- Lory S Crocè: Department of Medical, Surgical, and Health Sciences, University of Trieste, Trieste, Italy
- Dennis L Shung: Department of Medicine (Digestive Diseases), Yale School of Medicine, Yale University, New Haven, CT, USA
8. Ghersin I, Weisshof R, Koifman E, Bar-Yoseph H, Ben Hur D, Maza I, Hasnis E, Nasser R, Ovadia B, Dror Zur D, Waterman M, Gorelik Y. Comparative evaluation of a language model and human specialists in the application of European guidelines for the management of inflammatory bowel diseases and malignancies. Endoscopy 2024. [PMID: 38499197] [DOI: 10.1055/a-2289-5732]
Abstract
BACKGROUND Society guidelines on colorectal dysplasia screening, surveillance, and endoscopic management in inflammatory bowel disease (IBD) are complex, and physician adherence to them is suboptimal. We aimed to evaluate the use of ChatGPT, a large language model, in generating accurate guideline-based recommendations for colorectal dysplasia screening, surveillance, and endoscopic management in IBD in line with European Crohn's and Colitis Organization (ECCO) guidelines. METHODS 30 clinical scenarios in the form of free text were prepared and presented to three separate sessions of ChatGPT and to eight gastroenterologists (four IBD specialists and four non-IBD gastroenterologists). Two additional IBD specialists subsequently assessed all responses provided by ChatGPT and the eight gastroenterologists, judging their accuracy according to ECCO guidelines. RESULTS ChatGPT had a mean correct response rate of 87.8%. Among the eight gastroenterologists, the mean correct response rates were 85.8% for IBD experts and 89.2% for non-IBD experts. No statistically significant differences in accuracy were observed between ChatGPT and all gastroenterologists (P=0.95), or between ChatGPT and either the IBD expert or non-IBD expert gastroenterologists (P=0.82). CONCLUSIONS This study highlights the potential of language models in enhancing guideline adherence regarding colorectal dysplasia in IBD. Further investigation of additional resources and prospective evaluation in real-world settings are warranted.
Affiliation(s)
- Itai Ghersin: Department of Gastroenterology, Rambam Health Care Campus, Haifa, Israel
- Roni Weisshof: Department of Gastroenterology, Rambam Health Care Campus, Haifa, Israel
- Eduard Koifman: Department of Gastroenterology, Rambam Health Care Campus, Haifa, Israel
- Haggai Bar-Yoseph: Department of Gastroenterology, Rambam Health Care Campus, Haifa, Israel; Rappaport Faculty of Medicine, Technion, Israel Institute of Technology, Haifa, Israel
- Dana Ben Hur: Department of Gastroenterology, Rambam Health Care Campus, Haifa, Israel
- Itay Maza: Department of Gastroenterology, Rambam Health Care Campus, Haifa, Israel
- Erez Hasnis: Department of Gastroenterology, Rambam Health Care Campus, Haifa, Israel
- Roni Nasser: Department of Gastroenterology, Rambam Health Care Campus, Haifa, Israel
- Baruch Ovadia: Department of Gastroenterology and Hepatology, Hillel Yaffe Medical Center, Hadera, Israel
- Dikla Dror Zur: Department of Gastroenterology, Galilee Medical Center, Nahariya, Israel
- Matti Waterman: Department of Gastroenterology, Rambam Health Care Campus, Haifa, Israel; Rappaport Faculty of Medicine, Technion, Israel Institute of Technology, Haifa, Israel
- Yuri Gorelik: Department of Gastroenterology, Rambam Health Care Campus, Haifa, Israel
9. Parikh AO, Oca MC, Conger JR, McCoy A, Chang J, Zhang-Nunes S. Accuracy and Bias in Artificial Intelligence Chatbot Recommendations for Oculoplastic Surgeons. Cureus 2024; 16:e57611. [PMID: 38707042] [PMCID: PMC11069401] [DOI: 10.7759/cureus.57611]
Abstract
Purpose The purpose of this study is to assess the accuracy of and bias in recommendations for oculoplastic surgeons from three artificial intelligence (AI) chatbot systems. Methods ChatGPT, Microsoft Bing Balanced, and Google Bard were asked for recommendations for oculoplastic surgeons practicing in 20 cities with the highest population in the United States. Three prompts were used: "can you help me find (an oculoplastic surgeon)/(a doctor who does eyelid lifts)/(an oculofacial plastic surgeon) in (city)." Results A total of 672 suggestions were made across the three prompt phrasings; 19.8% of suggestions were excluded, leaving 539 suggested physicians. Of these, 64.1% were oculoplastics specialists (of which 70.1% were American Society of Ophthalmic Plastic and Reconstructive Surgery (ASOPRS) members); 16.1% were general plastic surgery trained, 9.0% were ENT trained, 8.8% were ophthalmology but not oculoplastics trained, and 1.9% were trained in another specialty. 27.7% of recommended surgeons across all AI systems were female. Conclusions Among the chatbot systems tested, there were high rates of inaccuracy: up to 38% of recommended surgeons were nonexistent or not practicing in the city requested, and 35.9% of those recommended as oculoplastic/oculofacial plastic surgeons were not oculoplastics specialists. Choice of prompt affected the result, with requests for "a doctor who does eyelid lifts" resulting in more plastic surgeons and ENTs and fewer oculoplastic surgeons. It is important to identify inaccuracies and biases in recommendations provided by AI systems as more patients may start using them to choose a surgeon.
Affiliation(s)
- Alomi O Parikh: Ophthalmology, USC Roski Eye Institute, Keck School of Medicine, University of Southern California, Los Angeles, USA
- Michael C Oca: Ophthalmology, University of California San Diego School of Medicine, La Jolla, USA
- Jordan R Conger: Oculofacial Plastic Surgery, USC Roski Eye Institute, Keck School of Medicine, University of Southern California, Los Angeles, USA
- Allison McCoy: Oculofacial Plastic Surgery, Del Mar Plastic Surgery, San Diego, USA
- Jessica Chang: Oculofacial Plastic Surgery, USC Roski Eye Institute, Keck School of Medicine, University of Southern California, Los Angeles, USA
- Sandy Zhang-Nunes: Ophthalmology, USC Roski Eye Institute, Keck School of Medicine, University of Southern California, Los Angeles, USA
10. Fass O, Rogers BD, Gyawali CP. Artificial Intelligence Tools for Improving Manometric Diagnosis of Esophageal Dysmotility. Curr Gastroenterol Rep 2024; 26:115-123. [PMID: 38324172] [PMCID: PMC10960670] [DOI: 10.1007/s11894-024-00921-z]
Abstract
PURPOSE OF REVIEW Artificial intelligence (AI) is a broad term that pertains to a computer's ability to mimic and sometimes surpass human intelligence in interpretation of large datasets. The adoption of AI in gastrointestinal motility has been slower compared to other areas such as polyp detection and interpretation of histopathology. RECENT FINDINGS Within esophageal physiologic testing, AI can automate interpretation of image-based tests, especially high resolution manometry (HRM) and functional luminal imaging probe (FLIP) studies. Basic tasks such as identification of landmarks, determining adequacy of the HRM study, and distinguishing achalasia from non-achalasia patterns are achieved with good accuracy. However, existing AI systems compare AI interpretation to expert analysis rather than to clinical outcome from management based on AI diagnosis. The use of AI methods is much less advanced within the field of ambulatory reflux monitoring, where challenges exist in assimilation of data from multiple impedance and pH channels. There remains potential to replicate the AI successes of esophageal physiologic testing in anorectal HRM, and in innovative and novel methods of evaluating gastric electrical activity and motor function. The use of AI has tremendous potential to improve detection of dysmotility within the esophagus using esophageal physiologic testing, as well as in other regions of the gastrointestinal tract. Eventually, integration of patient presentation, demographics, and alternate test results with individual motility test interpretation will improve diagnostic precision and prognostication using AI tools.
Affiliation(s)
- Ofer Fass: Division of Gastroenterology and Hepatology, Stanford University, Stanford, CA, USA
- Benjamin D Rogers: Division of Gastroenterology, Hepatology and Nutrition, University of Louisville School of Medicine, Louisville, KY, USA; Division of Gastroenterology, Washington University School of Medicine, 660 South Euclid Ave., Campus Box 8124, Saint Louis, MO, 63110, USA
- C Prakash Gyawali: Division of Gastroenterology, Washington University School of Medicine, 660 South Euclid Ave., Campus Box 8124, Saint Louis, MO, 63110, USA
11. Klang E, Sourosh A, Nadkarni GN, Sharif K, Lahat A. Evaluating the role of ChatGPT in gastroenterology: a comprehensive systematic review of applications, benefits, and limitations. Therap Adv Gastroenterol 2023; 16:17562848231218618. [PMID: 38149123] [PMCID: PMC10750546] [DOI: 10.1177/17562848231218618]
Abstract
Background The integration of artificial intelligence (AI) into healthcare has opened new avenues for enhancing patient care and clinical research. In gastroenterology, the potential of AI tools, specifically large language models like ChatGPT, is being explored to understand their utility and effectiveness. Objectives The primary goal of this systematic review is to assess the various applications, ascertain the benefits, and identify the limitations of utilizing ChatGPT within the realm of gastroenterology. Design Through a systematic approach, this review aggregates findings from multiple studies to evaluate the impact of ChatGPT on the field. Data sources and methods The review was based on a detailed literature search of the PubMed database, targeting research that delves into the use of ChatGPT for gastroenterological purposes. It incorporated six selected studies, which were meticulously evaluated for quality using the Joanna Briggs Institute critical appraisal instruments. The data were then synthesized narratively to encapsulate the roles, advantages, and drawbacks of ChatGPT in gastroenterology. Results The investigation identified various roles of ChatGPT, including its use in patient education, diagnostic self-assessment, disease management, and the formulation of research queries. Notable benefits were its capability to provide pertinent recommendations, enhance communication between patients and physicians, and prompt valuable research inquiries. Nonetheless, it encountered obstacles in interpreting intricate medical questions, yielded inconsistent responses at times, and exhibited limitations in generating novel content. The review also considered ethical implications. Conclusion ChatGPT has demonstrated significant potential in the field of gastroenterology, especially in facilitating patient-physician interactions and managing diseases. Despite these advancements, the review underscores the necessity for ongoing refinement, customization, and ethical regulation of AI tools. These findings serve to enrich the dialogue concerning AI's role in healthcare, with a specific focus on ChatGPT's application in gastroenterology.
Affiliation(s)
- Eyal Klang
- Division of Data-Driven and Digital Medicine (D3M), Icahn School of Medicine at Mount Sinai, New York, NY, USA
- The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- ARC Innovation Center, Sheba Medical Center at Tel Hashomer Affiliated with Tel Aviv Medical School, Tel Aviv University, Tel Aviv, Israel
- Ali Sourosh
- Division of Data-Driven and Digital Medicine (D3M), Icahn School of Medicine at Mount Sinai, New York, NY, USA
- The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Girish N. Nadkarni
- Division of Data-Driven and Digital Medicine (D3M), Icahn School of Medicine at Mount Sinai, New York, NY, USA
- The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Kassem Sharif
- Department of Gastroenterology, Sheba Medical Center at Tel Hashomer Affiliated with Tel Aviv Medical School, Tel Aviv University, Tel Aviv, Israel
- Adi Lahat
- Department of Gastroenterology, Sheba Medical Center at Tel Hashomer, Ramat Gan 52621, Affiliated with Tel Aviv Medical School, Tel Aviv University, Tel Aviv, Israel
12
Dang F, Samarasena JB. Generative Artificial Intelligence for Gastroenterology: Neither Friend nor Foe. Am J Gastroenterol 2023; 118:2146-2147. [PMID: 38033225] [DOI: 10.14309/ajg.0000000000002573] [Received: 08/16/2023] [Accepted: 11/02/2023]
Affiliation(s)
- Frances Dang
- Division of Gastroenterology/Hepatology, University of California Irvine School of Medicine, Orange, California, USA