1. Petrella RJ. The AI Future of Emergency Medicine. Ann Emerg Med 2024;84:139-153. PMID: 38795081. DOI: 10.1016/j.annemergmed.2024.01.031.
Abstract
In the coming years, artificial intelligence (AI) and machine learning will likely give rise to profound changes in the field of emergency medicine, and medicine more broadly. This article discusses these anticipated changes in terms of 3 overlapping yet distinct stages of AI development. It reviews some fundamental concepts in AI and explores their relation to clinical practice, with a focus on emergency medicine. In addition, it describes some of the applications of AI in disease diagnosis, prognosis, and treatment, as well as some of the practical issues that they raise, the barriers to their implementation, and some of the legal and regulatory challenges they create.
Affiliation(s)
- Robert J Petrella
- Emergency Departments, CharterCARE Health Partners, Providence and North Providence, RI; Emergency Department, Boston VA Medical Center, Boston, MA; Emergency Departments, Steward Health Care System, Boston and Methuen, MA; Harvard Medical School, Boston, MA; Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA; Department of Medicine, Brigham and Women's Hospital, Boston, MA.
2. Kachman MM, Brennan I, Oskvarek JJ, Waseem T, Pines JM. How artificial intelligence could transform emergency care. Am J Emerg Med 2024;81:40-46. PMID: 38663302. DOI: 10.1016/j.ajem.2024.04.024.
Abstract
Artificial intelligence (AI) in healthcare is the ability of a computer to perform tasks typically associated with clinical care (e.g. medical decision-making and documentation). AI will soon be integrated into an increasing number of healthcare applications, including elements of emergency department (ED) care. Here, we describe the basics of AI, various categories of its functions (including machine learning and natural language processing) and review emerging and potential future use-cases for emergency care. For example, AI-assisted symptom checkers could help direct patients to the appropriate setting, models could assist in assigning triage levels, and ambient AI systems could document clinical encounters. AI could also help provide focused summaries of charts, summarize encounters for hand-offs, and create discharge instructions with an appropriate language and reading level. Additional use cases include medical decision making for decision rules, real-time models that predict clinical deterioration or sepsis, and efficient extraction of unstructured data for coding, billing, research, and quality initiatives. We discuss the potential transformative benefits of AI, as well as the concerns regarding its use (e.g. privacy, data accuracy, and the potential for changing the doctor-patient relationship).
Affiliation(s)
- Marika M Kachman
- US Acute Care Solutions, Canton, OH, United States of America; Department of Emergency Medicine, Virginia Hospital Center, Arlington, VA, United States of America
- Irina Brennan
- US Acute Care Solutions, Canton, OH, United States of America; Department of Emergency Medicine, Inova Alexandria Hospital, Alexandria, VA, United States of America
- Jonathan J Oskvarek
- US Acute Care Solutions, Canton, OH, United States of America; Department of Emergency Medicine, Summa Health, Akron, OH, United States of America
- Tayab Waseem
- Department of Emergency Medicine, George Washington University, Washington, DC, United States of America
- Jesse M Pines
- US Acute Care Solutions, Canton, OH, United States of America; Department of Emergency Medicine, George Washington University, Washington, DC, United States of America.
3. Rutledge GW. Diagnostic accuracy of GPT-4 on common clinical scenarios and challenging cases. Learn Health Syst 2024;8:e10438. PMID: 39036534. PMCID: PMC11257049. DOI: 10.1002/lrh2.10438.
Abstract
Introduction Large language models (LLMs) have a high diagnostic accuracy when they evaluate previously published clinical cases. Methods We compared the accuracy of GPT-4's differential diagnoses for previously unpublished challenging case scenarios with the diagnostic accuracy for previously published cases. Results For a set of previously unpublished challenging clinical cases, GPT-4 achieved 61.1% correct in its top 6 diagnoses versus the previously reported 49.1% for physicians. For a set of 45 clinical vignettes of more common clinical scenarios, GPT-4 included the correct diagnosis in its top 3 diagnoses 100% of the time versus the previously reported 84.3% for physicians. Conclusions GPT-4 performs at a level at least as good as, if not better than, that of experienced physicians on highly challenging cases in internal medicine. The extraordinary performance of GPT-4 on diagnosing common clinical scenarios could be explained in part by the fact that these cases were previously published and may have been included in the training dataset for this LLM.
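The top-k accuracy reported above (correct diagnosis contained in the top 3 or top 6 of the differential list) can be computed with a simple sketch. The case data below are invented for illustration and are not drawn from the study:

```python
def top_k_accuracy(differentials, truths, k):
    """Fraction of cases whose reference diagnosis appears in the top-k of the ranked differential list."""
    hits = sum(truth in ddx[:k] for ddx, truth in zip(differentials, truths))
    return hits / len(truths)

# Hypothetical ranked differential lists and reference diagnoses (illustrative only).
ddx_lists = [
    ["pulmonary embolism", "pneumonia", "acute coronary syndrome"],
    ["migraine", "subarachnoid hemorrhage", "tension headache"],
    ["appendicitis", "gastroenteritis", "urinary tract infection"],
]
truths = ["acute coronary syndrome", "subarachnoid hemorrhage", "cholecystitis"]
print(round(top_k_accuracy(ddx_lists, truths, k=3), 2))
```

With k=3, two of the three hypothetical cases contain the reference diagnosis, so the sketch reports 0.67; tightening k to 1 scores only exact top-ranked matches.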
4. Codipilly DC, Faghani S, Hagan C, Lewis J, Erickson BJ, Iyer PG. The Evolving Role of Artificial Intelligence in Gastrointestinal Histopathology: An Update. Clin Gastroenterol Hepatol 2024;22:1170-1180. PMID: 38154727. DOI: 10.1016/j.cgh.2023.11.044.
Abstract
Significant advances in artificial intelligence (AI) over the past decade potentially may lead to dramatic effects on clinical practice. Digitized histology represents an area ripe for AI implementation. We describe several current needs within the world of gastrointestinal histopathology, and outline, using currently studied models, how AI potentially can address them. We also highlight pitfalls as AI makes inroads into clinical practice.
Affiliation(s)
- D Chamil Codipilly
- Barrett's Esophagus Unit, Division of Gastroenterology and Hepatology, Mayo Clinic Rochester, Rochester, Minnesota
- Shahriar Faghani
- Mayo Artificial Intelligence Laboratory, Department of Radiology, Mayo Clinic, Rochester, Minnesota
- Catherine Hagan
- Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, Minnesota
- Jason Lewis
- Department of Pathology, Mayo Clinic, Jacksonville, Florida
- Bradley J Erickson
- Mayo Artificial Intelligence Laboratory, Department of Radiology, Mayo Clinic, Rochester, Minnesota
- Prasad G Iyer
- Barrett's Esophagus Unit, Division of Gastroenterology and Hepatology, Mayo Clinic Rochester, Rochester, Minnesota.
5. Meczner A, Cohen N, Qureshi A, Reza M, Sutaria S, Blount E, Bagyura Z, Malak T. Controlling Inputter Variability in Vignette Studies Assessing Web-Based Symptom Checkers: Evaluation of Current Practice and Recommendations for Isolated Accuracy Metrics. JMIR Form Res 2024;8:e49907. PMID: 38820578. PMCID: PMC11179013. DOI: 10.2196/49907.
Abstract
BACKGROUND The rapid growth of web-based symptom checkers (SCs) is not matched by advances in quality assurance. Currently, there are no widely accepted criteria assessing SCs' performance. Vignette studies are widely used to evaluate SCs, measuring the accuracy of outcome. Accuracy behaves as a composite metric as it is affected by a number of individual SC- and tester-dependent factors. In contrast to clinical studies, vignette studies have a small number of testers. Hence, measuring accuracy alone in vignette studies may not provide a reliable assessment of performance due to tester variability. OBJECTIVE This study aims to investigate the impact of tester variability on the accuracy of outcome of SCs, using clinical vignettes. It further aims to investigate the feasibility of measuring isolated aspects of performance. METHODS Healthily's SC was assessed using 114 vignettes by 3 groups of 3 testers who processed vignettes with different instructions: free interpretation of vignettes (free testers), specified chief complaints (partially free testers), and specified chief complaints with strict instruction for answering additional symptoms (restricted testers). κ statistics were calculated to assess agreement of top outcome condition and recommended triage. Crude and adjusted accuracy was measured against a gold standard. Adjusted accuracy was calculated using only results of consultations identical to the vignette, following a review and selection process. A feasibility study for assessing symptom comprehension of SCs was performed using different variations of 51 chief complaints across 3 SCs. RESULTS Intertester agreement of most likely condition and triage was, respectively, 0.49 and 0.51 for the free tester group, 0.66 and 0.66 for the partially free group, and 0.72 and 0.71 for the restricted group. For the restricted group, accuracy ranged from 43.9% to 57% for individual testers, averaging 50.6% (SD 5.35%). Adjusted accuracy was 56.1%. 
Assessing symptom comprehension was feasible for all 3 SCs. Comprehension scores ranged from 52.9% to 68%. CONCLUSIONS We demonstrated that by improving standardization of the vignette testing process, there is a significant improvement in the agreement of outcome between testers. However, significant variability remained due to uncontrollable tester-dependent factors, reflected by varying outcome accuracy. Tester-dependent factors, combined with a small number of testers, limit the reliability and generalizability of outcome accuracy when used as a composite measure in vignette studies. Measuring and reporting different aspects of SC performance in isolation provides a more reliable assessment of SC performance. We developed an adjusted accuracy measure using a review and selection process to assess data algorithm quality. In addition, we demonstrated that symptom comprehension with different input methods can be feasibly compared. Future studies reporting accuracy need to apply vignette testing standardization and isolated metrics.
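The κ statistics above quantify chance-corrected agreement between testers. The abstract does not state which κ variant was used; as an assumption, a minimal sketch of Cohen's kappa for one pair of raters is shown, with invented triage labels for illustration:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters over the same items (Cohen's kappa)."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed agreement: fraction of items where both raters assigned the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Expected agreement: probability both raters pick the same label independently.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical triage recommendations from two testers on five vignettes.
a = ["ED", "GP", "self-care", "GP", "ED"]
b = ["ED", "GP", "GP", "GP", "ED"]
print(round(cohens_kappa(a, b), 2))
```

Here the raters agree on 4 of 5 vignettes (observed 0.8) against an expected chance agreement of 0.4, giving κ = 0.67; multi-rater designs such as the 3-tester groups above would typically use a generalization like Fleiss' kappa.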
Affiliation(s)
- András Meczner
- Healthily, London, United Kingdom
- Institute for Clinical Data Management, Semmelweis University, Budapest, Hungary
- Zsolt Bagyura
- Institute for Clinical Data Management, Semmelweis University, Budapest, Hungary
6. Augusto Duenhas Accorsi T, Tocci Moreira F, Aires Eduardo A, Albaladejo Morbeck R, Francine Köhler K, De Amicis Lima K, Henrique Sartorato Pedrotti C. Outcome After Self-Triage App Referral in Urgent Direct-to-Consumer Telemedicine Encounter. Telemed J E Health 2024. PMID: 38805348. DOI: 10.1089/tmj.2024.0126.
Abstract
Background: The quantification of self-triage effectiveness, guided by mobile applications, in urgent direct-to-consumer telemedicine (TM) encounters requires further investigation. The objective of this study was to evaluate the outcomes of referral guidance provided by a symptom-based self-management mobile application decision algorithm in the context of remote urgent care assessments. Methods: An observational retrospective single-center study was conducted from May 2022 to December 2023. The inclusion criteria encompassed individuals aged >18 years who spontaneously sought virtual emergency care through the EINSTEIN CONECTA application. Patients whose connectivity issues prevented completion of the encounter were excluded. The primary outcomes included the rate of patient concurrence with the algorithm's recommendation for seeking in-person emergency care and the referral rate to face-to-face assessment among cases evaluated through TM. The application's algorithm employs scientific evidence based on symptoms to recommend referrals to emergency departments (EDs). Results: Out of 88,834 patients connected to the TM Center, self-triage obviated the need for virtual physician assessment in 53,302 (60%) encounters. A total of 35,532 patients were remotely evaluated by 316 on-duty physicians, resulting in 1,125 ICD-coded diagnoses. Among these, 21,722 (61.1%) were initially advised by self-triage to visit the ED, with subsequent medical assessment leading to in-person referrals in 6,354 (29.3%) of the evaluations. Of the 13,810 patients recommended to continue with virtual care post-self-triage, 157 (1.1%) were referred for in-person assessment. Conclusions: Self-triage effectively reduced the need for physician encounters in approximately three-fifths of TM consultations. Despite being based on scientific evidence, symptom-based referral algorithms demonstrated high sensitivity but poor correlation with physician decision-making.
7. Hammoud M, Douglas S, Darmach M, Alawneh S, Sanyal S, Kanbour Y. Evaluating the Diagnostic Performance of Symptom Checkers: Clinical Vignette Study. JMIR AI 2024;3:e46875. PMID: 38875676. PMCID: PMC11091811. DOI: 10.2196/46875.
Abstract
BACKGROUND Medical self-diagnostic tools (or symptom checkers) are becoming an integral part of digital health and our daily lives, whereby patients are increasingly using them to identify the underlying causes of their symptoms. As such, it is essential to rigorously investigate and comprehensively report the diagnostic performance of symptom checkers using standard clinical and scientific approaches. OBJECTIVE This study aims to evaluate and report the accuracies of a few known and new symptom checkers using a standard and transparent methodology, which allows the scientific community to cross-validate and reproduce the reported results, a step much needed in health informatics. METHODS We propose a 4-stage experimentation methodology that capitalizes on the standard clinical vignette approach to evaluate 6 symptom checkers. To this end, we developed and peer-reviewed 400 vignettes, each approved by at least 5 out of 7 independent and experienced primary care physicians. To establish a frame of reference and interpret the results of symptom checkers accordingly, we further compared the best-performing symptom checker against 3 primary care physicians with an average experience of 16.6 (SD 9.42) years. To measure accuracy, we used 7 standard metrics, including M1 as a measure of a symptom checker's or a physician's ability to return a vignette's main diagnosis at the top of their differential list, F1-score as a trade-off measure between recall and precision, and Normalized Discounted Cumulative Gain (NDCG) as a measure of a differential list's ranking quality, among others. RESULTS The diagnostic accuracies of the 6 tested symptom checkers vary significantly. For instance, the differences between the best-performing and worst-performing symptom checkers (ie, the ranges) in M1, F1-score, and NDCG were 65.3%, 39.2%, and 74.2%, respectively.
The same was observed among the participating human physicians, whereby the M1, F1-score, and NDCG ranges were 22.8%, 15.3%, and 21.3%, respectively. When compared against each other, physicians outperformed the best-performing symptom checker by an average of 1.2% using F1-score, whereas the best-performing symptom checker outperformed physicians by averages of 10.2% and 25.1% using M1 and NDCG, respectively. CONCLUSIONS The performance variation between symptom checkers is substantial, suggesting that symptom checkers cannot be treated as a single entity. On a different note, the best-performing symptom checker was an artificial intelligence (AI)-based one, shedding light on the promise of AI in improving the diagnostic capabilities of symptom checkers, especially as AI keeps advancing exponentially.
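NDCG, used above to score the ranking quality of a differential list, rewards correct diagnoses placed near the top and discounts those ranked lower. A minimal sketch with binary relevance follows; the data are illustrative and this is not the study's implementation:

```python
import math

def dcg(relevances):
    """Discounted cumulative gain: rank i (0-based) is discounted by log2(i + 2)."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg(relevances):
    """DCG normalized by the ideal (sorted) ordering; 1.0 means a perfect ranking."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# Differential list where the single correct diagnosis (relevance 1) sits at rank 3.
print(ndcg([0, 0, 1, 0, 0]))  # 0.5
```

Placing the correct diagnosis first would score 1.0; burying it at rank 3 halves the score, which is why NDCG separates checkers that find the right diagnosis from those that also rank it well.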
8. Fukuzawa F, Yanagita Y, Yokokawa D, Uchida S, Yamashita S, Li Y, Shikino K, Tsukamoto T, Noda K, Uehara T, Ikusaka M. Importance of Patient History in Artificial Intelligence-Assisted Medical Diagnosis: Comparison Study. JMIR Med Educ 2024;10:e52674. PMID: 38602313. PMCID: PMC11024399. DOI: 10.2196/52674.
Abstract
Background Medical history contributes approximately 80% to a diagnosis, although physical examinations and laboratory investigations increase a physician's confidence in the medical diagnosis. The concept of artificial intelligence (AI) was first proposed more than 70 years ago. Recently, its role in various fields of medicine has grown remarkably. However, no studies have evaluated the importance of patient history in AI-assisted medical diagnosis. Objective This study explored the contribution of patient history to AI-assisted medical diagnoses and assessed the accuracy of ChatGPT in reaching a clinical diagnosis based on the medical history provided. Methods Using clinical vignettes of 30 cases identified in The BMJ, we evaluated the accuracy of diagnoses generated by ChatGPT. We compared the diagnoses made by ChatGPT based solely on medical history with the correct diagnoses. We also compared the diagnoses made by ChatGPT after incorporating additional physical examination findings and laboratory data alongside history with the correct diagnoses. Results ChatGPT accurately diagnosed 76.6% (23/30) of the cases with only the medical history, consistent with previous research targeting physicians. We also found that this rate was 93.3% (28/30) when additional information was included. Conclusions Although adding additional information improves diagnostic accuracy, patient history remains a significant factor in AI-assisted medical diagnosis. Thus, when using AI in medical diagnosis, it is crucial to include pertinent and correct patient histories for an accurate diagnosis. Our findings emphasize the continued significance of patient history in clinical diagnoses in this age and highlight the need for its integration into AI-assisted medical diagnosis systems.
Affiliation(s)
- Fumitoshi Fukuzawa
- Department of General Medicine, Chiba University Hospital, Chiba-shi, Japan
- Yasutaka Yanagita
- Department of General Medicine, Chiba University Hospital, Chiba-shi, Japan
- Daiki Yokokawa
- Department of General Medicine, Chiba University Hospital, Chiba-shi, Japan
- Shun Uchida
- Uchida Internal Medicine Clinic, Saitama-shi, Japan
- Shiho Yamashita
- Department of General Medicine, Chiba University Hospital, Chiba-shi, Japan
- Yu Li
- Department of General Medicine, Chiba University Hospital, Chiba-shi, Japan
- Kiyoshi Shikino
- Department of General Medicine, Chiba University Hospital, Chiba-shi, Japan
- Tomoko Tsukamoto
- Department of General Medicine, Chiba University Hospital, Chiba-shi, Japan
- Kazutaka Noda
- Department of General Medicine, Chiba University Hospital, Chiba-shi, Japan
- Takanori Uehara
- Department of General Medicine, Chiba University Hospital, Chiba-shi, Japan
- Masatomi Ikusaka
- Department of General Medicine, Chiba University Hospital, Chiba-shi, Japan
9. Savolainen K, Kujala S. Testing Two Online Symptom Checkers With Vulnerable Groups: Usability Study to Improve Cognitive Accessibility of eHealth Services. JMIR Hum Factors 2024;11:e45275. PMID: 38457214. PMCID: PMC10960212. DOI: 10.2196/45275.
Abstract
BACKGROUND The popularity of eHealth services has surged significantly, underscoring the importance of ensuring their usability and accessibility for users with diverse needs, characteristics, and capabilities. These services can pose cognitive demands, especially for individuals who are unwell, fatigued, or experiencing distress. Additionally, numerous potentially vulnerable groups, including older adults, are susceptible to digital exclusion and may encounter cognitive limitations related to perception, attention, memory, and language comprehension. Regrettably, many studies overlook the preferences and needs of user groups likely to encounter challenges associated with these cognitive aspects. OBJECTIVE This study primarily aims to gain a deeper understanding of cognitive accessibility in the practical context of eHealth services. Additionally, we aimed to identify the specific challenges that vulnerable groups encounter when using eHealth services and determine key considerations for testing these services with such groups. METHODS As a case study of eHealth services, we conducted qualitative usability testing on 2 online symptom checkers used in Finnish public primary care. A total of 13 participants from 3 distinct groups participated in the study: older adults, individuals with mild intellectual disabilities, and nonnative Finnish speakers. The primary research methods used were the thinking-aloud method, questionnaires, and semistructured interviews. RESULTS We found that potentially vulnerable groups encountered numerous issues with the tested services, with similar problems observed across all 3 groups. Specifically, clarity and the use of terminology posed significant challenges. The services overwhelmed users with excessive information and choices, while the terminology consisted of numerous complex medical terms that were difficult to understand. 
When conducting tests with vulnerable groups, it is crucial to carefully plan the sessions to avoid being overly lengthy, as these users often require more time to complete tasks. Additionally, testing with vulnerable groups proved to be quite efficient, with results likely to benefit a wider audience as well. CONCLUSIONS Based on the findings of this study, it is evident that older adults, individuals with mild intellectual disability, and nonnative speakers may encounter cognitive challenges when using eHealth services, which can impede or slow down their use and make the services more difficult to navigate. In the worst-case scenario, these challenges may lead to errors in using the services. We recommend expanding the scope of testing to include a broader range of eHealth services with vulnerable groups, incorporating users with diverse characteristics and capabilities who are likely to encounter difficulties in cognitive accessibility.
Affiliation(s)
- Kaisa Savolainen
- Department of Computer Science, Aalto University School of Science, Espoo, Finland
- Sari Kujala
- Department of Computer Science, Aalto University School of Science, Espoo, Finland
10. Chien S, Miller G, Huang I, Cunningham DA, Carson D, Gall LS, Khan KS. Quality assessment of online patient information on upper gastrointestinal endoscopy using the modified Ensuring Quality Information for Patients tool. Ann R Coll Surg Engl 2024. PMID: 38376380. DOI: 10.1308/rcsann.2022.0078.
Abstract
INTRODUCTION Websites and online resources are increasingly becoming patients' main source of healthcare information. It is paramount that high quality information is available online to enhance patient education and improve clinical outcomes. Upper gastrointestinal (UGI) endoscopy is the gold standard investigation for UGI symptoms and yet little is known regarding the quality of patient orientated websites. The aim of this study was to assess the quality of online patient information on UGI endoscopy using the modified Ensuring Quality Information for Patients (EQIP) tool. METHODS Ten search terms were employed to conduct a systematic review. For each term, the top 100 websites identified via a Google search were assessed using the modified EQIP tool. High scoring websites underwent further analysis. Websites intended for professional use by clinicians as well as those containing video or marketing content were excluded. FINDINGS A total of 378 websites were eligible for analysis. The median modified EQIP score for UGI endoscopy was 18/36 (interquartile range: 14-21). The median EQIP scores for the content, identification and structure domains were 8/18, 1/6 and 9/12 respectively. Higher modified EQIP scores were obtained for websites produced by government departments and National Health Service hospitals (p=0.007). Complication rates were documented in only a fifth (20.4%) of websites. High scoring websites were significantly more likely to provide balanced information on risks and benefits (94.6% vs 34.4%, p<0.001). CONCLUSIONS There is an immediate need to improve the quality of online patient information regarding UGI endoscopy. The currently available resources provide minimal information on the risks associated with the procedure, potentially hindering patients' ability to make informed healthcare decisions.
Affiliation(s)
- S Chien
- NHS Greater Glasgow and Clyde, UK
- University of Glasgow, UK
- I Huang
- NHS Greater Glasgow and Clyde, UK
- D Carson
- NHS Greater Glasgow and Clyde, UK
- L S Gall
- NHS Greater Glasgow and Clyde, UK
- K S Khan
- University of Glasgow, UK
- NHS Lanarkshire, UK
11. Müller R, Klemmt M, Koch R, Ehni HJ, Henking T, Langmann E, Wiesing U, Ranisch R. "That's just Future Medicine" - a qualitative study on users' experiences of symptom checker apps. BMC Med Ethics 2024;25:17. PMID: 38365749. PMCID: PMC10874001. DOI: 10.1186/s12910-024-01011-5.
Abstract
BACKGROUND Symptom checker apps (SCAs) are mobile or online applications for lay people that usually have two main functions: symptom analysis and recommendations. SCAs ask users questions about their symptoms via a chatbot, give a list with possible causes, and provide a recommendation, such as seeing a physician. However, it is unclear whether the actual performance of a SCA corresponds to the users' experiences. This qualitative study investigates the subjective perspectives of SCA users to close the empirical gap identified in the literature and answers the following main research question: How do individuals (healthy users and patients) experience the usage of SCA, including their attitudes, expectations, motivations, and concerns regarding their SCA use? METHODS A qualitative interview study was chosen to clarify the relatively unknown experience of SCA use. Semi-structured qualitative interviews with SCA users were carried out by two researchers in tandem via video call. Qualitative content analysis was selected as methodology for the data analysis. RESULTS Fifteen interviews with SCA users were conducted and seven main categories identified: (1) Attitudes towards findings and recommendations, (2) Communication, (3) Contact with physicians, (4) Expectations (prior to use), (5) Motivations, (6) Risks, and (7) SCA-use for others. CONCLUSIONS The aspects identified in the analysis emphasise the specific perspective of SCA users and, at the same time, the immense scope of different experiences. Moreover, the study reveals ethical issues, such as relational aspects, that are often overlooked in debates on mHealth. Both empirical and ethical research is more needed, as the awareness of the subjective experience of those affected is an essential component in the responsible development and implementation of health apps such as SCA. TRIAL REGISTRATION German Clinical Trials Register (DRKS): DRKS00022465. 07/08/2020.
Affiliation(s)
- Regina Müller
- Institute of Philosophy, University Bremen, Bremen, Germany.
- Malte Klemmt
- Institute of General Practice and Palliative Care, Hannover Medical School, Hannover, Germany
- Roland Koch
- Institute of General Practice and Interprofessional Care, University Hospital Tübingen, Tübingen, Germany
- Hans-Jörg Ehni
- Institute of Ethics and History of Medicine, University Tübingen, Tübingen, Germany
- Tanja Henking
- Institute of Applied Social Science, University of Applied Science Würzburg-Schweinfurt, Würzburg, Germany
- Elisabeth Langmann
- Institute of Ethics and History of Medicine, University Tübingen, Tübingen, Germany
- Urban Wiesing
- Institute of Ethics and History of Medicine, University Tübingen, Tübingen, Germany
- Robert Ranisch
- Faculty of Health Science Brandenburg, University of Potsdam, Potsdam, Germany
12. Xue J, Zhang B, Zhao Y, Zhang Q, Zheng C, Jiang J, Li H, Liu N, Li Z, Fu W, Peng Y, Logan J, Zhang J, Xiang X. Evaluation of the Current State of Chatbots for Digital Health: Scoping Review. J Med Internet Res 2023;25:e47217. PMID: 38113097. PMCID: PMC10762606. DOI: 10.2196/47217.
Abstract
BACKGROUND Chatbots have become ubiquitous in our daily lives, enabling natural language conversations with users through various modes of communication. Chatbots have the potential to play a significant role in promoting health and well-being. As the number of studies and available products related to chatbots continues to rise, there is a critical need to assess product features to enhance the design of chatbots that effectively promote health and behavioral change. OBJECTIVE This scoping review aims to provide a comprehensive assessment of the current state of health-related chatbots, including the chatbots' characteristics and features, user backgrounds, communication models, relational building capacity, personalization, interaction, responses to suicidal thoughts, and users' in-app experiences during chatbot use. Through this analysis, we seek to identify gaps in the current research, guide future directions, and enhance the design of health-focused chatbots. METHODS Following the scoping review methodology by Arksey and O'Malley and guided by the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) checklist, this study used a two-pronged approach to identify relevant chatbots: (1) searching the iOS and Android App Stores and (2) reviewing scientific literature through a search strategy designed by a librarian. Overall, 36 chatbots were selected based on predefined criteria from both sources. These chatbots were systematically evaluated using a comprehensive framework developed for this study, including chatbot characteristics, user backgrounds, building relational capacity, personalization, interaction models, responses to critical situations, and user experiences. Ten coauthors were responsible for downloading and testing the chatbots, coding their features, and evaluating their performance in simulated conversations. The testing of all chatbot apps was limited to their free-to-use features. 
RESULTS This review provides an overview of the diversity of health-related chatbots, encompassing categories such as mental health support, physical activity promotion, and behavior change interventions. Chatbots use text, animations, speech, images, and emojis for communication. The findings highlight variations in conversational capabilities, including empathy, humor, and personalization. Notably, concerns regarding safety, particularly in addressing suicidal thoughts, were evident. Approximately 44% (16/36) of the chatbots effectively addressed suicidal thoughts. User experiences and behavioral outcomes demonstrated the potential of chatbots in health interventions, but evidence remains limited. CONCLUSIONS This scoping review underscores the significance of chatbots in health-related applications and offers insights into their features, functionalities, and user experiences. This study contributes to advancing the understanding of chatbots' role in digital health interventions, thus paving the way for more effective and user-centric health promotion strategies. This study informs future research directions, emphasizing the need for rigorous randomized control trials, standardized evaluation metrics, and user-centered design to unlock the full potential of chatbots in enhancing health and well-being. Future research should focus on addressing limitations, exploring real-world user experiences, and implementing robust data security and privacy measures.
Affiliation(s)
- Jia Xue
- Factor Inwentash Faculty of Social Work, University of Toronto, Toronto, ON, Canada
- Faculty of Information, University of Toronto, Toronto, ON, Canada
- Artificial Intelligence for Justice Lab, University of Toronto, Toronto, ON, Canada
- Bolun Zhang
- Faculty of Information, University of Toronto, Toronto, ON, Canada
- Artificial Intelligence for Justice Lab, University of Toronto, Toronto, ON, Canada
- Yaxi Zhao
- Faculty of Information, University of Toronto, Toronto, ON, Canada
- Artificial Intelligence for Justice Lab, University of Toronto, Toronto, ON, Canada
- Qiaoru Zhang
- Artificial Intelligence for Justice Lab, University of Toronto, Toronto, ON, Canada
- Faculty of Arts and Science, University of Toronto, Toronto, ON, Canada
- Chengda Zheng
- Artificial Intelligence for Justice Lab, University of Toronto, Toronto, ON, Canada
- Jielin Jiang
- Artificial Intelligence for Justice Lab, University of Toronto, Toronto, ON, Canada
- Hanjia Li
- Artificial Intelligence for Justice Lab, University of Toronto, Toronto, ON, Canada
- Nian Liu
- Artificial Intelligence for Justice Lab, University of Toronto, Toronto, ON, Canada
- Ziqian Li
- Artificial Intelligence for Justice Lab, University of Toronto, Toronto, ON, Canada
- Weiying Fu
- Artificial Intelligence for Justice Lab, University of Toronto, Toronto, ON, Canada
- Yingdong Peng
- Artificial Intelligence for Justice Lab, University of Toronto, Toronto, ON, Canada
- Judith Logan
- John P Robarts Library, University of Toronto, Toronto, ON, Canada
- Jingwen Zhang
- Department of Communication, University of California Davis, Davis, CA, United States
- Xiaoling Xiang
- School of Social Work, University of Michigan, Ann Arbor, MI, United States
13
Benoit JR, Hartling L, Scott SD. Bridging evidence-to-care gaps with mHealth: Designing a symptom checker for parents accessing knowledge translation resources on acute children's illnesses in a smartphone application. PEC INNOVATION 2023; 2:100152. [PMID: 37214490 PMCID: PMC10194162 DOI: 10.1016/j.pecinn.2023.100152] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 09/14/2022] [Revised: 03/06/2023] [Accepted: 03/28/2023] [Indexed: 05/24/2023]
Abstract
Background Smartphone applications offer a novel platform for delivering health information to parents. This study created and evaluated an app-based symptom checker that recommends educational tools to parents based on their child's symptoms. Methods Symptoms extracted from 23 knowledge translation (KT) tools for 10 children's illnesses comprised a set of plain-language symptoms. The symptom checker works by producing confusion matrices evaluating a child's reported symptoms against possible illnesses, comparing precision scores to examine how well each illness matched reported symptoms, and ordering possible illnesses by performance score. Performance was evaluated by extracting symptoms from 8 clinical vignettes, and examining correct first-try matches. Results We created a final list of 54 plain-language symptoms. Visualizations of the symptom set creation process and logic mapping are presented, as well as images of the working symptom checker. The symptom checker matched 100% (8/8) of tested clinical vignettes to the appropriate illness resource. Discussion Symptom checkers are a potentially useful tool to integrate into apps that parents use for their children's health. The design of these systems has the potential to change parents' relationship with technology, affecting both their adoption and acceptance of symptom checkers. Our design choices contribute to addressing current barriers to the adoption of symptom checkers, reducing functional, critical, and interactive literacy requirements for parents.
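The matching logic described above — scoring each candidate illness by precision against the child's reported symptoms, then ordering candidates by score — can be sketched as follows. The illness names and symptom sets below are illustrative placeholders, not the study's actual 54-symptom set:

```python
def rank_illnesses(reported, illness_symptoms):
    """Rank candidate illnesses by precision against reported symptoms.

    For each illness, treat its known symptom set as the predicted positives
    and the child's reported symptoms as the actual positives, then score the
    overlap: precision = matched symptoms / symptoms the illness predicts.
    """
    reported = set(reported)
    scores = {}
    for illness, symptoms in illness_symptoms.items():
        symptoms = set(symptoms)
        true_pos = len(reported & symptoms)
        scores[illness] = true_pos / len(symptoms) if symptoms else 0.0
    # Order candidate illnesses from best to worst match.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

KNOWLEDGE_BASE = {  # illustrative entries only
    "croup": {"barky cough", "hoarse voice", "fever"},
    "bronchiolitis": {"wheezing", "runny nose", "fast breathing", "fever"},
    "ear infection": {"ear pain", "fever", "irritability"},
}

ranking = rank_illnesses({"barky cough", "fever"}, KNOWLEDGE_BASE)
print(ranking[0][0])  # croup matches 2 of its 3 listed symptoms
```

In this toy run, croup scores 2/3, ear infection 1/3, and bronchiolitis 1/4, so croup is offered first; the study's checker applies the same idea to its extracted plain-language symptom set.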
Affiliation(s)
- James R.A. Benoit
- Department of Pediatrics, University of Alberta, Edmonton Clinic Health Academy, 11405-87 Avenue, Edmonton, Alberta T6G 1C9, Canada
- Faculty of Nursing, University of Alberta, Edmonton Clinic Health Academy, 11405-87 Avenue, Edmonton, Alberta T6G 1C9, Canada
- Lisa Hartling
- Department of Pediatrics, University of Alberta, Edmonton Clinic Health Academy, 11405-87 Avenue, Edmonton, Alberta T6G 1C9, Canada
- Shannon D. Scott
- Faculty of Nursing, University of Alberta, Edmonton Clinic Health Academy, 11405-87 Avenue, Edmonton, Alberta T6G 1C9, Canada
14
Bushuven S, Bentele M, Bentele S, Gerber B, Bansbach J, Ganter J, Trifunovic-Koenig M, Ranisch R. "ChatGPT, Can You Help Me Save My Child's Life?" - Diagnostic Accuracy and Supportive Capabilities to Lay Rescuers by ChatGPT in Prehospital Basic Life Support and Paediatric Advanced Life Support Cases - An In-silico Analysis. J Med Syst 2023; 47:123. [PMID: 37987870 PMCID: PMC10663183 DOI: 10.1007/s10916-023-02019-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/09/2023] [Accepted: 11/13/2023] [Indexed: 11/22/2023]
Abstract
BACKGROUND Paediatric emergencies are challenging for healthcare workers, first aiders, and parents waiting for emergency medical services to arrive. With the expected rise of virtual assistants, people will likely seek help from such digital AI tools, especially in regions lacking emergency medical services. Large language models such as ChatGPT have proved effective in providing health-related information and are competent in medical exams, but their patient safety has been questioned. Currently, there is no information on ChatGPT's performance in supporting parents in paediatric emergencies requiring help from emergency medical services. This study aimed to test 20 paediatric and two basic life support case vignettes to assess the performance and safety of ChatGPT and GPT-4 in children. METHODS We provided the cases three times each to two models, ChatGPT and GPT-4, and assessed the diagnostic accuracy, emergency call advice, and the validity of advice given to parents. RESULTS Both models recognized the emergency in the cases, except for septic shock and pulmonary embolism, and identified the correct diagnosis in 94% of cases. However, ChatGPT/GPT-4 reliably advised calling emergency services in only 12 of 22 cases (54%), gave correct first aid instructions in 9 cases (45%), and incorrectly advised advanced life support techniques to parents in 3 of 22 cases (13.6%). CONCLUSION Considering these results for the recent ChatGPT versions, the validity, reliability, and thus safety of ChatGPT/GPT-4 as an emergency support tool is questionable. However, whether humans would perform better in the same situation is uncertain. Moreover, other studies have shown that human emergency call operators are also inaccurate, partly with worse performance than ChatGPT/GPT-4 in our study.
However, one of the main limitations of the study is that we used prototypical cases, and the management may differ from urban to rural areas and between different countries, indicating the need for further evaluation of the context sensitivity and adaptability of the model. Nevertheless, ChatGPT and the new versions under development may be promising tools for assisting lay first responders, operators, and professionals in diagnosing a paediatric emergency. TRIAL REGISTRATION Not applicable.
Affiliation(s)
- Stefan Bushuven
- Training Center for Emergency Medicine (NOTIS e.V), Breite Strasse 7, Engen, 78234, Germany.
- Department of Anesthesiology and Critical Care, Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany.
- Institute for Medical Education, University Hospital, LMU Munich, Munich, Germany.
- Michael Bentele
- Training Center for Emergency Medicine (NOTIS e.V), Breite Strasse 7, Engen, 78234, Germany
- Stefanie Bentele
- Training Center for Emergency Medicine (NOTIS e.V), Breite Strasse 7, Engen, 78234, Germany
- Bianka Gerber
- Training Center for Emergency Medicine (NOTIS e.V), Breite Strasse 7, Engen, 78234, Germany
- Joachim Bansbach
- Department of Anesthesiology and Critical Care, Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Julian Ganter
- Department of Anesthesiology and Critical Care, Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Robert Ranisch
- Faculty for Health Sciences Brandenburg, University of Potsdam, Potsdam, Germany
15
Ito N, Kadomatsu S, Fujisawa M, Fukaguchi K, Ishizawa R, Kanda N, Kasugai D, Nakajima M, Goto T, Tsugawa Y. The Accuracy and Potential Racial and Ethnic Biases of GPT-4 in the Diagnosis and Triage of Health Conditions: Evaluation Study. JMIR MEDICAL EDUCATION 2023; 9:e47532. [PMID: 37917120 PMCID: PMC10654908 DOI: 10.2196/47532] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/23/2023] [Revised: 07/07/2023] [Accepted: 09/05/2023] [Indexed: 11/03/2023]
Abstract
BACKGROUND Whether GPT-4, the conversational artificial intelligence, can accurately diagnose and triage health conditions and whether it presents racial and ethnic biases in its decisions remain unclear. OBJECTIVE We aim to assess the accuracy of GPT-4 in the diagnosis and triage of health conditions and whether its performance varies by patient race and ethnicity. METHODS We compared the performance of GPT-4 and physicians, using 45 typical clinical vignettes, each with a correct diagnosis and triage level, in February and March 2023. For each of the 45 clinical vignettes, GPT-4 and 3 board-certified physicians provided the most likely primary diagnosis and triage level (emergency, nonemergency, or self-care). Independent reviewers evaluated the diagnoses as "correct" or "incorrect." Physician diagnosis was defined as the consensus of the 3 physicians. We evaluated whether the performance of GPT-4 varies by patient race and ethnicity, by adding the information on patient race and ethnicity to the clinical vignettes. RESULTS The accuracy of diagnosis was comparable between GPT-4 and physicians (the percentage of correct diagnosis was 97.8% (44/45; 95% CI 88.2%-99.9%) for GPT-4 and 91.1% (41/45; 95% CI 78.8%-97.5%) for physicians; P=.38). GPT-4 provided appropriate reasoning for 97.8% (44/45) of the vignettes. The appropriateness of triage was comparable between GPT-4 and physicians (GPT-4: 30/45, 66.7%; 95% CI 51.0%-80.0%; physicians: 30/45, 66.7%; 95% CI 51.0%-80.0%; P=.99). The performance of GPT-4 in diagnosing health conditions did not vary among different races and ethnicities (Black, White, Asian, and Hispanic), with an accuracy of 100% (95% CI 78.2%-100%). P values, compared to the GPT-4 output without incorporating race and ethnicity information, were all .99. The accuracy of triage was not significantly different even if patients' race and ethnicity information was added. 
The accuracy of triage was 62.2% (95% CI 46.5%-76.2%; P=.50) for Black patients; 66.7% (95% CI 51.0%-80.0%; P=.99) for White patients; 66.7% (95% CI 51.0%-80.0%; P=.99) for Asian patients, and 62.2% (95% CI 46.5%-76.2%; P=.69) for Hispanic patients. P values were calculated by comparing the outputs with and without conditioning on race and ethnicity. CONCLUSIONS GPT-4's ability to diagnose and triage typical clinical vignettes was comparable to that of board-certified physicians. The performance of GPT-4 did not vary by patient race and ethnicity. These findings should be informative for health systems looking to introduce conversational artificial intelligence to improve the efficiency of patient diagnosis and triage.
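The 95% CIs quoted above are consistent with exact (Clopper-Pearson) binomial intervals. As a sketch (assuming SciPy is available; the function name is ours, not the study's code), the interval for GPT-4's 44/45 correct diagnoses can be reproduced like this:

```python
from scipy.stats import beta

def clopper_pearson(successes, n, alpha=0.05):
    """Exact (Clopper-Pearson) two-sided binomial confidence interval."""
    lo = beta.ppf(alpha / 2, successes, n - successes + 1) if successes > 0 else 0.0
    hi = beta.ppf(1 - alpha / 2, successes + 1, n - successes) if successes < n else 1.0
    return lo, hi

# GPT-4's diagnostic accuracy in the study above: 44 of 45 vignettes correct.
lo, hi = clopper_pearson(44, 45)
print(f"{44/45:.1%} (95% CI {lo:.1%}-{hi:.1%})")  # 97.8% (95% CI 88.2%-99.9%)
```

The same function applied to the physicians' 41/45 reproduces the 78.8%-97.5% interval reported above.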
Affiliation(s)
- Naoki Ito
- TXP Medical Co Ltd, Tokyo, Japan
- Faculty of Medicine, The University of Tokyo, Tokyo, Japan
- Sakina Kadomatsu
- TXP Medical Co Ltd, Tokyo, Japan
- Faculty of Medicine, International University of Health and Welfare, Chiba, Japan
- Mineto Fujisawa
- TXP Medical Co Ltd, Tokyo, Japan
- Faculty of Medicine, The University of Tokyo, Tokyo, Japan
- Kiyomitsu Fukaguchi
- TXP Medical Co Ltd, Tokyo, Japan
- Department of Emergency Medicine, Shonan Kamakura General Hospital, Kanagawa, Japan
- Ryo Ishizawa
- TXP Medical Co Ltd, Tokyo, Japan
- Department of Emergency and Critical Care Medicine, Tokyo Medical Center National Hospital Organization, Tokyo, Japan
- Naoki Kanda
- TXP Medical Co Ltd, Tokyo, Japan
- Division of General Internal Medicine, Jichi Medical University Hospital, Tochigi, Japan
- Daisuke Kasugai
- TXP Medical Co Ltd, Tokyo, Japan
- Department of Emergency and Critical Care Medicine, Nagoya University Graduate School of Medicine, Aichi, Japan
- Mikio Nakajima
- TXP Medical Co Ltd, Tokyo, Japan
- Emergency Life-Saving Technique Academy of Tokyo Foundation for Ambulance Service Development, Tokyo, Japan
- Yusuke Tsugawa
- Division of General Internal Medicine and Health Services Research, David Geffen School of Medicine, The University of California, Los Angeles, Los Angeles, CA, United States
- Department of Health Policy and Management, UCLA Fielding School of Public Health, Los Angeles, CA, United States
16
Marcin T, Lüthi A, Graf RR, Krummrey G, Schauber SK, Breakey N, Hautz WE, Hautz SC. Is language an issue? Accuracy of the German computerized diagnostic decision support system ISABEL and cross-validation with the English counterpart. Diagnosis (Berl) 2023; 10:398-405. [PMID: 37480571 DOI: 10.1515/dx-2023-0047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/25/2023] [Accepted: 06/16/2023] [Indexed: 07/24/2023]
Abstract
OBJECTIVES Existing computerized diagnostic decision support tools (CDDS) accurately return possible differential diagnoses (DDx) based on the clinical information provided. The German versions of the CDDS tools for clinicians (Isabel Pro) and patients (Isabel Symptom Checker) from ISABEL Healthcare have not yet been validated. METHODS We entered clinical features of 50 patient vignettes taken from an emergency medicine textbook and 50 real cases with a confirmed diagnosis derived from the electronic health record (EHR) of a large academic Swiss emergency room into the German versions of Isabel Pro and Isabel Symptom Checker. We analysed the proportion of DDx lists that included the correct diagnosis. RESULTS Isabel Pro and Symptom Checker provided the correct diagnosis in 82% and 71% of the cases, respectively. With Isabel Pro, the correct diagnosis was ranked within the top 20, top 10, and top 3 of the provided DDx in 71%, 61%, and 37% of the cases, respectively. In general, accuracy was higher with vignettes than with ED cases: the correct diagnosis was listed more often (not statistically significant) and was ranked within the top 20, 10, and 3 significantly more often. On average, Isabel Pro and Symptom Checker provided 38 ± 4.5 DDx. CONCLUSIONS The German versions of Isabel achieved somewhat lower accuracy than previous studies of the English version. Accuracy decreases substantially once the position in the suggested DDx list is taken into account. Whether Isabel Pro is accurate enough to improve diagnostic quality in routine clinical ED practice needs further investigation.
Affiliation(s)
- Thimo Marcin
- Department of Emergency Medicine, Inselspital University Hospital Bern, Bern, Switzerland
- Ailin Lüthi
- Department of Emergency Medicine, Inselspital University Hospital Bern, Bern, Switzerland
- Faculty of Medicine, University of Bern, Bern, Switzerland
- Ronny R Graf
- Department of Emergency Medicine, Inselspital University Hospital Bern, Bern, Switzerland
- Faculty of Medicine, University of Bern, Bern, Switzerland
- Gert Krummrey
- Department of Emergency Medicine, Inselspital University Hospital Bern, Bern, Switzerland
- Stefan K Schauber
- Centre for Educational Measurement, Faculty of Educational Sciences, University of Oslo, Oslo, Norway
- Centre for Health Sciences Education, Faculty of Medicine, University of Oslo, Oslo, Norway
- Neal Breakey
- Department of Medicine, Spital Emmental, Burgdorf, Switzerland
- Wolf E Hautz
- Department of Emergency Medicine, Inselspital University Hospital Bern, Bern, Switzerland
- Stefanie C Hautz
- Department of Emergency Medicine, Inselspital University Hospital Bern, Bern, Switzerland
17
Fraser H, Crossland D, Bacher I, Ranney M, Madsen T, Hilliard R. Comparison of Diagnostic and Triage Accuracy of Ada Health and WebMD Symptom Checkers, ChatGPT, and Physicians for Patients in an Emergency Department: Clinical Data Analysis Study. JMIR Mhealth Uhealth 2023; 11:e49995. [PMID: 37788063 PMCID: PMC10582809 DOI: 10.2196/49995] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/16/2023] [Revised: 08/17/2023] [Accepted: 08/25/2023] [Indexed: 10/04/2023] Open
Abstract
BACKGROUND Diagnosis is a core component of effective health care, but misdiagnosis is common and can put patients at risk. Diagnostic decision support systems can play a role in improving diagnosis by physicians and other health care workers. Symptom checkers (SCs) have been designed to improve diagnosis and triage (ie, which level of care to seek) by patients. OBJECTIVE The aim of this study was to evaluate the performance of the new large language model ChatGPT (versions 3.5 and 4.0), the widely used WebMD SC, and an SC developed by Ada Health in the diagnosis and triage of patients with urgent or emergent clinical problems compared with the final emergency department (ED) diagnoses and physician reviews. METHODS We used previously collected, deidentified, self-report data from 40 patients presenting to an ED for care who used the Ada SC to record their symptoms prior to seeing the ED physician. Deidentified data were entered into ChatGPT versions 3.5 and 4.0 and WebMD by a research assistant blinded to diagnoses and triage. Diagnoses from all 4 systems were compared with the previously abstracted final diagnoses in the ED as well as with diagnoses and triage recommendations from three independent board-certified ED physicians who had blindly reviewed the self-report clinical data from Ada. Diagnostic accuracy was calculated as the proportion of the diagnoses from ChatGPT, Ada SC, WebMD SC, and the independent physicians that matched at least one ED diagnosis (stratified as top 1 or top 3). Triage accuracy was calculated as the number of recommendations from ChatGPT, WebMD, or Ada that agreed with at least 2 of the independent physicians or were rated "unsafe" or "too cautious." RESULTS Overall, 30 and 37 cases had sufficient data for diagnostic and triage analysis, respectively. The rate of top-1 diagnosis matches for Ada, ChatGPT 3.5, ChatGPT 4.0, and WebMD was 9 (30%), 12 (40%), 10 (33%), and 12 (40%), respectively, with a mean rate of 47% for the physicians. 
The rate of top-3 diagnostic matches for Ada, ChatGPT 3.5, ChatGPT 4.0, and WebMD was 19 (63%), 19 (63%), 15 (50%), and 17 (57%), respectively, with a mean rate of 69% for physicians. The distribution of triage results for Ada was 62% (n=23) agree, 14% unsafe (n=5), and 24% (n=9) too cautious; that for ChatGPT 3.5 was 59% (n=22) agree, 41% (n=15) unsafe, and 0% (n=0) too cautious; that for ChatGPT 4.0 was 76% (n=28) agree, 22% (n=8) unsafe, and 3% (n=1) too cautious; and that for WebMD was 70% (n=26) agree, 19% (n=7) unsafe, and 11% (n=4) too cautious. The unsafe triage rate for ChatGPT 3.5 (41%) was significantly higher (P=.009) than that of Ada (14%). CONCLUSIONS ChatGPT 3.5 had high diagnostic accuracy but a high unsafe triage rate. ChatGPT 4.0 had the poorest diagnostic accuracy, but a lower unsafe triage rate and the highest triage agreement with the physicians. The Ada and WebMD SCs performed better overall than ChatGPT. Unsupervised patient use of ChatGPT for diagnosis and triage is not recommended without improvements to triage accuracy and extensive clinical evaluation.
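The top-1/top-3 matching metric used above — a case counts as a hit if any of the first k suggested diagnoses matches at least one final ED diagnosis — is straightforward to compute. The toy cases below are illustrative, not the study's data:

```python
def topk_match_rate(predictions, ed_diagnoses, k):
    """Fraction of cases where any of the top-k suggested diagnoses
    matches at least one final ED diagnosis."""
    hits = 0
    for preds, truths in zip(predictions, ed_diagnoses):
        if any(p in truths for p in preds[:k]):
            hits += 1
    return hits / len(predictions)

# Illustrative toy data: ranked suggestions per case vs. final ED diagnoses.
preds = [
    ["migraine", "tension headache", "sinusitis"],
    ["appendicitis", "gastroenteritis", "ovarian cyst"],
    ["costochondritis", "anxiety", "GERD"],
]
truths = [{"tension headache"}, {"appendicitis"}, {"pulmonary embolism"}]

print(topk_match_rate(preds, truths, k=1))  # 1 of 3 cases hits on the top-1 diagnosis
print(topk_match_rate(preds, truths, k=3))  # 2 of 3 cases hit within the top 3
```

Top-3 rates are always at least as high as top-1 rates, which matches the pattern in the results above (e.g., Ada: 30% top-1 vs. 63% top-3).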
Affiliation(s)
- Hamish Fraser
- Brown Center for Biomedical Informatics, The Warren Alpert Medical School of Brown University, Providence, RI, United States
- Department of Health Services, Policy and Practice, Brown University School of Public Health, Providence, RI, United States
- Daven Crossland
- Brown Center for Biomedical Informatics, The Warren Alpert Medical School of Brown University, Providence, RI, United States
- Department of Epidemiology, Brown University School of Public Health, Providence, RI, United States
- Ian Bacher
- Brown Center for Biomedical Informatics, The Warren Alpert Medical School of Brown University, Providence, RI, United States
- Megan Ranney
- School of Public Health, Yale University, New Haven, CT, United States
- Tracy Madsen
- Department of Epidemiology, Brown University School of Public Health, Providence, RI, United States
- Department of Emergency Medicine, The Warren Alpert Medical School of Brown University, Providence, RI, United States
- Ross Hilliard
- Department of Internal Medicine, Maine Medical Center, Portland, ME, United States
18
Kuroiwa T, Sarcon A, Ibara T, Yamada E, Yamamoto A, Tsukamoto K, Fujita K. The Potential of ChatGPT as a Self-Diagnostic Tool in Common Orthopedic Diseases: Exploratory Study. J Med Internet Res 2023; 25:e47621. [PMID: 37713254 PMCID: PMC10541638 DOI: 10.2196/47621] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/27/2023] [Revised: 05/17/2023] [Accepted: 08/17/2023] [Indexed: 09/16/2023] Open
Abstract
BACKGROUND Artificial intelligence (AI) has gained tremendous popularity recently, especially the use of natural language processing (NLP). ChatGPT is a state-of-the-art chatbot capable of creating natural conversations using NLP. The use of AI in medicine can have a tremendous impact on health care delivery. Although some studies have evaluated ChatGPT's accuracy in self-diagnosis, there is no research regarding its precision and the degree to which it recommends medical consultations. OBJECTIVE The aim of this study was to evaluate ChatGPT's ability to accurately and precisely self-diagnose common orthopedic diseases, as well as the degree of recommendation it provides for medical consultations. METHODS Over a 5-day course, each of the study authors submitted the same questions to ChatGPT. The conditions evaluated were carpal tunnel syndrome (CTS), cervical myelopathy (CM), lumbar spinal stenosis (LSS), knee osteoarthritis (KOA), and hip osteoarthritis (HOA). Answers were categorized as either correct, partially correct, incorrect, or a differential diagnosis. The percentage of correct answers and reproducibility were calculated. The reproducibility between days and raters were calculated using the Fleiss κ coefficient. Answers that recommended that the patient seek medical attention were recategorized according to the strength of the recommendation as defined by the study. RESULTS The ratios of correct answers were 25/25, 1/25, 24/25, 16/25, and 17/25 for CTS, CM, LSS, KOA, and HOA, respectively. The ratios of incorrect answers were 23/25 for CM and 0/25 for all other conditions. The reproducibility between days was 1.0, 0.15, 0.7, 0.6, and 0.6 for CTS, CM, LSS, KOA, and HOA, respectively. The reproducibility between raters was 1.0, 0.1, 0.64, -0.12, and 0.04 for CTS, CM, LSS, KOA, and HOA, respectively. Among the answers recommending medical attention, the phrases "essential," "recommended," "best," and "important" were used. 
Specifically, "essential" occurred in 4 out of 125, "recommended" in 12 out of 125, "best" in 6 out of 125, and "important" in 94 out of 125 answers. Additionally, 7 out of the 125 answers did not include a recommendation to seek medical attention. CONCLUSIONS The accuracy and reproducibility of ChatGPT to self-diagnose five common orthopedic conditions were inconsistent. The accuracy could potentially be improved by adding symptoms that could easily identify a specific location. Only a few answers were accompanied by a strong recommendation to seek medical attention according to our study standards. Although ChatGPT could serve as a potential first step in accessing care, we found variability in accurate self-diagnosis. Given the risk of harm with self-diagnosis without medical follow-up, it would be prudent for an NLP to include clear language alerting patients to seek expert medical opinions. We hope to shed further light on the use of AI in a future clinical study.
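Reproducibility between days and between raters was measured above with the Fleiss κ coefficient. A minimal self-contained implementation, run on an invented toy rating table (not the study's data), looks like this:

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for a subjects x categories count matrix.

    ratings[i][j] = number of raters who assigned subject i to category j;
    every row must sum to the same number of raters n.
    """
    N = len(ratings)
    n = sum(ratings[0])
    total = N * n
    # Proportion of all assignments falling in each category.
    p_j = [sum(row[j] for row in ratings) / total for j in range(len(ratings[0]))]
    # Per-subject agreement: fraction of rater pairs that agree.
    P_i = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in ratings]
    P_bar = sum(P_i) / N
    P_e = sum(p * p for p in p_j)
    if P_e == 1.0:  # degenerate case: only one category ever used
        return 1.0
    return (P_bar - P_e) / (1 - P_e)

# Toy example: 4 answers each judged by 3 raters into 3 categories
# (correct / partially correct / incorrect) -- illustrative counts only.
table = [
    [3, 0, 0],
    [3, 0, 0],
    [0, 3, 0],
    [1, 1, 1],
]
print(round(fleiss_kappa(table), 3))  # 0.538
```

Perfect agreement across raters yields κ = 1.0, as seen for CTS in the results above, while values near zero (as for KOA and HOA between raters) indicate agreement barely above chance.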
Affiliation(s)
- Tomoyuki Kuroiwa
- Department of Orthopaedic and Spinal Surgery, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan
- Division of Orthopedic Surgery Research, Mayo Clinic, Rochester, MN, United States
- Aida Sarcon
- Department of Surgery, Mayo Clinic, Rochester, MN, United States
- Takuya Ibara
- Department of Functional Joint Anatomy, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan
- Eriku Yamada
- Department of Orthopaedic and Spinal Surgery, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan
- Akiko Yamamoto
- Department of Orthopaedic and Spinal Surgery, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan
- Kazuya Tsukamoto
- Department of Orthopaedic and Spinal Surgery, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan
- Koji Fujita
- Department of Functional Joint Anatomy, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan
- Division of Medical Design Innovations, Open Innovation Center, Institute of Research Innovation, Tokyo Medical and Dental University, Tokyo, Japan
19
Wiedermann CJ, Mahlknecht A, Piccoliori G, Engl A. Redesigning Primary Care: The Emergence of Artificial-Intelligence-Driven Symptom Diagnostic Tools. J Pers Med 2023; 13:1379. [PMID: 37763147 PMCID: PMC10532810 DOI: 10.3390/jpm13091379] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/04/2023] [Revised: 09/13/2023] [Accepted: 09/14/2023] [Indexed: 09/29/2023] Open
Abstract
Modern healthcare is facing a juxtaposition of increasing patient demands owing to an aging population and a decreasing general practitioner workforce, leading to strained access to primary care. The coronavirus disease 2019 pandemic has emphasized the potential for alternative consultation methods, highlighting opportunities to minimize unnecessary care. This article discusses the role of artificial-intelligence-driven symptom checkers, particularly their efficiency, utility, and challenges in primary care. Based on a study conducted in Italian general practices, insights from both physicians and patients were gathered regarding this emergent technology, highlighting differences in perceived utility, user satisfaction, and potential challenges. While symptom checkers are seen as potential tools for addressing healthcare challenges, concerns regarding their accuracy and the potential for misdiagnosis persist. Patients generally viewed them positively, valuing their ease of use and the empowerment they provide in managing health. However, some general practitioners perceive these tools as challenges to their expertise. This article proposes that artificial-intelligence-based symptom checkers can optimize medical-history taking for the benefit of both general practitioners and patients, with potential enhancements in complex diagnostic tasks rather than routine diagnoses. It underscores the importance of carefully integrating digital innovations while preserving the essential human touch in healthcare. Symptom checkers offer promising solutions; ensuring their accuracy, reliability, and effective integration into primary care requires rigorous research, clinical guidance, and an understanding of varied user perceptions. Collaboration among technologists, clinicians, and patients is paramount for the successful evolution of digital tools in healthcare.
Affiliation(s)
- Christian J. Wiedermann
- Institute of General Practice and Public Health, Claudiana—College of Health Professions, 39100 Bolzano, Italy
- Department of Public Health, Medical Decision Making and HTA, University of Health Sciences, Medical Informatics and Technology-Tyrol, 6060 Hall, Austria
- Angelika Mahlknecht
- Institute of General Practice and Public Health, Claudiana—College of Health Professions, 39100 Bolzano, Italy
- Giuliano Piccoliori
- Institute of General Practice and Public Health, Claudiana—College of Health Professions, 39100 Bolzano, Italy
- Adolf Engl
- Institute of General Practice and Public Health, Claudiana—College of Health Professions, 39100 Bolzano, Italy
20
Mahlknecht A, Engl A, Piccoliori G, Wiedermann CJ. Supporting primary care through symptom checking artificial intelligence: a study of patient and physician attitudes in Italian general practice. BMC PRIMARY CARE 2023; 24:174. [PMID: 37661285 PMCID: PMC10476397 DOI: 10.1186/s12875-023-02143-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/09/2023] [Accepted: 08/29/2023] [Indexed: 09/05/2023]
Abstract
BACKGROUND Rapid advancements in artificial intelligence (AI) have led to the adoption of AI-driven symptom checkers in primary care. This study aimed to evaluate both patients' and physicians' attitudes towards these tools in Italian general practice settings, focusing on their perceived utility, user satisfaction, and potential challenges. METHODS This feasibility study involved ten general practitioners (GPs) and patients visiting GP offices. Before their medical visit, patients used a chatbot-based symptom checker, completing an anamnestic screening for COVID-19 and a medical-history algorithm concerning their current medical problem. The entered data were forwarded to the GP as a medical-history aid. After the medical visit, physicians and patients each evaluated the symptom checker from their respective perspectives. Additionally, physicians performed a final overall evaluation of the symptom checker after the conclusion of the practice phase. RESULTS Most patients had not used symptom checkers before. Overall, 49% of patients and 27% of physicians reported being rather or very satisfied with the symptom checker. The most frequent patient-reported reasons for satisfaction were ease of use, precise and comprehensive questions, perceived time-saving potential, and encouragement of self-reflection. Every other patient would consider at-home use of the symptom checker for a first appraisal of health problems, to save time, reduce unnecessary visits, and/or serve as an aid for the physician. Patients' attitudes towards the symptom checker were not significantly associated with age, sex, or level of education. Most patients (75%) and physicians (84%) indicated that the symptom checker had no effect on the duration of the medical visit. Only a few participants found the use of the symptom checker to be disruptive to the medical visit or its quality. CONCLUSIONS The findings suggest a positive reception of the symptom checker, albeit with differing focus between patients and physicians.
With the potential to be integrated further into primary care, these tools require meticulous clinical guidance to maximize their benefits. TRIAL REGISTRATION The study was not registered, as it did not include direct medical intervention on human participants.
Affiliation(s)
- Angelika Mahlknecht: Institute of General Practice and Public Health, College of Health Care Professions (Claudiana), Lorenz Böhler Street 13, 39100, Bolzano, Italy
- Adolf Engl: Institute of General Practice and Public Health, College of Health Care Professions (Claudiana), Lorenz Böhler Street 13, 39100, Bolzano, Italy
- Giuliano Piccoliori: Institute of General Practice and Public Health, College of Health Care Professions (Claudiana), Lorenz Böhler Street 13, 39100, Bolzano, Italy
- Christian Josef Wiedermann: Institute of General Practice and Public Health, College of Health Care Professions (Claudiana), Lorenz Böhler Street 13, 39100, Bolzano, Italy; Department of Public Health, Medical Decision Making and HTA, University of Health Sciences, Medical Informatics and Technology, Eduard-Wallnöfer Place 1, 6060, Hall, Austria

21
Kafke SD, Kuhlmey A, Schuster J, Blüher S, Czimmeck C, Zoellick JC, Grosse P. Can clinical decision support systems be an asset in medical education? An experimental approach. BMC Medical Education 2023; 23:570. [PMID: 37568144] [PMCID: PMC10416486] [DOI: 10.1186/s12909-023-04568-8] [Citation(s) in RCA: 0]
Abstract
BACKGROUND Diagnostic accuracy is one of the major cornerstones of appropriate and successful medical decision-making. Clinical decision support systems (CDSSs) have recently been used to facilitate physicians' diagnostic considerations. However, to date, little is known about the potential assets of CDSSs for medical students in an educational setting. The purpose of our study was to explore the usefulness of CDSSs for medical students by assessing their diagnostic performance and the influence of such software on students' trust in their own diagnostic abilities. METHODS Based on paper cases, students had to diagnose two different patients, once using a CDSS and once using conventional methods such as textbooks. Both patients had a common disease: in one case the clinical presentation was typical (tonsillitis), whereas in the other (pulmonary embolism) the patient presented atypically. We used a 2x2x2 between- and within-subjects cluster-randomised controlled trial to assess diagnostic accuracy in medical students, also varying the order of the resources used (CDSS first or second). RESULTS Medical students in their 4th and 5th year performed equally well using conventional methods or the CDSS across the two cases (t(164) = 1.30; p = 0.197). Diagnostic accuracy and trust in the correct diagnosis were higher in the typical presentation condition than in the atypical presentation condition (t(85) = 19.97; p < .0001 and t(150) = 7.67; p < .0001). These results refute our main hypothesis that students diagnose more accurately when using conventional methods compared with the CDSS. CONCLUSIONS Medical students in their 4th and 5th year performed equally well in diagnosing two cases of common diseases with typical or atypical clinical presentations using conventional methods or a CDSS. Students were proficient in diagnosing a common disease with a typical presentation but underestimated their own factual knowledge in this scenario. Also, students were aware of their own diagnostic limitations when presented with a challenging case with an atypical presentation, for which the use of a CDSS seemingly provided no additional insights.
Affiliation(s)
- Sean D Kafke: Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Adelheid Kuhlmey: Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Johanna Schuster: Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Stefan Blüher: Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Constanze Czimmeck: Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Jan C Zoellick: Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Pascal Grosse: Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany

22
Karia J, Mohamed R, Petrushkin H. Patient-targeted mobile applications in healthcare. Br J Hosp Med (Lond) 2023; 84:1-5. [PMID: 37646550] [DOI: 10.12968/hmed.2023.0158] [Citation(s) in RCA: 0]
Abstract
There has been an increase in the number of healthcare-related applications targeted at patients for use on mobile phones. With an increasing proportion of the population using such applications, it is important to understand their limitations, the associated safety concerns, and the challenges of legislation. This article explores the impact of these applications on frontline care and patient wellbeing, evaluating the literature on the benefits and challenges of patient-targeted mobile applications in health care and analysing the limitations of existing research. The proclaimed benefits of such applications are not always evidence based. Furthermore, many healthcare applications are created by laypeople and not validated by healthcare authorities, creating the potential to cause patient harm. Further research is needed to identify long-term effects on both healthcare systems and individuals' psychosocial wellbeing. However, research in this field often lacks a universal perspective and may be influenced by underlying financial motives to promote use of the applications.
Affiliation(s)
- Janvi Karia: Division of Medicine, University College London, London, UK
- Ryian Mohamed: Department of Ophthalmology, Moorfields Eye Hospital, London, UK
- Harry Petrushkin: Department of Ophthalmology, Moorfields Eye Hospital, London, UK; UCL Institute of Ophthalmology, University College London, London, UK; Department of Ophthalmology, Great Ormond Street Hospital, London, UK

23
Sarbay İ, Berikol GB, Özturan İU. Performance of emergency triage prediction of an open access natural language processing based chatbot application (ChatGPT): A preliminary, scenario-based cross-sectional study. Turk J Emerg Med 2023; 23:156-161. [PMID: 37529789] [PMCID: PMC10389099] [DOI: 10.4103/tjem.tjem_79_23] [Citation(s) in RCA: 12]
Abstract
OBJECTIVES Artificial intelligence companies have recently been increasing their initiatives to improve the results of chatbots, software programs that can converse with a human in natural language. The role of chatbots in health care is deemed worthy of research. OpenAI's ChatGPT is a supervised, machine learning-based chatbot. The aim of this study was to determine the performance of ChatGPT in emergency medicine (EM) triage prediction. METHODS This was a preliminary, cross-sectional study conducted with case scenarios generated by the researchers based on the emergency severity index (ESI) handbook v4 cases. Two independent EM specialists who were experts in the ESI triage scale determined the triage categories for each case. A third independent EM specialist was consulted as arbiter, if necessary. The consensus result for each case scenario was taken as the reference triage category. Subsequently, each case scenario was queried with ChatGPT and the answer was recorded as the index triage category. Inconsistent classifications between ChatGPT and the reference category were defined as over-triage (false positive) or under-triage (false negative). RESULTS Fifty case scenarios were assessed in the study. Reliability analysis showed a fair agreement between EM specialists and ChatGPT (Cohen's kappa: 0.341). Eleven cases (22%) were over-triaged and 9 (18%) were under-triaged by ChatGPT. In 9 cases (18%), ChatGPT reported two consecutive triage categories, one of which matched the expert consensus. It had an overall sensitivity of 57.1% (95% confidence interval [CI]: 34-78.2), specificity of 34.5% (95% CI: 17.9-54.3), positive predictive value (PPV) of 38.7% (95% CI: 21.8-57.8), negative predictive value (NPV) of 52.6% (95% CI: 28.9-75.6), and an F1 score of 0.461. In high-acuity cases (ESI-1 and ESI-2), ChatGPT showed a sensitivity of 76.2% (95% CI: 52.8-91.8), specificity of 93.1% (95% CI: 77.2-99.2), PPV of 88.9% (95% CI: 65.3-98.6), NPV of 84.4% (95% CI: 67.2-94.7), and an F1 score of 0.821. The receiver operating characteristic curve showed an area under the curve of 0.846 (95% CI: 0.724-0.969, P < 0.001) for high-acuity cases. CONCLUSION The performance of ChatGPT was best when predicting high-acuity cases (ESI-1 and ESI-2). It may be useful when determining the cases requiring critical care. When trained with more medical knowledge, ChatGPT may be more accurate for other triage category predictions.
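As an illustration of how the diagnostic metrics quoted in this abstract are derived, the following is a minimal Python sketch. The confusion-matrix counts used here are hypothetical values chosen only to be consistent with the reported high-acuity percentages; they are not taken from the study's data.

```python
# Sketch: deriving sensitivity, specificity, PPV, NPV and F1 from a
# 2x2 confusion matrix, as reported in triage-accuracy studies.
# The counts below are HYPOTHETICAL, not the study's actual data.

def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard binary-classification metrics for triage agreement."""
    sensitivity = tp / (tp + fn)   # true-positive rate (recall)
    specificity = tn / (tn + fp)   # true-negative rate
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    npv = tn / (tn + fn)           # negative predictive value
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "ppv": ppv, "npv": npv, "f1": f1}

if __name__ == "__main__":
    # Hypothetical counts with "high acuity" (ESI-1/ESI-2) as the positive class
    metrics = diagnostic_metrics(tp=16, fp=2, fn=5, tn=27)
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}")
```

With these illustrative counts, the formulas yield roughly the percentages reported for the high-acuity subgroup (sensitivity about 76%, specificity about 93%, PPV about 89%, NPV about 84%, F1 about 0.82), which shows how a handful of triage agreements and disagreements translate into the quoted statistics.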
Affiliation(s)
- İbrahim Sarbay: Department of Emergency Medicine, Keşan State Hospital, Edirne, Turkey
- Göksu Bozdereli Berikol: Department of Emergency Medicine, Bakırköy Dr. Sadi Konuk Training and Research Hospital, İstanbul, Turkey
- İbrahim Ulaş Özturan: Department of Emergency Medicine, Kocaeli University, Faculty of Medicine, Kocaeli, Turkey; Department of Medical Education, Acibadem University, Institute of Health Sciences, Istanbul, Turkey

24
Riboli-Sasco E, El-Osta A, Alaa A, Webber I, Karki M, El Asmar ML, Purohit K, Painter A, Hayhoe B. Triage and Diagnostic Accuracy of Online Symptom Checkers: Systematic Review. J Med Internet Res 2023; 25:e43803. [PMID: 37266983] [DOI: 10.2196/43803] [Citation(s) in RCA: 2]
Abstract
BACKGROUND In the context of a deepening global shortage of health workers and, in particular, the COVID-19 pandemic, there is growing international interest in, and use of, online symptom checkers (OSCs). However, the evidence surrounding the triage and diagnostic accuracy of these tools remains inconclusive. OBJECTIVE This systematic review aimed to summarize the existing peer-reviewed literature evaluating the triage accuracy (directing users to appropriate services based on their presenting symptoms) and diagnostic accuracy of OSCs aimed at lay users for general health concerns. METHODS Searches were conducted in MEDLINE, Embase, CINAHL, Health Management Information Consortium (HMIC), and Web of Science, as well as the citations of the studies selected for full-text screening. We included peer-reviewed studies published in English between January 1, 2010, and February 16, 2022, with a controlled and quantitative assessment of either or both triage and diagnostic accuracy of OSCs directed at lay users. We excluded tools supporting health care professionals, as well as disease- or specialty-specific OSCs. Screening and data extraction were carried out independently by 2 reviewers for each study. We performed a descriptive narrative synthesis. RESULTS A total of 21,296 studies were identified, of which 14 (0.07%) were included. The included studies used clinical vignettes, medical records, or direct input by patients. Of the 14 studies, 6 (43%) reported on triage and diagnostic accuracy, 7 (50%) focused on triage accuracy, and 1 (7%) focused on diagnostic accuracy. These outcomes were assessed based on the diagnostic and triage recommendations attached to the vignette in the case of vignette studies or on those provided by nurses or general practitioners, including through face-to-face and telephone consultations. Both diagnostic accuracy and triage accuracy varied greatly among OSCs. 
Overall diagnostic accuracy was deemed to be low and was almost always lower than that of the comparator. Similarly, most of the studies (9/13, 69%) showed suboptimal triage accuracy overall, with a few exceptions (4/13, 31%). The main variables affecting the levels of diagnostic and triage accuracy were the severity and urgency of the condition, the use of artificial intelligence algorithms, and demographic questions. However, the impact of each variable differed across tools and studies, making it difficult to draw any solid conclusions. All included studies had at least one area with unclear risk of bias according to the revised Quality Assessment of Diagnostic Accuracy Studies-2 tool. CONCLUSIONS Although OSCs have the potential to provide accessible and accurate health advice and triage recommendations to users, more research is needed to validate their triage and diagnostic accuracy before widescale adoption in community and health care settings. Future studies should aim to use a common methodology and an agreed standard for evaluation to facilitate objective benchmarking and validation. TRIAL REGISTRATION PROSPERO CRD42020215210; https://tinyurl.com/3949zw83.
Affiliation(s)
- Eva Riboli-Sasco: Self-Care Academic Research Unit (SCARU), Department of Primary Care and Public Health, Imperial College London, London, United Kingdom
- Austen El-Osta: Self-Care Academic Research Unit (SCARU), Department of Primary Care and Public Health, Imperial College London, London, United Kingdom
- Aos Alaa: Self-Care Academic Research Unit (SCARU), Department of Primary Care and Public Health, Imperial College London, London, United Kingdom
- Iman Webber: Self-Care Academic Research Unit (SCARU), Department of Primary Care and Public Health, Imperial College London, London, United Kingdom
- Manisha Karki: Self-Care Academic Research Unit (SCARU), Department of Primary Care and Public Health, Imperial College London, London, United Kingdom
- Marie Line El Asmar: Self-Care Academic Research Unit (SCARU), Department of Primary Care and Public Health, Imperial College London, London, United Kingdom
- Katie Purohit: Self-Care Academic Research Unit (SCARU), Department of Primary Care and Public Health, Imperial College London, London, United Kingdom
- Annabelle Painter: Self-Care Academic Research Unit (SCARU), Department of Primary Care and Public Health, Imperial College London, London, United Kingdom
- Benedict Hayhoe: Self-Care Academic Research Unit (SCARU), Department of Primary Care and Public Health, Imperial College London, London, United Kingdom

25
Turnbull J, MacLellan J, Churruca K, Ellis LA, Prichard J, Browne D, Braithwaite J, Petter E, Chisambi M, Pope C. A multimethod study of NHS 111 online. Health and Social Care Delivery Research 2023; 11:1-104. [PMID: 37464813] [DOI: 10.3310/ytrr9821] [Citation(s) in RCA: 0]
Abstract
Background NHS 111 online offers 24-hour access to health assessment and triage. Objectives This study examined pathways to care, differential access and use, and workforce impacts of NHS 111 online. This study compared NHS 111 with Healthdirect (Haymarket, Australia) virtual triage. Design Interviews with 80 staff and stakeholders in English primary, urgent and emergency care, and 41 staff and stakeholders associated with Healthdirect. A survey of 2754 respondents, of whom 1137 (41.3%) had used NHS 111 online and 1617 (58.7%) had not. Results NHS 111 online is one of several digital health-care technologies and was not differentiated from the NHS 111 telephone service or well understood. There is a similar lack of awareness of Healthdirect virtual triage. NHS 111 and Healthdirect virtual triage are perceived as creating additional work for health-care staff and inappropriate demand for some health services, especially emergency care. One-third of survey respondents reported that they had not used any NHS 111 service (telephone or online). Older people and those with less educational qualifications are less likely to use NHS 111 online. Respondents who had used NHS 111 online reported more use of other urgent care services and make more cumulative use of services than those who had not used NHS 111 online. Users of NHS 111 online had higher levels of self-reported eHealth literacy. There were differences in reported preferences for using NHS 111 online for different symptom presentations. Conclusions Greater clarity about what the NHS 111 online service offers would allow better signposting and reduce confusion. Generic NHS 111 services are perceived as creating additional work in the primary, urgent and emergency care system. There are differences in eHealth literacy between users and those who have not used NHS 111 online, and this suggests that 'digital first' policies may increase health inequalities. 
Limitations This research bridged the pandemic from 2020 to 2021; therefore, findings may change as services adjust going forward. Surveys used a digital platform, so there is probably bias towards some level of eHealth literacy, but this also means that our data may underestimate the digital divide. Future work Further investigation of access to digital services could address concerns about digital exclusion. Research comparing the affordances and cost-benefits of different triage and assessment systems for users and health-care providers is needed. Research about trust in virtual assessments may show how duplication can be reduced. Mixed-methods studies looking at outcomes, impacts on work and costs, and ways to measure eHealth literacy can inform the development of NHS 111 online, and opportunities for further international shared learning could be pursued. Study registration This study is registered at the research registry (UIN 5392). Funding This project was funded by the National Institute for Health and Care Research (NIHR) Health and Social Care Delivery Research Programme and will be published in full in Health and Social Care Delivery Research; Vol. 11, No. 5. See the NIHR Journals Library website for further project information.
Affiliation(s)
- Joanne Turnbull: School of Health Sciences, University of Southampton, Southampton, UK
- Jennifer MacLellan: Nuffield Department of Primary Care Health Sciences, University of Oxford, Oxford, UK
- Kate Churruca: Australian Institute of Health Innovation, Macquarie University, Sydney, NSW, Australia
- Louise A Ellis: Australian Institute of Health Innovation, Macquarie University, Sydney, NSW, Australia
- Jane Prichard: School of Health Sciences, University of Southampton, Southampton, UK
- Jeffrey Braithwaite: Australian Institute of Health Innovation, Macquarie University, Sydney, NSW, Australia
- Emily Petter: NHS Hampshire, Southampton and Isle of Wight Clinical Commissioning Group, Winchester, UK
- Matthew Chisambi: Imperial College Health Partners, Chelsea and Westminster Hospital NHS Foundation Trust, London, UK
- Catherine Pope: Nuffield Department of Primary Care Health Sciences, University of Oxford, Oxford, UK

26
Odisho AY, Liu AW, Maiorano AR, Bigazzi MOA, Medina E, Leard LE, Shah R, Venado A, Perez A, Golden J, Kleinhenz ME, Kolaitis NA, Maheshwari J, Trinh BN, Kukreja J, Greenland J, Calabrese D, Neinstein AB, Singer JP, Hays SR. Design and implementation of a digital health home spirometry intervention for remote monitoring of lung transplant function. J Heart Lung Transplant 2023; 42:828-837. [PMID: 37031033] [DOI: 10.1016/j.healun.2023.01.010] [Citation(s) in RCA: 0]
Abstract
BACKGROUND We developed an automated, chat-based digital health intervention using Bluetooth-enabled home spirometers to monitor for complications of lung transplantation in a real-world application. METHODS A chat-based application prompted patients to perform home spirometry, enter their forced expiratory volume in 1 second (FEV1), and answer symptom queries, and it provided patient education. The program alerted patients and providers to substantial FEV1 decreases and concerning symptoms. Data were integrated into the electronic health record (EHR) system, and dashboards were developed for program monitoring. RESULTS Between May 2020 and December 2021, 544 patients were invited to enroll, of whom 427 were invited remotely and 117 were enrolled in person. 371 (68%) participated by submitting ≥1 FEV1 value. Overall engagement was high, with an average of 197 unique patients submitting FEV1 data per month. In-person enrollees submitted an average of 4.6 FEV1 values per month and responded to 55% of scheduled chats. Home and laboratory FEV1 values correlated closely (rho = 0.93). There was an average of 133 ± 59 FEV1 decline alerts and 59 ± 23 symptom alerts per month. 72% of patients accessed education modules, and the program had a high net promoter score (53) amongst users. CONCLUSIONS We demonstrate that a novel, automated, chat-based, EHR-integrated home spirometry intervention is well accepted, generates reliable assessments of graft function, and can deliver automated feedback and education, resulting in moderately high adherence rates. We found that in-person onboarding yields better engagement and adherence. Future work will aim to demonstrate the impact of remote care monitoring on early detection of lung transplant complications.
Affiliation(s)
- Anobel Y Odisho: Center for Digital Health Innovation, University of California, San Francisco, California; Department of Urology, University of California, San Francisco, California
- Andrew W Liu: Center for Digital Health Innovation, University of California, San Francisco, California
- Ali R Maiorano: Center for Digital Health Innovation, University of California, San Francisco, California
- M Olivia A Bigazzi: Center for Digital Health Innovation, University of California, San Francisco, California
- Eli Medina: Center for Digital Health Innovation, University of California, San Francisco, California
- Lorriana E Leard: Pulmonary, Critical Care, Allergy and Sleep Medicine Division, Department of Medicine, University of California, San Francisco, California
- Rupal Shah: Pulmonary, Critical Care, Allergy and Sleep Medicine Division, Department of Medicine, University of California, San Francisco, California
- Aida Venado: Pulmonary, Critical Care, Allergy and Sleep Medicine Division, Department of Medicine, University of California, San Francisco, California
- Alyssa Perez: Pulmonary, Critical Care, Allergy and Sleep Medicine Division, Department of Medicine, University of California, San Francisco, California
- Jeffrey Golden: Pulmonary, Critical Care, Allergy and Sleep Medicine Division, Department of Medicine, University of California, San Francisco, California
- Mary Ellen Kleinhenz: Pulmonary, Critical Care, Allergy and Sleep Medicine Division, Department of Medicine, University of California, San Francisco, California
- Nicholas A Kolaitis: Pulmonary, Critical Care, Allergy and Sleep Medicine Division, Department of Medicine, University of California, San Francisco, California
- Julia Maheshwari: Pulmonary, Critical Care, Allergy and Sleep Medicine Division, Department of Medicine, University of California, San Francisco, California
- Binh N Trinh: Department of Surgery, University of California, San Francisco, California
- Jasleen Kukreja: Department of Surgery, University of California, San Francisco, California
- John Greenland: Pulmonary, Critical Care, Allergy and Sleep Medicine Division, Department of Medicine, University of California, San Francisco, California
- Daniel Calabrese: Pulmonary, Critical Care, Allergy and Sleep Medicine Division, Department of Medicine, University of California, San Francisco, California
- Aaron B Neinstein: Center for Digital Health Innovation, University of California, San Francisco, California; Endocrinology Division, Department of Medicine, University of California, San Francisco, California
- Jonathan P Singer: Pulmonary, Critical Care, Allergy and Sleep Medicine Division, Department of Medicine, University of California, San Francisco, California
- Steven R Hays: Pulmonary, Critical Care, Allergy and Sleep Medicine Division, Department of Medicine, University of California, San Francisco, California

27
Marcin T, Hautz SC, Singh H, Zwaan L, Schwappach D, Krummrey G, Schauber SK, Nendaz M, Exadaktylos AK, Müller M, Lambrigger C, Sauter TC, Lindner G, Bosbach S, Griesshammer I, Hautz WE. Effects of a computerised diagnostic decision support tool on diagnostic quality in emergency departments: study protocol of the DDx-BRO multicentre cluster randomised cross-over trial. BMJ Open 2023; 13:e072649. [PMID: 36990482] [PMCID: PMC10069571] [DOI: 10.1136/bmjopen-2023-072649] [Citation(s) in RCA: 0]
Abstract
INTRODUCTION Computerised diagnostic decision support systems (CDDS) suggesting differential diagnoses to physicians aim to improve clinical reasoning and diagnostic quality. However, controlled clinical trials investigating their effectiveness and safety are absent, and the consequences of their use in clinical practice are unknown. We aim to investigate the effect of CDDS use in the emergency department (ED) on diagnostic quality, workflow, resource consumption and patient outcomes. METHODS AND ANALYSIS This is a multicentre, outcome-assessor and patient-blinded, cluster-randomised, multiperiod crossover superiority trial. A validated differential diagnosis generator will be implemented in four EDs and randomly allocated to a sequence of six alternating intervention and control periods. During intervention periods, the treating ED physician will be asked to consult the CDDS at least once during diagnostic workup. During control periods, physicians will not have access to the CDDS and diagnostic workup will follow usual clinical care. Key inclusion criteria are presentation to the ED with fever, abdominal pain, syncope or a non-specific complaint as the chief complaint. The primary outcome is a binary diagnostic quality risk score composed of the presence of unscheduled medical care after discharge, a change in diagnosis or death during follow-up, or an unexpected escalation in care within 24 hours of hospital admission. Time of follow-up is 14 days. At least 1184 patients will be included. Secondary outcomes include length of hospital stay, diagnostics and data regarding CDDS usage, physicians' confidence calibration and diagnostic workflow. Statistical analysis will use general linear mixed modelling methods. ETHICS AND DISSEMINATION Approved by the cantonal ethics committee of the canton of Berne (2022-D0002) and by Swissmedic, the Swiss national regulatory authority on medical devices. Study results will be disseminated through peer-reviewed journals, open repositories, the network of investigators, and the expert and patient advisory board. TRIAL REGISTRATION NUMBER NCT05346523.
Affiliation(s)
- Thimo Marcin: Department of Emergency Medicine, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
- Stefanie C Hautz: Department of Emergency Medicine, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
- Hardeep Singh: Center for Innovations in Quality, Effectiveness and Safety (IQuESt), Michael E DeBakey VA Medical Center, Houston, Texas, USA; Department of Medicine, Baylor College of Medicine, Houston, Texas, USA
- Laura Zwaan: Institute of Medical Education Research Rotterdam (iMERR), Erasmus Medical Center, Rotterdam, The Netherlands
- David Schwappach: Institute of Social and Preventive Medicine (ISPM), University of Bern, Bern, Switzerland
- Gert Krummrey: Department of Emergency Medicine, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland; Bern University of Applied Sciences, Biel, Switzerland
- Stefan K Schauber: Center for Educational Measurement and Faculty of Medicine, University of Oslo, Oslo, Norway
- Mathieu Nendaz: Department of Medicine, University of Geneva, Geneve, Switzerland
- Martin Müller: Department of Emergency Medicine, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
- Cornelia Lambrigger: Department of Emergency Medicine, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
- Thomas C Sauter: Department of Emergency Medicine, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
- Gregor Lindner: Department of Internal and Emergency Medicine, Burgerspital Solothurn, Solothurn, Switzerland
- Wolf E Hautz: Department of Emergency Medicine, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland

28
Lloyd ML, Billingslea S, Slama R. Atraumatic Vertebral Artery Dissection in a Patient With a Migraine Headache. Mil Med 2023; 188:e848-e851. [PMID: 33876248] [DOI: 10.1093/milmed/usab135] [Citation(s) in RCA: 0]
Abstract
This case discusses a 34-year-old active-duty male who presented to the emergency department with a persistent headache of 2 weeks' duration. His initial review of symptoms was reassuring until a detailed neurologic examination on his second visit revealed a visual deficit in the left upper quadrant. He complained of intermittent tension headaches over the last several years but had no history of diagnosed migraines until he was seen 4 days prior in the same emergency department, where he received empiric migraine therapy and left without improvement in symptoms. On his return visit, a computed tomography scan with intravenous contrast revealed a left vertebral artery dissection and hematoma. The patient was admitted for medical management and was subsequently found on magnetic resonance imaging to have suffered a small infarction of the right lingual gyrus. This case illustrates the importance of maintaining a broad differential diagnosis and a high index of suspicion in a patient with new focal neurologic findings in order to diagnose a potentially fatal disease.
Affiliation(s)
- Michael L Lloyd
- Department of Emergency Medicine, Naval Medical Center Portsmouth, Portsmouth, VA 23708, USA
- Richard Slama
- Department of Emergency Medicine, Naval Medical Center Portsmouth, Portsmouth, VA 23708, USA

29
Exploratory study: Evaluation of a symptom checker effectiveness for providing a diagnosis and evaluating the situation emergency compared to emergency physicians using simulated and standardized patients. PLoS One 2023; 18:e0277568. [PMID: 36827277] [PMCID: PMC9955603] [DOI: 10.1371/journal.pone.0277568]
Abstract
BACKGROUND The overloading of health care systems is an international problem. In this context, new tools such as symptom checkers (SCs) are emerging to improve patient orientation and triage. These SCs should be rigorously evaluated, and we can take a cue from the way medical students are evaluated, using objective structured clinical examinations (OSCEs) with simulated patients. OBJECTIVE The main objective of this study was to evaluate the efficiency of a symptom checker versus emergency physicians using OSCEs as an assessment method. METHODS We explored a simulation-based method to evaluate the ability to establish a diagnosis and assess the urgency of a situation. A panel of medical experts wrote 220 simulated patient cases. Each situation was played twice by an actor trained to the role: once for the SC, then for an emergency physician. As in a teleconsultation, only the patient's voice was accessible. We performed a prospective non-inferiority study; if the primary analysis failed to detect non-inferiority, a superiority analysis was planned. RESULTS The SC established only 30% of the main diagnoses, whereas the emergency physician found 81% of them. The emergency physician was also superior to the SC in suggesting secondary diagnoses (92% versus 52%). For patient triage (vital emergency or not), the physician likewise performed better (96% versus 71%). The SC was non-inferior to the physician in terms of interview duration. CONCLUSIONS AND RELEVANCE Simulated patients, rather than written clinical cases, should be used to evaluate the effectiveness of SCs.
30
Hirosawa T, Harada Y, Yokose M, Sakamoto T, Kawamura R, Shimizu T. Diagnostic Accuracy of Differential-Diagnosis Lists Generated by Generative Pretrained Transformer 3 Chatbot for Clinical Vignettes with Common Chief Complaints: A Pilot Study. Int J Environ Res Public Health 2023; 20:3378. [PMID: 36834073] [PMCID: PMC9967747] [DOI: 10.3390/ijerph20043378]
Abstract
The diagnostic accuracy of differential diagnoses generated by artificial intelligence (AI) chatbots, including the generative pretrained transformer 3 (GPT-3) chatbot (ChatGPT-3), is unknown. This study evaluated the accuracy of differential-diagnosis lists generated by ChatGPT-3 for clinical vignettes with common chief complaints. General internal medicine physicians created clinical cases, correct diagnoses, and five differential diagnoses for ten common chief complaints. The rate of correct diagnosis by ChatGPT-3 within the ten differential-diagnosis lists was 28/30 (93.3%). The rate of correct diagnosis by physicians was still superior to that by ChatGPT-3 within the five differential-diagnosis lists (98.3% vs. 83.3%, p = 0.03). The rate of correct diagnosis by physicians was also superior to that by ChatGPT-3 for the top diagnosis (93.3% vs. 53.3%, p < 0.001). The rate of consistent differential diagnoses among physicians within the ten differential-diagnosis lists generated by ChatGPT-3 was 62/88 (70.5%). In summary, this study demonstrates the high diagnostic accuracy of differential-diagnosis lists generated by ChatGPT-3 for clinical cases with common chief complaints, suggesting that AI chatbots such as ChatGPT-3 can generate well-differentiated diagnosis lists for common chief complaints, although the ordering of these lists can still be improved.
Affiliation(s)
- Takanobu Hirosawa
- Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, Tochigi 321-0293, Japan

31
Online symptom checkers lack diagnostic accuracy for skin rashes. J Am Acad Dermatol 2023; 88:487-488. [PMID: 36243544] [DOI: 10.1016/j.jaad.2022.06.034]
32
Pairon A, Philips H, Verhoeven V. A scoping review on the use and usefulness of online symptom checkers and triage systems: How to proceed? Front Med (Lausanne) 2023; 9:1040926. [PMID: 36687416] [PMCID: PMC9853165] [DOI: 10.3389/fmed.2022.1040926]
Abstract
Background Patients are increasingly turning to the Internet for health information. Numerous online symptom checkers and digital triage tools are currently available to the general public in an effort to meet this need, simultaneously acting as a demand-management strategy to aid the overburdened health care system. The implementation of these services requires an evidence-based approach, warranting a review of the available literature on this rapidly evolving topic. Objective This scoping review aims to provide an overview of the current state of the art and to identify research gaps through an analysis of the strengths and weaknesses of the presently available literature. Methods A systematic search strategy was formed and applied to six databases: Cochrane Library, NICE, DARE, NIHR, PubMed, and Web of Science. Data extraction was performed by two researchers according to a pre-established data-charting methodology, allowing for a thematic analysis of the results. Results A total of 10,250 articles were identified, and 28 publications were found eligible for inclusion. Users of these tools are often younger, female, more highly educated, and technologically literate, with potential consequences for the digital divide and health equity. Triage algorithms remain risk-averse, which challenges their accuracy. Recent evolutions in algorithms have had varying degrees of success. Results on impact are highly variable, with potential effects on demand, accessibility of care, health literacy, and syndromic surveillance. Both patients and healthcare providers are generally positive about the technology and seem amenable to the advice given, but there are still improvements to be made toward a more patient-centered approach. The significant heterogeneity across studies and triage systems remains the primary challenge for the field, limiting the transferability of findings. Conclusion Current evidence is characterized by significant variability in study design and outcomes, highlighting the significant challenges for future research. An evolution toward more homogeneous methodologies, studies tailored to the intended setting, regulation and standardization of evaluations, and a patient-centered approach could benefit the field.
33
Kopka M, Feufel MA, Berner ES, Schmieding ML. How suitable are clinical vignettes for the evaluation of symptom checker apps? A test theoretical perspective. Digit Health 2023; 9:20552076231194929. [PMID: 37614591] [PMCID: PMC10444026] [DOI: 10.1177/20552076231194929]
Abstract
Objective To evaluate the ability of case vignettes to assess the performance of symptom checker applications and to suggest refinements to the methodology used in case vignette-based audit studies. Methods We re-analyzed the publicly available data of two prominent case vignette-based symptom checker audit studies by calculating common metrics of test theory. Furthermore, we developed a new metric, the Capability Comparison Score (CCS), which compares symptom checker capability while controlling for the difficulty of the set of cases each symptom checker evaluated. We then scrutinized whether applying test theory and the CCS altered the performance ranking of the investigated symptom checkers. Results In both studies, most symptom checkers changed their rank order when adjusting the triage capability for item difficulty (ID) with the CCS. The previously reported triage accuracies commonly overestimated the capability of symptom checkers because they did not account for the fact that symptom checkers tend to selectively appraise easier cases (i.e., with high ID values). Also, many case vignettes in both studies showed insufficient (very low and even negative) values of item-total correlation (ITC), suggesting that individual items or the composition of item sets are of low quality. Conclusions A test-theoretic perspective helps identify previously undetected threats to the validity of case vignette-based symptom checker assessments and provides guidance and specific metrics to improve the quality of case vignettes, in particular by controlling for the difficulty of the vignettes an app was (not) able to evaluate correctly. Such measures might prove more meaningful than accuracy alone for the competitive assessment of symptom checkers. Our approach helps elaborate and standardize the methodology used for appraising symptom checker capability, which, ultimately, may yield more reliable results.
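The test-theoretic quantities this abstract relies on, item difficulty and item-total correlation, are standard psychometric measures and easy to compute. Below is a minimal Python sketch, illustrative only and not the authors' code (the Capability Comparison Score itself is defined in the paper and not reproduced here), treating each symptom checker app as a test-taker and each vignette as an item:

```python
import math
from statistics import mean

def item_difficulty(responses):
    """Share of apps appraising each vignette correctly.
    responses[a][j] is 1 if app a handled vignette j correctly, else 0.
    High values correspond to easy items."""
    n_items = len(responses[0])
    return [mean(app[j] for app in responses) for j in range(n_items)]

def item_total_correlation(responses, j):
    """Pearson correlation between item j and each app's rest-score
    (total correct excluding item j). Low or negative values flag
    vignettes that fail to discriminate between strong and weak apps."""
    item = [app[j] for app in responses]
    rest = [sum(app) - app[j] for app in responses]
    mi, mr = mean(item), mean(rest)
    cov = sum((a - mi) * (b - mr) for a, b in zip(item, rest))
    var_i = sum((a - mi) ** 2 for a in item)
    var_r = sum((b - mr) ** 2 for b in rest)
    if var_i == 0 or var_r == 0:
        return 0.0  # undefined when the item or the rest-scores are constant
    return cov / math.sqrt(var_i * var_r)
```

An item solved only by the apps that also solve everything else yields a high item-total correlation; an item solved mainly by otherwise weak apps yields a negative one, which is the kind of low-quality vignette the authors flag.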
Affiliation(s)
- Marvin Kopka
- Department of Psychology and Ergonomics (IPA), Division of Ergonomics, Technische Universität Berlin, Berlin, Germany
- Institute of Medical Informatics, Charité – Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Markus A Feufel
- Department of Psychology and Ergonomics (IPA), Division of Ergonomics, Technische Universität Berlin, Berlin, Germany
- Eta S Berner
- Department of Health Services Administration, University of Alabama at Birmingham, Birmingham, AL, USA
- Malte L Schmieding
- Institute of Medical Informatics, Charité – Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany

34
Churruca K, Ellis LA, Pope C, MacLellan J, Zurynski Y, Braithwaite J. The place of digital triage in a complex healthcare system: An interview study with key stakeholders in Australia's national provider. Digit Health 2023; 9:20552076231181201. [PMID: 37377561] [PMCID: PMC10291532] [DOI: 10.1177/20552076231181201]
Abstract
Background Digital triage tools such as telephone advice and online symptom checkers are now commonplace in health systems internationally. Research has focused on consumers' adherence to advice, health outcomes, satisfaction, and the degree to which these services manage demand for general practice or emergency departments. Such studies have had mixed findings, leaving equivocal the role of these services in healthcare. Objective We examined stakeholders' perspectives on Healthdirect, Australia's national digital triage provider, focusing on its role in the health system, and barriers to operation, in the context of the COVID-19 pandemic. Methods Key stakeholders took part in semi-structured interviews conducted online in the third quarter of 2021. Transcripts were coded and thematically analysed. Results Participants (n = 41) were Healthdirect staff (n = 13), employees of Primary Health Networks (PHNs; n = 12), clinicians (n = 9), shareholder representatives (n = 4), consumer representatives (n = 2) and other policymakers (n = 1). Eight themes emerged from the analysis: (1) information and guidance in navigating the system, (2) efficiency through appropriate care, (3) value for consumers? (4) the difficulties in triage at a distance, (5) competition and the unfulfilled promise of integration, (6) challenges in promoting Healthdirect, (7) monitoring and evaluating digital triage services and (8) rapid change, challenge and opportunity from COVID-19. Conclusion Stakeholders varied in their views of the purpose of Healthdirect's digital triage services. They identified challenges in lack of integration, competition, and the limited public profile of the services, issues largely reflective of the complexity of the policy and health system landscape. There was acknowledgement of the value of the services during the COVID-19 pandemic, and an expectation of them realising greater potential in the wake of the rapid uptake of telehealth.
Affiliation(s)
- Kate Churruca
- Australian Institute of Health Innovation, Macquarie University, Sydney, Australia
- Louise A Ellis
- Australian Institute of Health Innovation, Macquarie University, Sydney, Australia
- Catherine Pope
- Nuffield Department of Primary Care Health Sciences, University of Oxford, Oxford, UK
- Jennifer MacLellan
- Nuffield Department of Primary Care Health Sciences, University of Oxford, Oxford, UK
- Yvonne Zurynski
- Australian Institute of Health Innovation, Macquarie University, Sydney, Australia
- Jeffrey Braithwaite
- Australian Institute of Health Innovation, Macquarie University, Sydney, Australia

35
35
|
North F, Jensen TB, Stroebel RJ, Nelson EM, Johnson BJ, Thompson MC, Pecina JL, Crum BA. Self-Triage Use, Subsequent Healthcare Utilization, and Diagnoses: A Retrospective Study of Process and Clinical Outcomes Following Self-Triage and Self-Scheduling for Ear or Hearing Symptoms. Health Serv Res Manag Epidemiol 2023; 10:23333928231168121. [PMID: 37101803] [PMCID: PMC10123887] [DOI: 10.1177/23333928231168121]
Abstract
Background Self-triage is becoming more widespread, but little is known about the people who use online self-triage tools and their outcomes. For self-triage researchers, there are significant barriers to capturing subsequent healthcare outcomes. Our integrated healthcare system was able to capture the subsequent healthcare utilization of individuals who used self-triage integrated with self-scheduling of provider visits. Methods We retrospectively examined healthcare utilization and diagnoses after patients had used self-triage and self-scheduling for ear or hearing symptoms. Outcomes and counts of office visits, telemedicine interactions, emergency department visits, and hospitalizations were captured. Diagnosis codes associated with subsequent provider visits were dichotomously categorized as associated with ear or hearing concerns or not. Non-visit care encounters (patient-initiated messages, nurse triage calls, and clinical communications) were also captured. Results Of 2168 self-triage uses, subsequent healthcare encounters within 7 days were captured for 80.5% (1745/2168). Of 1092 subsequent office visits with diagnoses, 83.1% (891/1092) were associated with relevant ear, nose, and throat diagnoses. Only 0.24% (4/1662) of patients with captured outcomes had a hospitalization within 7 days. Self-triage resulted in a self-scheduled office visit in 7.2% (126/1745). Office visits resulting from a self-scheduled visit had significantly fewer combined non-visit care encounters per office visit (fewer combined nurse triage calls, patient messages, and clinical communication messages) than office visits that were not self-scheduled (-0.51; 95% CI, -0.72 to -0.29; P < .0001). Conclusion In an appropriate healthcare setting, self-triage outcomes can be captured for a high percentage of uses, allowing examination of safety, patient adherence to recommendations, and efficiency of self-triage. With the ear or hearing self-triage, most uses had subsequent visit diagnoses relevant to ear or hearing, so most patients appeared to be selecting the appropriate self-triage pathway for their symptoms.
Affiliation(s)
- Frederick North
- Department of Medicine, Division of Community Internal Medicine, Geriatrics, and Palliative Care, Mayo Clinic, Rochester, MN, USA
- Teresa B Jensen
- Department of Family Medicine, Mayo Clinic, Rochester, MN, USA
- Robert J Stroebel
- Department of Medicine, Division of Community Internal Medicine, Geriatrics, and Palliative Care, Mayo Clinic, Rochester, MN, USA
- Elissa M Nelson
- Enterprise Office of Access Management, Mayo Clinic, Rochester, MN, USA
- Brenda J Johnson
- Enterprise Office of Access Management, Mayo Clinic, Rochester, MN, USA
- Brian A Crum
- Department of Neurology, Mayo Clinic, Rochester, MN, USA

36
A clinical decision support system in back pain helps to find the diagnosis: a prospective correlation study. Arch Orthop Trauma Surg 2023; 143:621-625. [PMID: 34347121] [PMCID: PMC9925533] [DOI: 10.1007/s00402-021-04080-y]
Abstract
The aim of this study was to show the concordance between an app-based decision support system and the diagnoses given by spine surgeons in cases of back pain. Eighty-six patients took part within 2 months. They were seen by spine surgeons in the daily routine and then completed an app-based questionnaire that independently produced a diagnosis. The results showed Cramer's V = .711 (p < .001), indicating a strong association between the tool's output and the physician's diagnosis. In 67.4% of cases the diagnoses were concordant. Overestimation of the severity of the diagnosis occurred more often than underestimation (15.1% vs. 7%). The app-based tool is thus a safe aid to support healthcare professionals in back pain diagnosis.
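For context, the Cramer's V statistic reported in this abstract is a standard measure of association derived from the chi-squared statistic of a contingency table (here, app diagnosis versus surgeon diagnosis). A minimal sketch of the standard computation, illustrative only and not the study's analysis code:

```python
import math

def cramers_v(table):
    """Cramer's V for an r x c contingency table given as a list of rows
    of counts: sqrt(chi2 / (n * (min(r, c) - 1))).
    0 indicates independence, 1 a perfect association."""
    n_rows, n_cols = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(n_rows)) for j in range(n_cols)]
    chi2 = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (min(n_rows, n_cols) - 1)))
```

A value of .711, as reported, sits well above the conventional threshold (roughly .5 for tables of this size) for a strong association.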
37
Ilicki J. Challenges in evaluating the accuracy of AI-containing digital triage systems: A systematic review. PLoS One 2022; 17:e0279636. [PMID: 36574438] [PMCID: PMC9794085] [DOI: 10.1371/journal.pone.0279636]
Abstract
INTRODUCTION Patient-operated digital triage systems with AI components are becoming increasingly common. However, previous reviews have found a limited amount of research on such systems' accuracy. This systematic review of the literature aimed to identify the main challenges in determining the accuracy of patient-operated digital AI-based triage systems. METHODS A systematic review was designed and conducted in accordance with PRISMA guidelines in October 2021 using PubMed, Scopus, and Web of Science. Articles were included if they assessed the accuracy of a patient-operated digital triage system that had an AI component and could triage a general primary care population. Limitations and other pertinent data were extracted, synthesized, and analysed. Risk of bias was not analysed, as this review studied the included articles' limitations rather than their results. Results were synthesized qualitatively using a thematic analysis. RESULTS The search generated 76 articles; following exclusion, 8 articles (6 primary articles and 2 reviews) were included in the analysis. The articles' limitations were synthesized into three groups: epistemological, ontological, and methodological. Limitations varied with regard to their tractability and the degree to which they can be addressed through methodological choices. Certain methodological limitations related to testing triage systems using vignettes can be addressed through methodological adjustments, whereas epistemological and ontological limitations require that readers appraise such studies with these limitations in mind. DISCUSSION The reviewed literature highlights recurring limitations and challenges in studying the accuracy of patient-operated digital triage systems with AI components. Some of these challenges can be addressed through methodology, whereas others are intrinsic to the area of inquiry and involve unavoidable trade-offs. Future studies should take these limitations into consideration in order to better address the current knowledge gaps in the literature.
38
Ponce-Blandón JA, Romero-Castillo R, Rodríguez-Leal L, González-Hervías R, Velarde-García JF, Álvarez-Embarba B. A Multicenter Study about the Population Treated in the Respiratory Triage Stations Deployed by the Red Cross during the COVID-19 Pandemic. Int J Environ Res Public Health 2022; 20:313. [PMID: 36612635] [PMCID: PMC9819537] [DOI: 10.3390/ijerph20010313]
Abstract
BACKGROUND Care demand exceeded the availability of human and material resources during the COVID-19 pandemic, which is why triage was fundamental. The objective was to describe the clinical and sociodemographic characteristics of confirmed or suspected COVID-19 cases seen at triage stations in different Ecuadorian provinces. METHODS A multicenter study with a retrospective, descriptive design. The patients included were those who attended the respiratory triage stations deployed by the Ecuadorian Red Cross in eight Ecuadorian provinces during March and April 2021. Triage allows patients who need urgent treatment to be identified and favors the efficient use of health resources. RESULTS The study population comprised 21,120 patients, of whom 43.1% were men and 56.9% were women, with ages ranging from 0 to 98 years. The severity of COVID-19 differed by gender, with mild symptoms predominating in women and severe or critical symptoms in men. A higher incidence of critical cases was observed in patients over 65 years old. Overweight predominated in critical, severe, and moderate cases, while the body mass index of patients with mild symptoms was within the normal range. CONCLUSIONS The Ecuadorian Red Cross units identified suspected COVID-19 cases, facilitating their follow-up and isolation. Fever was the most significant early finding.
Affiliation(s)
- José Antonio Ponce-Blandón
- Red Cross Nursing University Centre, University of Seville, 41009 Seville, Spain
- International Federation of the Red Cross, Ecuador Headquarters, Quito 170403, Ecuador
- Leyre Rodríguez-Leal
- Red Cross Nursing University College, Autonomous University of Madrid, 28003 Madrid, Spain
- Juan Francisco Velarde-García
- Red Cross Nursing University College, Autonomous University of Madrid, 28003 Madrid, Spain
- Research Group of Humanities and Qualitative Research in Health Science (Hum&QRinHS), Universidad Rey Juan Carlos, Avenida Atenas s/n, 28922 Alcorcon, Spain
- Nursing Research Support Unit, Hospital General Universitario Gregorio Maranon, Calle Dr. Esquerdo 46, 28007 Madrid, Spain

39
Vargas Meza X, Koyama S. A social media network analysis of trypophobia communication. Sci Rep 2022; 12:21163. [PMID: 36477698] [PMCID: PMC9729576] [DOI: 10.1038/s41598-022-25301-3]
Abstract
Trypophobia has attracted scientific attention in recent years. Few related studies have recruited participants using online methods, and even less is known about health communication in an environment where trypophobia was first widely discussed (i.e., the Internet). This study describes communication patterns in a Facebook group for trypophobia by detecting frequent topics, top contributors, and their discourses. We identified key commenters and performed word frequency analysis, word co-occurrence analysis, topic modeling, and content analysis. Impactful users posted and replied more often when discussing peer-reviewed science. Triggering content was actively removed by the group administrators. A wide variety of triggers not discussed in trypophobia-related literature were frequently mentioned. However, there was a lack of discussion on peer-reviewed treatments. The combination of a few expert and many supportive amateur gatekeepers willing to understand trypophobia, along with active monitoring by administrators, might contribute to in-group trust and the sharing of peer-reviewed science by top users of the trypophobia Facebook group.
Affiliation(s)
- Xanat Vargas Meza
- Faculty of Library, Information and Media Sciences, University of Tsukuba, Tsukuba, Ibaraki, Japan
- Present address: Global Innovation Research Organization, Ritsumeikan University, Ibaraki, Osaka, Japan
- Shinichi Koyama
- Faculty of Art and Design, University of Tsukuba, Tsukuba, Ibaraki, Japan

40
Müller R, Klemmt M, Ehni HJ, Henking T, Kuhnmünch A, Preiser C, Koch R, Ranisch R. Ethical, legal, and social aspects of symptom checker applications: a scoping review. Med Health Care Philos 2022; 25:737-755. [PMID: 36181620] [PMCID: PMC9613552] [DOI: 10.1007/s11019-022-10114-y]
Abstract
Symptom Checker Applications (SCA) are mobile applications, often designed for the end-user, that assist with symptom assessment and self-triage. SCA are meant to provide users with easily accessible information about their own health conditions. However, SCA raise questions regarding ethical, legal, and social aspects (ELSA), for example regarding fair access to this new technology. The aim of this scoping review was to identify the ELSA of SCA in the scientific literature. Ten databases (e.g., Web of Science and PubMed) were searched. Studies on SCA that address ELSA, written in English or German, were included, and the ELSA of SCA were extracted and synthesized using qualitative content analysis. A total of 25,061 references were identified, of which 39 were included in the analysis. The identified aspects were allotted to three main categories: (1) technology; (2) the individual level; and (3) the healthcare system. The results show that there are controversial debates in the literature on the ethical and social challenges of SCA usage, and that these debates are characterized by the lack of a specific legal perspective and of empirical data. The review provides an overview of the spectrum of ELSA regarding SCA. It offers guidance to stakeholders in the healthcare system, for example patients, healthcare professionals, and insurance providers, and could be used in future empirical research to investigate the perspectives of those affected, such as users.
Affiliation(s)
- Regina Müller
- Institute of Ethics and History of Medicine, University of Tübingen, Gartenstraße 47, 72074 Tübingen, Germany
- Malte Klemmt
- Institute of Applied Social Sciences, University of Applied Sciences Würzburg-Schweinfurt, Münzstraße 12, 97070 Würzburg, Germany
- Hans-Jörg Ehni
- Institute of Ethics and History of Medicine, University of Tübingen, Gartenstraße 47, 72074 Tübingen, Germany
- Tanja Henking
- Institute of Applied Social Sciences, University of Applied Sciences Würzburg-Schweinfurt, Münzstraße 12, 97070 Würzburg, Germany
- Angelina Kuhnmünch
- Institute of Ethics and History of Medicine, University of Tübingen, Gartenstraße 47, 72074 Tübingen, Germany
- Christine Preiser
- Institute of Occupational and Social Medicine and Health Services Research, University Hospital Tübingen, Wilhelmstraße 27, 72074 Tübingen, Germany
- Roland Koch
- Institute for General Practice and Interprofessional Care, University Medicine Tübingen, Osianderstraße 5, 72076 Tübingen, Germany
- Robert Ranisch
- Faculty of Health Sciences Brandenburg, University of Potsdam, Karl-Liebknecht-Str. 24-25, House 16, 14476 Potsdam, Golm, Germany

41
Judson TJ, Pierce L, Tutman A, Mourad M, Neinstein AB, Shuler G, Gonzales R, Odisho AY. Utilization patterns and efficiency gains from use of a fully EHR-integrated COVID-19 self-triage and self-scheduling tool: a retrospective analysis. J Am Med Inform Assoc 2022; 29:2066-2074. [PMID: 36029243] [PMCID: PMC9667153] [DOI: 10.1093/jamia/ocac161]
Abstract
OBJECTIVE Symptom checkers can help address high demand for SARS-CoV-2 (COVID-19) testing and care by providing patients with self-service access to triage recommendations. However, health systems may be hesitant to invest in these tools, as their associated efficiency gains have not been studied. We aimed to quantify the operational efficiency gains associated with use of an online COVID-19 symptom checker as an alternative to a telephone hotline. METHODS In our health system, ambulatory patients can use either an online symptom checker or a telephone hotline to be triaged and connected to COVID-19 care. We performed a retrospective analysis of adults who used either method between October 20, 2021 and January 10, 2022, using call logs, electronic health record data, and local wages to calculate labor costs. RESULTS Of the 15,549 total COVID-19 triage encounters, 1820 (11.7%) used only the telephone hotline and 13,729 (88.3%) used the symptom checker. Only 271 (2%) of the patients who used the symptom checker also called the hotline. Hotline encounters required more clinician time than symptom checker encounters (17.8 vs 0.4 min per encounter), resulting in higher average labor costs ($24.21 vs $0.55 per encounter). The symptom checker resulted in over 4200 clinician labor hours saved. CONCLUSION When given the option, most patients completed COVID-19 triage and visit scheduling online, resulting in substantial efficiency gains. These benefits may encourage health system investment in such tools.
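The efficiency arithmetic behind such estimates is simple minutes-times-wage accounting. A minimal sketch under illustrative assumptions (the helper names are hypothetical, and the paper's exact accounting of hours saved may differ from this naive calculation):

```python
def encounter_labor_cost(minutes_per_encounter: float, hourly_wage: float) -> float:
    """Clinician labor cost of a single triage encounter, in dollars."""
    return minutes_per_encounter / 60 * hourly_wage

def hours_saved(n_encounters: int, minutes_alternative: float,
                minutes_tool: float) -> float:
    """Clinician hours saved by routing encounters through the self-service
    tool rather than the alternative (e.g., telephone hotline) channel."""
    return n_encounters * (minutes_alternative - minutes_tool) / 60
```

With the abstract's per-encounter times, hours_saved(13729, 17.8, 0.4) comes to roughly 3980 hours, the same order of magnitude as the reported 4200 clinician labor hours saved.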
Collapse
Affiliation(s)
- Timothy J Judson
- Department of Medicine, University of California San Francisco, San Francisco, California, USA
- Center for Digital Health Innovation, University of California San Francisco, San Francisco, California, USA
- Office of Population Health, University of California San Francisco, San Francisco, California, USA
| | - Logan Pierce
- Department of Medicine, University of California San Francisco, San Francisco, California, USA
- Center for Digital Health Innovation, University of California San Francisco, San Francisco, California, USA
| | - Avi Tutman
- Office of Population Health, University of California San Francisco, San Francisco, California, USA
| | - Michelle Mourad
- Department of Medicine, University of California San Francisco, San Francisco, California, USA
- Center for Digital Health Innovation, University of California San Francisco, San Francisco, California, USA
| | - Aaron B Neinstein
- Department of Medicine, University of California San Francisco, San Francisco, California, USA
- Center for Digital Health Innovation, University of California San Francisco, San Francisco, California, USA
| | - Gina Shuler
- Office of Population Health, University of California San Francisco, San Francisco, California, USA
| | - Ralph Gonzales
- Department of Medicine, University of California San Francisco, San Francisco, California, USA
- Clinical Innovation Center, University of California San Francisco, San Francisco, California, USA
| | - Anobel Y Odisho
- Center for Digital Health Innovation, University of California San Francisco, San Francisco, California, USA
- Department of Urology, University of California San Francisco, San Francisco, California, USA
| |
Collapse
|
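The efficiency arithmetic reported in the Judson et al. abstract above (17.8 vs 0.4 clinician minutes per encounter; $24.21 vs $0.55) can be sketched as a simple cost model. The blended hourly wage below is an illustrative assumption, not a figure from the paper, and the published 4200-hour savings came from the study's full cost model rather than this back-of-the-envelope version.

```python
def labor_cost_per_encounter(clinician_minutes, hourly_wage):
    """Clinician labor cost, in dollars, for one triage encounter."""
    return clinician_minutes / 60 * hourly_wage

def clinician_hours_saved(n_encounters, minutes_phone, minutes_online):
    """Clinician hours avoided by shifting encounters from phone to online triage."""
    return n_encounters * (minutes_phone - minutes_online) / 60

ASSUMED_WAGE = 82.0  # $/hour -- hypothetical blended wage, not reported by the paper

phone_cost = labor_cost_per_encounter(17.8, ASSUMED_WAGE)   # roughly $24 per encounter
online_cost = labor_cost_per_encounter(0.4, ASSUMED_WAGE)   # roughly $0.55 per encounter
saved = clinician_hours_saved(13_729, 17.8, 0.4)            # roughly 4000 hours
```

At an assumed $82/hour the per-encounter costs land close to the published $24.21 and $0.55, which suggests the study's local wage data implied a blended rate in that range; the gap between the ~4000 hours here and the paper's 4200 reflects its more detailed accounting.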
42
|
Painter A, Hayhoe B, Riboli-Sasco E, El-Osta A. Online Symptom Checkers: Recommendations for a Vignette-Based Clinical Evaluation Standard. J Med Internet Res 2022; 24:e37408. [DOI: 10.2196/37408] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/19/2022] [Revised: 09/15/2022] [Accepted: 10/11/2022] [Indexed: 11/13/2022] Open
Abstract
The use of patient-facing online symptom checkers (OSCs) has expanded in recent years, but their accuracy, safety, and impact on patient behaviors and health care systems remain unclear. The lack of a standardized clinical evaluation process has resulted in significant variation in approaches to OSC validation and evaluation. The aim of this paper is to characterize a congruent set of requirements for a standardized vignette-based clinical evaluation process for OSCs. Discrepancies in the findings of comparative studies to date suggest that different steps in OSC evaluation methodology can significantly influence outcomes. A standardized process with a clear specification for vignette-based clinical evaluation is urgently needed to guide developers and facilitate the objective comparison of OSCs. We propose 15 requirements for an OSC evaluation standard. A third-party evaluation process and protocols for prospective real-world evidence studies should also be prioritized to quality-assure OSC assessment.
Collapse
|
43
|
Sampietro-Colom L, Fernandez-Barcelo C, Abbas I, Valdasquin B, Rabasseda N, García-Lorenzo B, Sanchez M, Sans M, Garcia N, Granados A. WtsWrng Interim Comparative Effectiveness Evaluation and Description of the Challenges to Develop, Assess, and Introduce This Novel Digital Application in a Traditional Health System. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2022; 19:13873. [PMID: 36360756 PMCID: PMC9654177 DOI: 10.3390/ijerph192113873] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 09/12/2022] [Revised: 10/21/2022] [Accepted: 10/22/2022] [Indexed: 06/16/2023]
Abstract
Science and technology have evolved quickly during the first two decades of the 21st century, but healthcare systems remain grounded in the last century's structures and processes. Changes in the way health care is provided are in demand; digital transformation is a key driver in making healthcare systems more accessible, agile, efficient, and citizen-centered. Nevertheless, the way healthcare systems function challenges the development (research and development and regulatory requirements), assessment (weaknesses in methodological guidance), and adoption of digital applications (DAs). WtsWrng (WW), an innovative DA that uses images to interact with citizens for symptom triage and monitoring, is used as an example to show the challenges faced in its development and clinical validation and how these are being overcome. To prove WW's value from inception, novel approaches to evidence generation that allow for agile, patient-centered development have been applied. Early scientific advice from NICE (UK) was sought for the study design, an iterative development with interim analysis was performed, and different statistical parameters (kappa, B statistic) were explored to address development and assessment challenges. WW triage accuracy at the cutoff time ranged from 0.62 to 0.94 for the most frequent symptoms presenting to the Emergency Department (ED), and the observed concordance for the 12 most frequent diagnoses at hospital discharge ranged from 0.40 to 0.97; 8 of the diagnoses had a concordance greater than 0.8. This experience should prompt reflection among DA developers, digital health scientists, regulators, health technology assessors, and payers.
Collapse
Affiliation(s)
- Laura Sampietro-Colom
- Assessment of Innovations and New Technologies Unit, Research and Innovation Directorate, Clínic Barcelona University Hospital, 08036 Barcelona, Spain
- Mangrana Ventures S.L., 08006 Barcelona, Spain
| | - Carla Fernandez-Barcelo
- Assessment of Innovations and New Technologies Unit, Research and Innovation Directorate, Clínic Barcelona University Hospital, 08036 Barcelona, Spain
| | - Ismail Abbas
- Assessment of Innovations and New Technologies Unit, Research and Innovation Directorate, Clínic Barcelona University Hospital, 08036 Barcelona, Spain
| | - Blanca Valdasquin
- Assessment of Innovations and New Technologies Unit, Research and Innovation Directorate, Clínic Barcelona University Hospital, 08036 Barcelona, Spain
| | | | - Borja García-Lorenzo
- Assessment of Innovations and New Technologies Unit, Research and Innovation Directorate, Clínic Barcelona University Hospital, 08036 Barcelona, Spain
- Kronikgune Institute for Health Sciences Research, 48902 Barakaldo, Spain
| | - Miquel Sanchez
- Emergency Department, Clínic Barcelona University Hospital, 08036 Barcelona, Spain
| | - Mireia Sans
- CAP Comte Borrell, Consorci Atenció Primaria Salut Barcelona Esquerra—CAPSBE, 08029 Barcelona, Spain
- Health 2.0 Section of the Col·Legi Oficial de Metges de Barcelona, 08017 Barcelona, Spain
| | - Noemi Garcia
- CAP Comte Borrell, Consorci Atenció Primaria Salut Barcelona Esquerra—CAPSBE, 08029 Barcelona, Spain
| | | |
Collapse
|
44
|
Talukder AK, Schriml L, Ghosh A, Biswas R, Chakrabarti P, Haas RE. Diseasomics: Actionable machine interpretable disease knowledge at the point-of-care. PLOS DIGITAL HEALTH 2022; 1:e0000128. [PMID: 36812614 PMCID: PMC9931276 DOI: 10.1371/journal.pdig.0000128] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 07/28/2021] [Accepted: 09/14/2022] [Indexed: 11/06/2022]
Abstract
Physicians establish a diagnosis by assessing a patient's signs, symptoms, age, sex, laboratory test findings, and disease history. All this must be done in limited time and against the backdrop of an increasing overall workload. In the era of evidence-based medicine, it is of utmost importance for a clinician to be abreast of the latest guidelines and treatment protocols, which change rapidly. In resource-limited settings, updated knowledge often does not reach the point of care. This paper presents an artificial intelligence (AI)-based approach for integrating comprehensive disease knowledge to support physicians and healthcare workers in arriving at accurate diagnoses at the point of care. We integrated different disease-related knowledge bodies to construct a comprehensive, machine-interpretable diseasomics knowledge graph that includes the Disease Ontology, disease symptoms, SNOMED CT, DisGeNET, and PharmGKB data. The resulting disease-symptom network comprises knowledge from the Symptom Ontology, electronic health records (EHR), the human symptom disease network, the Disease Ontology, Wikipedia, PubMed, textbooks, and symptomatology knowledge sources, with 84.56% accuracy. We also integrated spatial and temporal comorbidity knowledge obtained from EHRs for two population data sets, from Spain and Sweden respectively. The knowledge graph is stored in a graph database as a digital twin of the disease knowledge. We use node2vec (node embedding) as a digital triplet for link prediction in disease-symptom networks to identify missing associations. This diseasomics knowledge graph is expected to democratize medical knowledge, empower non-specialist health workers to make evidence-based, informed decisions, and help achieve the goal of universal health coverage (UHC). The machine-interpretable knowledge graphs presented in this paper represent associations between entities and do not imply causation. Our differential diagnostic tool focuses on signs and symptoms and does not include a complete assessment of the patient's lifestyle and health history, which would typically be necessary to rule out conditions and arrive at a final diagnosis. The predicted diseases are ordered according to the specific disease burden in South Asia. The knowledge graphs and tools presented here can be used as a guide.
Collapse
Affiliation(s)
- Asoke K. Talukder
- SRIT India, Bangalore, India
- Computer Science & Engineering, National Institute of Technology Karnataka (NITK), Surathkal, India
| | - Lynn Schriml
- University of Maryland School of Medicine, Maryland, United States of America
| | - Arnab Ghosh
- Indian Institute of Technology Bombay, Mumbai, India
| | - Rakesh Biswas
- Kamineni Institute of Medical Sciences, Narketpalle, Telangana, India
| | - Prantar Chakrabarti
- Vivekananda Institute of Medical Sciences, Kolkata, India
- Cybernetic Care, Bangalore, India
| | - Roland E. Haas
- International Institute of Information Technology Bangalore (IIIT-B), Bangalore, India
| |
Collapse
|
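The link-prediction step the Talukder et al. abstract describes — using node2vec embeddings to surface missing disease-symptom associations — can be illustrated in miniature. As a dependency-free stand-in for node2vec's random-walk embeddings, the sketch below scores a candidate (disease, symptom) edge by the cosine similarity between that disease and diseases already linked to the symptom; the toy graph and names are invented for illustration, not taken from the paper.

```python
from math import sqrt

# Toy disease-symptom bipartite graph (illustrative only, not from the paper).
edges = {
    ("influenza", "fever"), ("influenza", "cough"), ("influenza", "myalgia"),
    ("covid-19", "fever"), ("covid-19", "cough"), ("covid-19", "anosmia"),
    ("migraine", "headache"), ("migraine", "nausea"),
}

diseases = {d for d, _ in edges}
symptoms = {s for _, s in edges}
nodes = sorted(diseases | symptoms)
index = {n: i for i, n in enumerate(nodes)}

# One 0/1 adjacency row per node.
adj = {n: [0] * len(nodes) for n in nodes}
for d, s in edges:
    adj[d][index[s]] = 1
    adj[s][index[d]] = 1

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(x * x for x in v))
    return dot / norm if norm else 0.0

def predict_links(disease, k=3):
    """Rank symptoms not yet linked to `disease`: a candidate symptom scores
    highly when a similar disease (by shared-symptom cosine) already has it."""
    known = {s for d, s in edges if d == disease}
    scores = {}
    for s in symptoms - known:
        holders = [d for d in diseases if (d, s) in edges]
        scores[s] = max((cosine(adj[disease], adj[d]) for d in holders), default=0.0)
    return sorted(scores.items(), key=lambda kv: -kv[1])[:k]
```

Here `predict_links("influenza")` ranks "anosmia" first, because covid-19 shares two of influenza's three symptoms. node2vec itself replaces the explicit adjacency rows with embeddings learned from biased random walks, which scales to graphs far too large for pairwise similarity over raw adjacency.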
45
|
Napierala H, Kopka M, Altendorf MB, Bolanaki M, Schmidt K, Piper SK, Heintze C, Möckel M, Balzer F, Slagman A, Schmieding ML. Examining the impact of a symptom assessment application on patient-physician interaction among self-referred walk-in patients in the emergency department (AKUSYM): study protocol for a multi-center, randomized controlled, parallel-group superiority trial. Trials 2022; 23:791. [PMID: 36127742 PMCID: PMC9490986 DOI: 10.1186/s13063-022-06688-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/18/2022] [Accepted: 08/24/2022] [Indexed: 11/10/2022] Open
Abstract
Background Due to the increasing use of online health information, symptom checkers have been developed to provide an individualized assessment of health complaints, potential diagnoses, and an urgency estimation. They are assumed to support patient empowerment and to have a positive impact on the patient-physician interaction and satisfaction with care. In the emergency department (ED) in particular, symptom checkers could be integrated to bridge waiting times, and patients as well as physicians could take advantage of potential positive effects. Our study therefore aims to assess the impact of using a symptom assessment application (SAA), compared with no SAA use, on the patient-physician interaction among self-referred walk-in patients in the ED. Methods In this multi-center, 1:1 randomized, controlled, parallel-group superiority trial, 440 self-referred adult walk-in patients with a non-urgent triage category will be recruited in three EDs in Berlin. Eligible participants in the intervention group will use a SAA directly after initial triage. The control group receives standard care without a SAA. The primary endpoint is patients’ satisfaction with the patient-physician interaction, assessed by the Patient Satisfaction Questionnaire. Discussion The results of this trial could inform the implementation of SAAs into acute care to improve satisfaction with the patient-physician interaction. Trial registration German Clinical Trials Registry DRKS00028598. Registered on 25.03.2022
Collapse
Affiliation(s)
- Hendrik Napierala
- Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Institute of General Practice and Family Medicine, Charitéplatz 1, 10117, Berlin, Germany
| | - Marvin Kopka
- Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Institute of Medical Informatics, Charitéplatz 1, 10117, Berlin, Germany.,Cognitive Psychology and Ergonomics, Department of Psychology and Ergonomics (IPA), Technische Universität Berlin, Straße des 17. Juni 135, 10623, Berlin, Germany
| | - Maria B Altendorf
- Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Emergency and Acute Medicine and Health Services Research in Emergency Medicine (CVK, CCM), Charitéplatz 1, 10117, Berlin, Germany
| | - Myrto Bolanaki
- Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Emergency and Acute Medicine and Health Services Research in Emergency Medicine (CVK, CCM), Charitéplatz 1, 10117, Berlin, Germany
| | - Konrad Schmidt
- Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Institute of General Practice and Family Medicine, Charitéplatz 1, 10117, Berlin, Germany.,Jena University Hospital, Institute of General Practice and Family Medicine, Bachstr. 18, 07743, Jena, Germany
| | - Sophie K Piper
- Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Institute of Medical Informatics, Charitéplatz 1, 10117, Berlin, Germany.,Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Institute of Biometry and Clinical Epidemiology, Charitéplatz 1, 10117, Berlin, Germany.,Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Charitéplatz 1, 10117, Berlin, Germany
| | - Christoph Heintze
- Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Institute of General Practice and Family Medicine, Charitéplatz 1, 10117, Berlin, Germany
| | - Martin Möckel
- Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Emergency and Acute Medicine and Health Services Research in Emergency Medicine (CVK, CCM), Charitéplatz 1, 10117, Berlin, Germany
| | - Felix Balzer
- Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Institute of Medical Informatics, Charitéplatz 1, 10117, Berlin, Germany
| | - Anna Slagman
- Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Emergency and Acute Medicine and Health Services Research in Emergency Medicine (CVK, CCM), Charitéplatz 1, 10117, Berlin, Germany
| | - Malte L Schmieding
- Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Institute of Medical Informatics, Charitéplatz 1, 10117, Berlin, Germany. .,docport Services GmbH, Tußmannstr. 75, 40477, Düsseldorf, Germany.
| |
Collapse
|
46
|
Fraser HSF, Cohan G, Koehler C, Anderson J, Lawrence A, Pateña J, Bacher I, Ranney ML. Evaluation of Diagnostic and Triage Accuracy and Usability of a Symptom Checker in an Emergency Department: Observational Study. JMIR Mhealth Uhealth 2022; 10:e38364. [PMID: 36121688 PMCID: PMC9531004 DOI: 10.2196/38364] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/02/2022] [Revised: 05/31/2022] [Accepted: 06/10/2022] [Indexed: 11/26/2022] Open
Abstract
Background Symptom checkers are clinical decision support apps for patients, used by tens of millions of people annually. They are designed to provide diagnostic and triage advice and to assist users in seeking the appropriate level of care. Little evidence is available regarding their diagnostic and triage accuracy when used directly by patients for urgent conditions. Objective The aim of this study is to determine the diagnostic and triage accuracy and usability of a symptom checker used by patients presenting to an emergency department (ED). Methods We recruited a convenience sample of English-speaking patients presenting for care in an urban ED. Each consenting patient used a leading symptom checker from Ada Health before the ED evaluation. Diagnostic accuracy was evaluated by comparing the symptom checker’s diagnoses, and those of 3 independent emergency physicians viewing the patient-entered symptom data, with the final diagnoses from the ED evaluation. The Ada diagnoses and triage were also critiqued by the independent physicians. The patients completed a usability survey based on the Technology Acceptance Model. Results A total of 40 (80%) of the 50 participants approached completed the symptom checker assessment and usability survey. Their mean age was 39.3 (SD 15.9; range 18-76) years, and they were 65% (26/40) female, 68% (27/40) White, 48% (19/40) Hispanic or Latino, and 13% (5/40) Black or African American. Some cases had missing data or lacked a clear ED diagnosis; 75% (30/40) were included in the analysis of diagnosis and 93% (37/40) in the analysis of triage. The sensitivity for at least one of the final ED diagnoses by Ada (based on its top 5 diagnoses) was 70% (95% CI 54%-86%), close to the mean sensitivity of 68.9% for the 3 physicians (on their top 3 diagnoses). The physicians fully agreed with 62% (23/37) of the Ada triage decisions and rated 24% (9/37) as safe but too cautious. Triage was rated as unsafe and too risky in 22% (8/37) of cases by at least one physician, in 14% (5/37) of cases by at least two physicians, and in 5% (2/37) of cases by all 3 physicians. Usability was rated highly; participants agreed or strongly agreed with the 7 Technology Acceptance Model usability questions, with a mean score of 84.6%, although “satisfaction” and “enjoyment” were rated low. Conclusions This study provides preliminary evidence that a symptom checker can provide acceptable usability and diagnostic accuracy for patients with various urgent conditions. A total of 14% (5/37) of symptom checker triage recommendations were deemed unsafe and too risky by at least two physicians based on the symptoms recorded, similar to the results of studies on telephone and nurse triage. Larger studies of diagnostic and triage performance with direct patient use in different clinical environments are needed.
Collapse
Affiliation(s)
- Hamish S F Fraser
- Brown Center for Biomedical Informatics, Warren Alpert Medical School, Brown University, Providence, RI, United States
- School of Public Health, Brown University, Providence, RI, United States
| | - Gregory Cohan
- Warren Alpert Medical School, Brown University, Providence, RI, United States
| | - Christopher Koehler
- Department of Emergency Medicine, Brown University, Providence, RI, United States
| | - Jared Anderson
- Department of Emergency Medicine, Brown University, Providence, RI, United States
| | - Alexis Lawrence
- Harvard Medical Faculty Physicians, Department of Emergency Medicine, St Luke's Hospital, New Bedford, MA, United States
| | - John Pateña
- Brown-Lifespan Center for Digital Health, Providence, RI, United States
| | - Ian Bacher
- Brown Center for Biomedical Informatics, Warren Alpert Medical School, Brown University, Providence, RI, United States
| | - Megan L Ranney
- School of Public Health, Brown University, Providence, RI, United States
- Department of Emergency Medicine, Brown University, Providence, RI, United States
- Brown-Lifespan Center for Digital Health, Providence, RI, United States
| |
Collapse
|
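The 95% CI that the Fraser et al. abstract above reports for Ada's top-5 sensitivity (70%, CI 54%-86%) is reproducible with a plain normal-approximation (Wald) interval on the 30 analyzable cases, where 21 of 30 gives 70%. The paper does not state which interval method it used, so treating it as a Wald interval is an assumption.

```python
from math import sqrt

def wald_ci(successes, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a binomial proportion."""
    p = successes / n
    half_width = z * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

low, high = wald_ci(21, 30)  # 21/30 = 70% sensitivity on the 30 included cases
```

Rounded to whole percentages this gives 54%-86%, matching the abstract. At a sample size this small, a Wilson or exact (Clopper-Pearson) interval is often the preferred choice.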
47
|
Patel R, Swanton AR, Gross MS. Online Symptom Checkers are Poor Tools for Diagnosing Men's Health Conditions. Urology 2022; 170:124-131. [PMID: 36115428 DOI: 10.1016/j.urology.2022.08.032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/24/2022] [Revised: 07/24/2022] [Accepted: 08/02/2022] [Indexed: 10/14/2022]
Abstract
OBJECTIVE To analyze the accuracy of the four most commonly used online symptom checkers (OSCs) in diagnosing erectile dysfunction (ED), scrotal pain (SP), Peyronie's disease (PD), and low testosterone (LT). METHODS AND OUTCOMES One hundred sixty artificial vignettes were created by de-identifying recent initial outpatient consults presenting to discuss ED (40), SP (40), PD (40), and LT (40). The vignettes were entered into the 4 most frequently used OSCs (WebMD, MedicineNet, EverydayHealth, and SutterHealth), as determined by web traffic analysis tools. The top 5 conditions listed in each OSC's differential diagnosis were recorded and scored. RESULTS WebMD's accuracy for ED, SP, PD, and LT vignettes was 0%, 22.5%, 0%, and 95%, respectively. EverydayHealth was able to diagnose SP only 20% of the time and failed to diagnose ED, PD, or LT on all occasions. MedicineNet diagnosed ED, PD, SP, and LT in 100%, 98%, 27.5%, and 0% of vignettes, respectively. SutterHealth correctly diagnosed ED, SP, and LT in 100%, 20%, and 80% of patients, respectively. Cumulatively, the OSCs were most accurate in diagnosing ED and least accurate in diagnosing SP when using the top 1 (37.5% vs. 6.9%) and top 5 (50% vs. 24.5%) suggested conditions. CONCLUSIONS No OSC could accurately diagnose all the conditions tested. On average, the OSCs were poor at suggesting precise diagnoses for ED, PD, LT, and SP. Patients and practitioners should be cautioned regarding the accuracy of OSCs.
Collapse
Affiliation(s)
- Rutul Patel
- New York Institute of Technology College of Osteopathic Medicine, Old Westbury, NY, USA
| | | | | |
Collapse
|
48
|
Gräf M, Knitza J, Leipe J, Krusche M, Welcker M, Kuhn S, Mucke J, Hueber AJ, Hornig J, Klemm P, Kleinert S, Aries P, Vuillerme N, Simon D, Kleyer A, Schett G, Callhoff J. Comparison of physician and artificial intelligence-based symptom checker diagnostic accuracy. Rheumatol Int 2022; 42:2167-2176. [PMID: 36087130 PMCID: PMC9548469 DOI: 10.1007/s00296-022-05202-4] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/19/2022] [Accepted: 08/29/2022] [Indexed: 11/29/2022]
Abstract
Symptom checkers are increasingly used to assess new symptoms and navigate the health care system. The aim of this study was to compare the accuracy of an artificial intelligence (AI)-based symptom checker (Ada) and physicians regarding the presence or absence of an inflammatory rheumatic disease (IRD). In this survey study, German-speaking physicians with prior rheumatology working experience were asked to determine IRD presence or absence and suggest diagnoses for 20 different real-world patient vignettes, which included only basic health and symptom-related medical history. The IRD detection rate and suggested diagnoses of participants and Ada were compared to the gold standard, the final rheumatologists’ diagnosis reported on the discharge summary. A total of 132 vignettes were completed by 33 physicians (mean rheumatology working experience 8.8 (SD 7.1) years). Ada’s diagnostic accuracy for IRD was significantly higher than that of physicians (70% vs 54%, p = 0.002) according to the top diagnosis. Ada listed the correct diagnosis more often than physicians did, both as the top diagnosis (54% vs 32%, p < 0.001) and among the top 3 diagnoses (59% vs 42%, p < 0.001). Work experience was not related to suggesting the correct diagnosis or IRD status. When confined to basic health and symptom-related medical history, the diagnostic accuracy of physicians was lower than that of an AI-based symptom checker. These results highlight the potential of using symptom checkers early in the patient journey and the importance of access to complete and sufficient patient information to establish a correct diagnosis.
Collapse
Affiliation(s)
- Markus Gräf
- Department of Internal Medicine 3, Friedrich-Alexander-University Erlangen-Nürnberg and Universitätsklinikum Erlangen, Erlangen, Germany.,Deutsches Zentrum Immuntherapie (DZI), Friedrich-Alexander-University Erlangen-Nürnberg and Universitätsklinikum Erlangen, Erlangen, Germany
| | - Johannes Knitza
- Department of Internal Medicine 3, Friedrich-Alexander-University Erlangen-Nürnberg and Universitätsklinikum Erlangen, Erlangen, Germany. .,Deutsches Zentrum Immuntherapie (DZI), Friedrich-Alexander-University Erlangen-Nürnberg and Universitätsklinikum Erlangen, Erlangen, Germany. .,Université Grenoble Alpes, AGEIS, Grenoble, France.
| | - Jan Leipe
- Division of Rheumatology, Department of Medicine V, Medical Faculty Mannheim of the University, University Hospital Mannheim, Heidelberg, Germany
| | - Martin Krusche
- Division of Rheumatology and Systemic Inflammatory Diseases, University Hospital Hamburg-Eppendorf (UKE), Hamburg, Germany
| | - Martin Welcker
- Medizinisches Versorgungszentrum Für Rheumatologie Dr. M. Welcker GmbH, Planegg, Germany
| | - Sebastian Kuhn
- Department of Digital Medicine, Medical Faculty OWL, Bielefeld University, Bielefeld, Germany
| | - Johanna Mucke
- Policlinic and Hiller Research Unit for Rheumatology, Medical Faculty, University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Axel J Hueber
- Department of Internal Medicine 3, Friedrich-Alexander-University Erlangen-Nürnberg and Universitätsklinikum Erlangen, Erlangen, Germany.,Division of Rheumatology, Klinikum Nürnberg, Paracelsus Medical University, Nuremberg, Germany
| | | | - Philipp Klemm
- Department of Rheumatology, Immunology, Osteology and Physical Medicine, Justus Liebig University Gießen, Campus Kerckhoff, Bad Nauheim, Germany
| | - Stefan Kleinert
- Praxisgemeinschaft Rheumatologie-Nephrologie, Erlangen, Germany
| | | | - Nicolas Vuillerme
- Université Grenoble Alpes, AGEIS, Grenoble, France.,Institut Universitaire de France, Paris, France.,LabCom Telecom4Health, Orange Labs & Univ. Grenoble Alpes, CNRS, Inria, Grenoble INP-UGA, Grenoble, France
| | - David Simon
- Department of Internal Medicine 3, Friedrich-Alexander-University Erlangen-Nürnberg and Universitätsklinikum Erlangen, Erlangen, Germany.,Deutsches Zentrum Immuntherapie (DZI), Friedrich-Alexander-University Erlangen-Nürnberg and Universitätsklinikum Erlangen, Erlangen, Germany
| | - Arnd Kleyer
- Department of Internal Medicine 3, Friedrich-Alexander-University Erlangen-Nürnberg and Universitätsklinikum Erlangen, Erlangen, Germany.,Deutsches Zentrum Immuntherapie (DZI), Friedrich-Alexander-University Erlangen-Nürnberg and Universitätsklinikum Erlangen, Erlangen, Germany
| | - Georg Schett
- Department of Internal Medicine 3, Friedrich-Alexander-University Erlangen-Nürnberg and Universitätsklinikum Erlangen, Erlangen, Germany.,Deutsches Zentrum Immuntherapie (DZI), Friedrich-Alexander-University Erlangen-Nürnberg and Universitätsklinikum Erlangen, Erlangen, Germany
| | - Johanna Callhoff
- Epidemiology Unit, German Rheumatism Research Centre, Berlin, Germany.,Institute for Social Medicine, Epidemiology and Health Economics, Charité Universitätsmedizin, Berlin, Germany
| |
Collapse
|
49
|
Wallace W, Chan C, Chidambaram S, Hanna L, Iqbal FM, Acharya A, Normahani P, Ashrafian H, Markar SR, Sounderajah V, Darzi A. The diagnostic and triage accuracy of digital and online symptom checker tools: a systematic review. NPJ Digit Med 2022; 5:118. [PMID: 35977992 PMCID: PMC9385087 DOI: 10.1038/s41746-022-00667-w] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/03/2022] [Accepted: 07/25/2022] [Indexed: 11/09/2022] Open
Abstract
Digital and online symptom checkers are an increasingly adopted class of health technologies that enable patients to input their symptoms and biodata to produce a set of likely diagnoses and associated triage advice. However, concerns have been raised regarding the accuracy and safety of these symptom checkers. This systematic review evaluates the accuracy of symptom checkers in providing diagnoses and appropriate triage advice. MEDLINE and Web of Science were searched for studies that used either real or simulated patients to evaluate online or digital symptom checkers. The primary outcomes were the diagnostic and triage accuracy of the symptom checkers. The QUADAS-2 tool was used to assess study quality. Of the 177 studies retrieved, 10 met the inclusion criteria. Researchers evaluated the accuracy of symptom checkers across a variety of medical conditions, including ophthalmological conditions, inflammatory arthritides, and HIV. Half of the studies recruited real patients, while the remainder used simulated cases. The diagnostic accuracy of the primary diagnosis was low across the included studies (range: 19%–37.9%) and varied between individual symptom checkers, despite consistent symptom data input. Triage accuracy (range: 48.8%–90.1%) was typically higher than diagnostic accuracy. Overall, the diagnostic and triage accuracy of symptom checkers is variable and generally low. Given the increasing push towards adopting this class of technologies across numerous health systems, this study demonstrates that reliance upon symptom checkers could pose significant patient safety hazards. Large-scale primary studies, based upon real-world data, are warranted to demonstrate performance that is non-inferior to current best practices. Moreover, an urgent assessment of how these systems are regulated and implemented is required.
Collapse
Affiliation(s)
- William Wallace
- Department of Surgery & Cancer, Imperial College London, St. Mary's Hospital, London, W2 1NY, UK
| | - Calvin Chan
- Department of Surgery & Cancer, Imperial College London, St. Mary's Hospital, London, W2 1NY, UK
| | - Swathikan Chidambaram
- Department of Surgery & Cancer, Imperial College London, St. Mary's Hospital, London, W2 1NY, UK
| | - Lydia Hanna
- Department of Surgery & Cancer, Imperial College London, St. Mary's Hospital, London, W2 1NY, UK
| | - Fahad Mujtaba Iqbal
- Department of Surgery & Cancer, Imperial College London, St. Mary's Hospital, London, W2 1NY, UK.,Institute of Global Health Innovation, Imperial College London, South Kensington Campus, London, SW7 2AZ, UK
| | - Amish Acharya
- Department of Surgery & Cancer, Imperial College London, St. Mary's Hospital, London, W2 1NY, UK.,Institute of Global Health Innovation, Imperial College London, South Kensington Campus, London, SW7 2AZ, UK
| | - Pasha Normahani
- Department of Surgery & Cancer, Imperial College London, St. Mary's Hospital, London, W2 1NY, UK
| | - Hutan Ashrafian
- Institute of Global Health Innovation, Imperial College London, South Kensington Campus, London, SW7 2AZ, UK
| | - Sheraz R Markar
- Department of Surgery & Cancer, Imperial College London, St. Mary's Hospital, London, W2 1NY, UK.,Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden.,Nuffield Department of Surgery, Churchill Hospital, University of Oxford, OX3 7LE, Oxford, UK
| | - Viknesh Sounderajah
- Institute of Global Health Innovation, Imperial College London, South Kensington Campus, London, SW7 2AZ, UK.
| | - Ara Darzi
- Department of Surgery & Cancer, Imperial College London, St. Mary's Hospital, London, W2 1NY, UK.,Institute of Global Health Innovation, Imperial College London, South Kensington Campus, London, SW7 2AZ, UK
| |
Collapse
|
50
|
Liu VDM, Kaila M, Koskela T. User-initiated symptom assessment with an electronic symptom checker: study protocol for mixed methods validation. JMIR Res Protoc 2022. [PMID: 37467041 PMCID: PMC10398552 DOI: 10.2196/41423] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/30/2023] Open
Abstract
BACKGROUND The national Omaolo digital social welfare and health care service of Finland provides a symptom checker, Omaolo, which is a CE-marked medical device (risk class IIa) based on the Duodecim Clinical Decision Support EBMEDS software and manufactured by the government-owned DigiFinland Oy. Users of this service can perform self-triage by answering the questions in the symptom checker. On completing the symptom checker, the user receives a recommendation for action and a service assessment with appropriate guidance regarding their health problem, on the basis of the specific symptom selected in the symptom checker. This allows users to be directed to appropriate health care services, regardless of time and place. OBJECTIVE This study describes the protocol for the mixed methods validation of the symptom checker available in the Omaolo digital services. METHODS This is a mixed methods study using quantitative and qualitative methods as part of a clinical validation process conducted in primary health care centers in Finland. To capture an unscreened target population of users, each participating organization provides a space where the study and nurse triage can be carried out. The primary health care units operate a walk-in model, in which no prior phone call or contact is required. For the validation of the Omaolo symptom checker, case vignettes will be incorporated to supplement the assessment of triage accuracy for rare and acute cases that cannot be tested extensively in real-life settings. The vignettes are drawn from a variety of clinical sources, and each tests the symptom checker at a given triage level using a single standardized patient case.
RESULTS Regional research permission was requested from each organization participating in the research, and an ethics committee statement was requested and granted by the Pirkanmaa hospital district's ethics committee, in accordance with the University of Tampere's regulations. Of 964 user-completed clinical symptom checker assessments, 877 cases were fully completed with a triage result and therefore met the requirements for the clinical validation study. The goal of sufficient data has been reached for most of the chief symptoms. Data collection was completed in September 2019, and the first feasibility and patient experience results were published by the end of 2020. Case vignettes have been identified and are to be completed before further testing of the symptom checker. The analysis and reporting are estimated to be finalized in 2024. CONCLUSIONS The primary goals of this mixed methods electronic symptom checker study are to assess safety and to provide crucial information regarding the accuracy and usability of the Omaolo electronic symptom checker. To our knowledge, this will be the first such study to include real-life clinical cases along with case vignettes. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID) DERR1-10.2196/41423.
Collapse
|