1. Petrella RJ. The AI Future of Emergency Medicine. Ann Emerg Med 2024;84:139-153. PMID: 38795081. DOI: 10.1016/j.annemergmed.2024.01.031.
Abstract
In the coming years, artificial intelligence (AI) and machine learning will likely give rise to profound changes in the field of emergency medicine, and medicine more broadly. This article discusses these anticipated changes in terms of 3 overlapping yet distinct stages of AI development. It reviews some fundamental concepts in AI and explores their relation to clinical practice, with a focus on emergency medicine. In addition, it describes some of the applications of AI in disease diagnosis, prognosis, and treatment, as well as some of the practical issues that they raise, the barriers to their implementation, and some of the legal and regulatory challenges they create.
Affiliation(s)
- Robert J Petrella
- Emergency Departments, CharterCARE Health Partners, Providence and North Providence, RI; Emergency Department, Boston VA Medical Center, Boston, MA; Emergency Departments, Steward Health Care System, Boston and Methuen, MA; Harvard Medical School, Boston, MA; Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA; Department of Medicine, Brigham and Women's Hospital, Boston, MA.
2. Kachman MM, Brennan I, Oskvarek JJ, Waseem T, Pines JM. How artificial intelligence could transform emergency care. Am J Emerg Med 2024;81:40-46. PMID: 38663302. DOI: 10.1016/j.ajem.2024.04.024.
Abstract
Artificial intelligence (AI) in healthcare is the ability of a computer to perform tasks typically associated with clinical care (e.g. medical decision-making and documentation). AI will soon be integrated into an increasing number of healthcare applications, including elements of emergency department (ED) care. Here, we describe the basics of AI, various categories of its functions (including machine learning and natural language processing) and review emerging and potential future use-cases for emergency care. For example, AI-assisted symptom checkers could help direct patients to the appropriate setting, models could assist in assigning triage levels, and ambient AI systems could document clinical encounters. AI could also help provide focused summaries of charts, summarize encounters for hand-offs, and create discharge instructions with an appropriate language and reading level. Additional use cases include medical decision making for decision rules, real-time models that predict clinical deterioration or sepsis, and efficient extraction of unstructured data for coding, billing, research, and quality initiatives. We discuss the potential transformative benefits of AI, as well as the concerns regarding its use (e.g. privacy, data accuracy, and the potential for changing the doctor-patient relationship).
Affiliation(s)
- Marika M Kachman
- US Acute Care Solutions, Canton, OH, United States of America; Department of Emergency Medicine, Virginia Hospital Center, Arlington, VA, United States of America
- Irina Brennan
- US Acute Care Solutions, Canton, OH, United States of America; Department of Emergency Medicine, Inova Alexandria Hospital, Alexandria, VA, United States of America
- Jonathan J Oskvarek
- US Acute Care Solutions, Canton, OH, United States of America; Department of Emergency Medicine, Summa Health, Akron, OH, United States of America
- Tayab Waseem
- Department of Emergency Medicine, George Washington University, Washington, DC, United States of America
- Jesse M Pines
- US Acute Care Solutions, Canton, OH, United States of America; Department of Emergency Medicine, George Washington University, Washington, DC, United States of America.
3. Rutledge GW. Diagnostic accuracy of GPT-4 on common clinical scenarios and challenging cases. Learn Health Syst 2024;8:e10438. PMID: 39036534. PMCID: PMC11257049. DOI: 10.1002/lrh2.10438.
Abstract
Introduction Large language models (LLMs) have a high diagnostic accuracy when they evaluate previously published clinical cases. Methods We compared the accuracy of GPT-4's differential diagnoses for previously unpublished challenging case scenarios with the diagnostic accuracy for previously published cases. Results For a set of previously unpublished challenging clinical cases, GPT-4 achieved 61.1% correct in its top 6 diagnoses versus the previously reported 49.1% for physicians. For a set of 45 clinical vignettes of more common clinical scenarios, GPT-4 included the correct diagnosis in its top 3 diagnoses 100% of the time versus the previously reported 84.3% for physicians. Conclusions GPT-4 performs at a level at least as good as, if not better than, that of experienced physicians on highly challenging cases in internal medicine. The extraordinary performance of GPT-4 on diagnosing common clinical scenarios could be explained in part by the fact that these cases were previously published and may have been included in the training dataset for this LLM.
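The top-k accuracy reported above (correct diagnosis contained in the top 3 or top 6 of the differential list) can be computed with a simple sketch. The case data below are invented for illustration and are not drawn from the study:

```python
def top_k_accuracy(differentials, truths, k):
    """Fraction of cases whose reference diagnosis appears in the top-k of the ranked differential list."""
    hits = sum(truth in ddx[:k] for ddx, truth in zip(differentials, truths))
    return hits / len(truths)

# Hypothetical ranked differential lists and reference diagnoses (illustrative only).
ddx_lists = [
    ["pulmonary embolism", "pneumonia", "acute coronary syndrome"],
    ["migraine", "subarachnoid hemorrhage", "tension headache"],
    ["appendicitis", "gastroenteritis", "urinary tract infection"],
]
truths = ["acute coronary syndrome", "subarachnoid hemorrhage", "cholecystitis"]
print(round(top_k_accuracy(ddx_lists, truths, k=3), 2))
```

With k=3, two of the three hypothetical cases contain the reference diagnosis, so the sketch reports 0.67; tightening k to 1 scores only exact top-ranked matches.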
4. Codipilly DC, Faghani S, Hagan C, Lewis J, Erickson BJ, Iyer PG. The Evolving Role of Artificial Intelligence in Gastrointestinal Histopathology: An Update. Clin Gastroenterol Hepatol 2024;22:1170-1180. PMID: 38154727. DOI: 10.1016/j.cgh.2023.11.044.
Abstract
Significant advances in artificial intelligence (AI) over the past decade potentially may lead to dramatic effects on clinical practice. Digitized histology represents an area ripe for AI implementation. We describe several current needs within the world of gastrointestinal histopathology, and outline, using currently studied models, how AI potentially can address them. We also highlight pitfalls as AI makes inroads into clinical practice.
Affiliation(s)
- D Chamil Codipilly
- Barrett's Esophagus Unit, Division of Gastroenterology and Hepatology, Mayo Clinic Rochester, Rochester, Minnesota
- Shahriar Faghani
- Mayo Artificial Intelligence Laboratory, Department of Radiology, Mayo Clinic, Rochester, Minnesota
- Catherine Hagan
- Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, Minnesota
- Jason Lewis
- Department of Pathology, Mayo Clinic, Jacksonville, Florida
- Bradley J Erickson
- Mayo Artificial Intelligence Laboratory, Department of Radiology, Mayo Clinic, Rochester, Minnesota
- Prasad G Iyer
- Barrett's Esophagus Unit, Division of Gastroenterology and Hepatology, Mayo Clinic Rochester, Rochester, Minnesota.
5. Meczner A, Cohen N, Qureshi A, Reza M, Sutaria S, Blount E, Bagyura Z, Malak T. Controlling Inputter Variability in Vignette Studies Assessing Web-Based Symptom Checkers: Evaluation of Current Practice and Recommendations for Isolated Accuracy Metrics. JMIR Form Res 2024;8:e49907. PMID: 38820578. PMCID: PMC11179013. DOI: 10.2196/49907.
Abstract
BACKGROUND The rapid growth of web-based symptom checkers (SCs) is not matched by advances in quality assurance. Currently, there are no widely accepted criteria assessing SCs' performance. Vignette studies are widely used to evaluate SCs, measuring the accuracy of outcome. Accuracy behaves as a composite metric as it is affected by a number of individual SC- and tester-dependent factors. In contrast to clinical studies, vignette studies have a small number of testers. Hence, measuring accuracy alone in vignette studies may not provide a reliable assessment of performance due to tester variability. OBJECTIVE This study aims to investigate the impact of tester variability on the accuracy of outcome of SCs, using clinical vignettes. It further aims to investigate the feasibility of measuring isolated aspects of performance. METHODS Healthily's SC was assessed using 114 vignettes by 3 groups of 3 testers who processed vignettes with different instructions: free interpretation of vignettes (free testers), specified chief complaints (partially free testers), and specified chief complaints with strict instruction for answering additional symptoms (restricted testers). κ statistics were calculated to assess agreement of top outcome condition and recommended triage. Crude and adjusted accuracy was measured against a gold standard. Adjusted accuracy was calculated using only results of consultations identical to the vignette, following a review and selection process. A feasibility study for assessing symptom comprehension of SCs was performed using different variations of 51 chief complaints across 3 SCs. RESULTS Intertester agreement of most likely condition and triage was, respectively, 0.49 and 0.51 for the free tester group, 0.66 and 0.66 for the partially free group, and 0.72 and 0.71 for the restricted group. For the restricted group, accuracy ranged from 43.9% to 57% for individual testers, averaging 50.6% (SD 5.35%). Adjusted accuracy was 56.1%. 
Assessing symptom comprehension was feasible for all 3 SCs. Comprehension scores ranged from 52.9% to 68%. CONCLUSIONS We demonstrated that by improving standardization of the vignette testing process, there is a significant improvement in the agreement of outcome between testers. However, significant variability remained due to uncontrollable tester-dependent factors, reflected by varying outcome accuracy. Tester-dependent factors, combined with a small number of testers, limit the reliability and generalizability of outcome accuracy when used as a composite measure in vignette studies. Measuring and reporting different aspects of SC performance in isolation provides a more reliable assessment of SC performance. We developed an adjusted accuracy measure using a review and selection process to assess data algorithm quality. In addition, we demonstrated that symptom comprehension with different input methods can be feasibly compared. Future studies reporting accuracy need to apply vignette testing standardization and isolated metrics.
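The κ statistics above quantify chance-corrected agreement between testers. The abstract does not state which κ variant was used; as an assumption, a minimal sketch of Cohen's kappa for one pair of raters is shown, with invented triage labels for illustration:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters over the same items (Cohen's kappa)."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed agreement: fraction of items where both raters assigned the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Expected agreement: probability both raters pick the same label independently.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical triage recommendations from two testers on five vignettes.
a = ["ED", "GP", "self-care", "GP", "ED"]
b = ["ED", "GP", "GP", "GP", "ED"]
print(round(cohens_kappa(a, b), 2))
```

Here the raters agree on 4 of 5 vignettes (observed 0.8) against an expected chance agreement of 0.4, giving κ = 0.67; multi-rater designs such as the 3-tester groups above would typically use a generalization like Fleiss' kappa.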
Affiliation(s)
- András Meczner
- Healthily, London, United Kingdom
- Institute for Clinical Data Management, Semmelweis University, Budapest, Hungary
- Zsolt Bagyura
- Institute for Clinical Data Management, Semmelweis University, Budapest, Hungary
6. Augusto Duenhas Accorsi T, Tocci Moreira F, Aires Eduardo A, Albaladejo Morbeck R, Francine Köhler K, De Amicis Lima K, Henrique Sartorato Pedrotti C. Outcome After Self-Triage App Referral in Urgent Direct-to-Consumer Telemedicine Encounter. Telemed J E Health 2024. PMID: 38805348. DOI: 10.1089/tmj.2024.0126.
Abstract
Background: The quantification of self-triage effectiveness, guided by mobile applications, in urgent direct-to-consumer telemedicine (TM) encounters requires further investigation. The objective of this study was to evaluate the outcomes of referral guidance provided by a symptom-based self-management mobile application decision algorithm in the context of remote urgent care assessments. Methods: An observational retrospective single-center study was conducted from May 2022 to December 2023. The inclusion criteria encompassed individuals aged >18 years who spontaneously sought virtual emergency care through the EINSTEIN CONECTA application. Patients whose connectivity issues prevented completion of the encounter were excluded. The primary outcomes included the rate of patient concurrence with the algorithm's recommendation for seeking in-person emergency care and the referral rate to face-to-face assessment among cases evaluated through TM. The application's algorithm employs scientific evidence based on symptoms to recommend referrals to emergency departments (EDs). Results: Out of 88,834 patients connected to the TM Center, self-triage obviated the need for virtual physician assessment in 53,302 (60%) encounters. A total of 35,532 patients were remotely evaluated by 316 on-duty physicians, resulting in 1,125 ICD-coded diagnoses. Among these, 21,722 (61.1%) were initially advised by self-triage to visit the ED, with subsequent medical assessment leading to in-person referrals in 6,354 (29.3%) of the evaluations. Of the 13,810 patients recommended to continue with virtual care post-self-triage, 157 (1.1%) were referred for in-person assessment. Conclusions: Self-triage effectively reduced the need for physician encounters in approximately three-fifths of TM consultations. Despite being based on scientific evidence, symptom-based referral algorithms demonstrated high sensitivity but poor correlation with physician decision-making.
7. Hammoud M, Douglas S, Darmach M, Alawneh S, Sanyal S, Kanbour Y. Evaluating the Diagnostic Performance of Symptom Checkers: Clinical Vignette Study. JMIR AI 2024;3:e46875. PMID: 38875676. PMCID: PMC11091811. DOI: 10.2196/46875.
Abstract
BACKGROUND Medical self-diagnostic tools (or symptom checkers) are becoming an integral part of digital health and our daily lives, whereby patients are increasingly using them to identify the underlying causes of their symptoms. As such, it is essential to rigorously investigate and comprehensively report the diagnostic performance of symptom checkers using standard clinical and scientific approaches. OBJECTIVE This study aims to evaluate and report the accuracies of a few known and new symptom checkers using a standard and transparent methodology, which allows the scientific community to cross-validate and reproduce the reported results, a step much needed in health informatics. METHODS We propose a 4-stage experimentation methodology that capitalizes on the standard clinical vignette approach to evaluate 6 symptom checkers. To this end, we developed and peer-reviewed 400 vignettes, each approved by at least 5 out of 7 independent and experienced primary care physicians. To establish a frame of reference and interpret the results of symptom checkers accordingly, we further compared the best-performing symptom checker against 3 primary care physicians with an average experience of 16.6 (SD 9.42) years. To measure accuracy, we used 7 standard metrics, including M1 as a measure of a symptom checker's or a physician's ability to return a vignette's main diagnosis at the top of their differential list, F1-score as a trade-off measure between recall and precision, and Normalized Discounted Cumulative Gain (NDCG) as a measure of a differential list's ranking quality, among others. RESULTS The diagnostic accuracies of the 6 tested symptom checkers vary significantly. For instance, the differences between the best-performing and worst-performing symptom checkers (ie, the ranges) in M1, F1-score, and NDCG were 65.3%, 39.2%, and 74.2%, respectively.
The same was observed among the participating human physicians, whereby the M1, F1-score, and NDCG ranges were 22.8%, 15.3%, and 21.3%, respectively. When compared against each other, physicians outperformed the best-performing symptom checker by an average of 1.2% using F1-score, whereas the best-performing symptom checker outperformed physicians by averages of 10.2% and 25.1% using M1 and NDCG, respectively. CONCLUSIONS The performance variation between symptom checkers is substantial, suggesting that symptom checkers cannot be treated as a single entity. On a different note, the best-performing symptom checker was an artificial intelligence (AI)-based one, shedding light on the promise of AI in improving the diagnostic capabilities of symptom checkers, especially as AI keeps advancing exponentially.
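NDCG, used above to score the ranking quality of a differential list, rewards correct diagnoses placed near the top and discounts those ranked lower. A minimal sketch with binary relevance follows; the data are illustrative and this is not the study's implementation:

```python
import math

def dcg(relevances):
    """Discounted cumulative gain: rank i (0-based) is discounted by log2(i + 2)."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg(relevances):
    """DCG normalized by the ideal (sorted) ordering; 1.0 means a perfect ranking."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# Differential list where the single correct diagnosis (relevance 1) sits at rank 3.
print(ndcg([0, 0, 1, 0, 0]))  # 0.5
```

Placing the correct diagnosis first would score 1.0; burying it at rank 3 halves the score, which is why NDCG separates checkers that find the right diagnosis from those that also rank it well.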
8. Fukuzawa F, Yanagita Y, Yokokawa D, Uchida S, Yamashita S, Li Y, Shikino K, Tsukamoto T, Noda K, Uehara T, Ikusaka M. Importance of Patient History in Artificial Intelligence-Assisted Medical Diagnosis: Comparison Study. JMIR Med Educ 2024;10:e52674. PMID: 38602313. PMCID: PMC11024399. DOI: 10.2196/52674.
Abstract
Background Medical history contributes approximately 80% to a diagnosis, although physical examinations and laboratory investigations increase a physician's confidence in the medical diagnosis. The concept of artificial intelligence (AI) was first proposed more than 70 years ago. Recently, its role in various fields of medicine has grown remarkably. However, no studies have evaluated the importance of patient history in AI-assisted medical diagnosis. Objective This study explored the contribution of patient history to AI-assisted medical diagnoses and assessed the accuracy of ChatGPT in reaching a clinical diagnosis based on the medical history provided. Methods Using clinical vignettes of 30 cases identified in The BMJ, we evaluated the accuracy of diagnoses generated by ChatGPT. We compared the diagnoses made by ChatGPT based solely on medical history with the correct diagnoses. We also compared the diagnoses made by ChatGPT after incorporating additional physical examination findings and laboratory data alongside history with the correct diagnoses. Results ChatGPT accurately diagnosed 76.6% (23/30) of the cases with only the medical history, consistent with previous research targeting physicians. We also found that this rate was 93.3% (28/30) when additional information was included. Conclusions Although adding additional information improves diagnostic accuracy, patient history remains a significant factor in AI-assisted medical diagnosis. Thus, when using AI in medical diagnosis, it is crucial to include pertinent and correct patient histories for an accurate diagnosis. Our findings emphasize the continued significance of patient history in clinical diagnoses in this age and highlight the need for its integration into AI-assisted medical diagnosis systems.
Affiliation(s)
- Fumitoshi Fukuzawa
- Department of General Medicine, Chiba University Hospital, Chiba-shi, Japan
- Yasutaka Yanagita
- Department of General Medicine, Chiba University Hospital, Chiba-shi, Japan
- Daiki Yokokawa
- Department of General Medicine, Chiba University Hospital, Chiba-shi, Japan
- Shun Uchida
- Uchida Internal Medicine Clinic, Saitama-shi, Japan
- Shiho Yamashita
- Department of General Medicine, Chiba University Hospital, Chiba-shi, Japan
- Yu Li
- Department of General Medicine, Chiba University Hospital, Chiba-shi, Japan
- Kiyoshi Shikino
- Department of General Medicine, Chiba University Hospital, Chiba-shi, Japan
- Tomoko Tsukamoto
- Department of General Medicine, Chiba University Hospital, Chiba-shi, Japan
- Kazutaka Noda
- Department of General Medicine, Chiba University Hospital, Chiba-shi, Japan
- Takanori Uehara
- Department of General Medicine, Chiba University Hospital, Chiba-shi, Japan
- Masatomi Ikusaka
- Department of General Medicine, Chiba University Hospital, Chiba-shi, Japan
9. Savolainen K, Kujala S. Testing Two Online Symptom Checkers With Vulnerable Groups: Usability Study to Improve Cognitive Accessibility of eHealth Services. JMIR Hum Factors 2024;11:e45275. PMID: 38457214. PMCID: PMC10960212. DOI: 10.2196/45275.
Abstract
BACKGROUND The popularity of eHealth services has surged significantly, underscoring the importance of ensuring their usability and accessibility for users with diverse needs, characteristics, and capabilities. These services can pose cognitive demands, especially for individuals who are unwell, fatigued, or experiencing distress. Additionally, numerous potentially vulnerable groups, including older adults, are susceptible to digital exclusion and may encounter cognitive limitations related to perception, attention, memory, and language comprehension. Regrettably, many studies overlook the preferences and needs of user groups likely to encounter challenges associated with these cognitive aspects. OBJECTIVE This study primarily aims to gain a deeper understanding of cognitive accessibility in the practical context of eHealth services. Additionally, we aimed to identify the specific challenges that vulnerable groups encounter when using eHealth services and determine key considerations for testing these services with such groups. METHODS As a case study of eHealth services, we conducted qualitative usability testing on 2 online symptom checkers used in Finnish public primary care. A total of 13 participants from 3 distinct groups participated in the study: older adults, individuals with mild intellectual disabilities, and nonnative Finnish speakers. The primary research methods used were the thinking-aloud method, questionnaires, and semistructured interviews. RESULTS We found that potentially vulnerable groups encountered numerous issues with the tested services, with similar problems observed across all 3 groups. Specifically, clarity and the use of terminology posed significant challenges. The services overwhelmed users with excessive information and choices, while the terminology consisted of numerous complex medical terms that were difficult to understand. 
When conducting tests with vulnerable groups, it is crucial to carefully plan the sessions to avoid being overly lengthy, as these users often require more time to complete tasks. Additionally, testing with vulnerable groups proved to be quite efficient, with results likely to benefit a wider audience as well. CONCLUSIONS Based on the findings of this study, it is evident that older adults, individuals with mild intellectual disability, and nonnative speakers may encounter cognitive challenges when using eHealth services, which can impede or slow down their use and make the services more difficult to navigate. In the worst-case scenario, these challenges may lead to errors in using the services. We recommend expanding the scope of testing to include a broader range of eHealth services with vulnerable groups, incorporating users with diverse characteristics and capabilities who are likely to encounter difficulties in cognitive accessibility.
Affiliation(s)
- Kaisa Savolainen
- Department of Computer Science, Aalto University School of Science, Espoo, Finland
- Sari Kujala
- Department of Computer Science, Aalto University School of Science, Espoo, Finland
10. Chien S, Miller G, Huang I, Cunningham DA, Carson D, Gall LS, Khan KS. Quality assessment of online patient information on upper gastrointestinal endoscopy using the modified Ensuring Quality Information for Patients tool. Ann R Coll Surg Engl 2024. PMID: 38376380. DOI: 10.1308/rcsann.2022.0078.
Abstract
INTRODUCTION Websites and online resources are increasingly becoming patients' main source of healthcare information. It is paramount that high quality information is available online to enhance patient education and improve clinical outcomes. Upper gastrointestinal (UGI) endoscopy is the gold standard investigation for UGI symptoms and yet little is known regarding the quality of patient orientated websites. The aim of this study was to assess the quality of online patient information on UGI endoscopy using the modified Ensuring Quality Information for Patients (EQIP) tool. METHODS Ten search terms were employed to conduct a systematic review. For each term, the top 100 websites identified via a Google search were assessed using the modified EQIP tool. High scoring websites underwent further analysis. Websites intended for professional use by clinicians as well as those containing video or marketing content were excluded. FINDINGS A total of 378 websites were eligible for analysis. The median modified EQIP score for UGI endoscopy was 18/36 (interquartile range: 14-21). The median EQIP scores for the content, identification and structure domains were 8/18, 1/6 and 9/12 respectively. Higher modified EQIP scores were obtained for websites produced by government departments and National Health Service hospitals (p=0.007). Complication rates were documented in only a fifth (20.4%) of websites. High scoring websites were significantly more likely to provide balanced information on risks and benefits (94.6% vs 34.4%, p<0.001). CONCLUSIONS There is an immediate need to improve the quality of online patient information regarding UGI endoscopy. The currently available resources provide minimal information on the risks associated with the procedure, potentially hindering patients' ability to make informed healthcare decisions.
Affiliation(s)
- S Chien
- NHS Greater Glasgow and Clyde, UK
- University of Glasgow, UK
- I Huang
- NHS Greater Glasgow and Clyde, UK
- D Carson
- NHS Greater Glasgow and Clyde, UK
- L S Gall
- NHS Greater Glasgow and Clyde, UK
- K S Khan
- University of Glasgow, UK
- NHS Lanarkshire, UK
11. Müller R, Klemmt M, Koch R, Ehni HJ, Henking T, Langmann E, Wiesing U, Ranisch R. "That's just Future Medicine" - a qualitative study on users' experiences of symptom checker apps. BMC Med Ethics 2024;25:17. PMID: 38365749. PMCID: PMC10874001. DOI: 10.1186/s12910-024-01011-5.
Abstract
BACKGROUND Symptom checker apps (SCAs) are mobile or online applications for lay people that usually have two main functions: symptom analysis and recommendations. SCAs ask users questions about their symptoms via a chatbot, give a list with possible causes, and provide a recommendation, such as seeing a physician. However, it is unclear whether the actual performance of a SCA corresponds to the users' experiences. This qualitative study investigates the subjective perspectives of SCA users to close the empirical gap identified in the literature and answers the following main research question: How do individuals (healthy users and patients) experience the usage of SCA, including their attitudes, expectations, motivations, and concerns regarding their SCA use? METHODS A qualitative interview study was chosen to clarify the relatively unknown experience of SCA use. Semi-structured qualitative interviews with SCA users were carried out by two researchers in tandem via video call. Qualitative content analysis was selected as methodology for the data analysis. RESULTS Fifteen interviews with SCA users were conducted and seven main categories identified: (1) Attitudes towards findings and recommendations, (2) Communication, (3) Contact with physicians, (4) Expectations (prior to use), (5) Motivations, (6) Risks, and (7) SCA-use for others. CONCLUSIONS The aspects identified in the analysis emphasise the specific perspective of SCA users and, at the same time, the immense scope of different experiences. Moreover, the study reveals ethical issues, such as relational aspects, that are often overlooked in debates on mHealth. Both empirical and ethical research is more needed, as the awareness of the subjective experience of those affected is an essential component in the responsible development and implementation of health apps such as SCA. TRIAL REGISTRATION German Clinical Trials Register (DRKS): DRKS00022465. 07/08/2020.
Affiliation(s)
- Regina Müller
- Institute of Philosophy, University Bremen, Bremen, Germany.
- Malte Klemmt
- Institute of General Practice and Palliative Care, Hannover Medical School, Hannover, Germany
- Roland Koch
- Institute of General Practice and Interprofessional Care, University Hospital Tübingen, Tübingen, Germany
- Hans-Jörg Ehni
- Institute of Ethics and History of Medicine, University Tübingen, Tübingen, Germany
- Tanja Henking
- Institute of Applied Social Science, University of Applied Science Würzburg-Schweinfurt, Würzburg, Germany
- Elisabeth Langmann
- Institute of Ethics and History of Medicine, University Tübingen, Tübingen, Germany
- Urban Wiesing
- Institute of Ethics and History of Medicine, University Tübingen, Tübingen, Germany
- Robert Ranisch
- Faculty of Health Science Brandenburg, University of Potsdam, Potsdam, Germany
12. Xue J, Zhang B, Zhao Y, Zhang Q, Zheng C, Jiang J, Li H, Liu N, Li Z, Fu W, Peng Y, Logan J, Zhang J, Xiang X. Evaluation of the Current State of Chatbots for Digital Health: Scoping Review. J Med Internet Res 2023;25:e47217. PMID: 38113097. PMCID: PMC10762606. DOI: 10.2196/47217.
Abstract
BACKGROUND Chatbots have become ubiquitous in our daily lives, enabling natural language conversations with users through various modes of communication. Chatbots have the potential to play a significant role in promoting health and well-being. As the number of studies and available products related to chatbots continues to rise, there is a critical need to assess product features to enhance the design of chatbots that effectively promote health and behavioral change. OBJECTIVE This scoping review aims to provide a comprehensive assessment of the current state of health-related chatbots, including the chatbots' characteristics and features, user backgrounds, communication models, relational building capacity, personalization, interaction, responses to suicidal thoughts, and users' in-app experiences during chatbot use. Through this analysis, we seek to identify gaps in the current research, guide future directions, and enhance the design of health-focused chatbots. METHODS Following the scoping review methodology by Arksey and O'Malley and guided by the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) checklist, this study used a two-pronged approach to identify relevant chatbots: (1) searching the iOS and Android App Stores and (2) reviewing scientific literature through a search strategy designed by a librarian. Overall, 36 chatbots were selected based on predefined criteria from both sources. These chatbots were systematically evaluated using a comprehensive framework developed for this study, including chatbot characteristics, user backgrounds, building relational capacity, personalization, interaction models, responses to critical situations, and user experiences. Ten coauthors were responsible for downloading and testing the chatbots, coding their features, and evaluating their performance in simulated conversations. The testing of all chatbot apps was limited to their free-to-use features. 
RESULTS This review provides an overview of the diversity of health-related chatbots, encompassing categories such as mental health support, physical activity promotion, and behavior change interventions. Chatbots use text, animations, speech, images, and emojis for communication. The findings highlight variations in conversational capabilities, including empathy, humor, and personalization. Notably, concerns regarding safety, particularly in addressing suicidal thoughts, were evident. Approximately 44% (16/36) of the chatbots effectively addressed suicidal thoughts. User experiences and behavioral outcomes demonstrated the potential of chatbots in health interventions, but evidence remains limited. CONCLUSIONS This scoping review underscores the significance of chatbots in health-related applications and offers insights into their features, functionalities, and user experiences. This study contributes to advancing the understanding of chatbots' role in digital health interventions, thus paving the way for more effective and user-centric health promotion strategies. This study informs future research directions, emphasizing the need for rigorous randomized control trials, standardized evaluation metrics, and user-centered design to unlock the full potential of chatbots in enhancing health and well-being. Future research should focus on addressing limitations, exploring real-world user experiences, and implementing robust data security and privacy measures.
Affiliation(s)
- Jia Xue
- Factor Inwentash Faculty of Social Work, University of Toronto, Toronto, ON, Canada
- Faculty of Information, University of Toronto, Toronto, ON, Canada
- Artificial Intelligence for Justice Lab, University of Toronto, Toronto, ON, Canada
- Bolun Zhang
- Faculty of Information, University of Toronto, Toronto, ON, Canada
- Artificial Intelligence for Justice Lab, University of Toronto, Toronto, ON, Canada
- Yaxi Zhao
- Faculty of Information, University of Toronto, Toronto, ON, Canada
- Artificial Intelligence for Justice Lab, University of Toronto, Toronto, ON, Canada
- Qiaoru Zhang
- Artificial Intelligence for Justice Lab, University of Toronto, Toronto, ON, Canada
- Faculty of Arts and Science, University of Toronto, Toronto, ON, Canada
- Chengda Zheng
- Artificial Intelligence for Justice Lab, University of Toronto, Toronto, ON, Canada
- Jielin Jiang
- Artificial Intelligence for Justice Lab, University of Toronto, Toronto, ON, Canada
- Hanjia Li
- Artificial Intelligence for Justice Lab, University of Toronto, Toronto, ON, Canada
- Nian Liu
- Artificial Intelligence for Justice Lab, University of Toronto, Toronto, ON, Canada
- Ziqian Li
- Artificial Intelligence for Justice Lab, University of Toronto, Toronto, ON, Canada
- Weiying Fu
- Artificial Intelligence for Justice Lab, University of Toronto, Toronto, ON, Canada
- Yingdong Peng
- Artificial Intelligence for Justice Lab, University of Toronto, Toronto, ON, Canada
- Judith Logan
- John P Robarts Library, University of Toronto, Toronto, ON, Canada
- Jingwen Zhang
- Department of Communication, University of California Davis, Davis, CA, United States
- Xiaoling Xiang
- School of Social Work, University of Michigan, Ann Arbor, MI, United States
13
Benoit JR, Hartling L, Scott SD. Bridging evidence-to-care gaps with mHealth: Designing a symptom checker for parents accessing knowledge translation resources on acute children's illnesses in a smartphone application. PEC INNOVATION 2023; 2:100152. [PMID: 37214490 PMCID: PMC10194162 DOI: 10.1016/j.pecinn.2023.100152] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 09/14/2022] [Revised: 03/06/2023] [Accepted: 03/28/2023] [Indexed: 05/24/2023]
Abstract
Background Smartphone applications offer a novel platform for delivering health information to parents. This study created and evaluated an app-based symptom checker that recommends educational tools to parents based on their child's symptoms. Methods Symptoms extracted from 23 knowledge translation (KT) tools for 10 children's illnesses comprised a set of plain-language symptoms. The symptom checker works by producing confusion matrices evaluating a child's reported symptoms against possible illnesses, comparing precision scores to examine how well each illness matched reported symptoms, and ordering possible illnesses by performance score. Performance was evaluated by extracting symptoms from 8 clinical vignettes, and examining correct first-try matches. Results We created a final list of 54 plain-language symptoms. Visualizations of the symptom set creation process and logic mapping are presented, as well as images of the working symptom checker. The symptom checker matched 100% (8/8) of tested clinical vignettes to the appropriate illness resource. Discussion Symptom checkers are a potentially useful tool to integrate into apps that parents use for their children's health. The design of these systems has the potential to change parents' relationship with technology, affecting both their adoption and acceptance of symptom checkers. Our design choices contribute to addressing current barriers to the adoption of symptom checkers, reducing functional, critical, and interactive literacy requirements for parents.
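The matching logic described above — scoring each candidate illness by precision against the child's reported symptoms, then ordering candidates by score — can be sketched as follows. The illness names and symptom sets below are illustrative placeholders, not the study's actual 54-symptom set:

```python
def rank_illnesses(reported, illness_symptoms):
    """Rank candidate illnesses by precision against reported symptoms.

    For each illness, treat its known symptom set as the predicted positives
    and the child's reported symptoms as the actual positives, then score the
    overlap: precision = matched symptoms / symptoms the illness predicts.
    """
    reported = set(reported)
    scores = {}
    for illness, symptoms in illness_symptoms.items():
        symptoms = set(symptoms)
        true_pos = len(reported & symptoms)
        scores[illness] = true_pos / len(symptoms) if symptoms else 0.0
    # Order candidate illnesses from best to worst match.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

KNOWLEDGE_BASE = {  # illustrative entries only
    "croup": {"barky cough", "hoarse voice", "fever"},
    "bronchiolitis": {"wheezing", "runny nose", "fast breathing", "fever"},
    "ear infection": {"ear pain", "fever", "irritability"},
}

ranking = rank_illnesses({"barky cough", "fever"}, KNOWLEDGE_BASE)
print(ranking[0][0])  # croup matches 2 of its 3 listed symptoms
```

In this toy run, croup scores 2/3, ear infection 1/3, and bronchiolitis 1/4, so croup is offered first; the study's checker applies the same idea to its extracted plain-language symptom set.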
Affiliation(s)
- James R.A. Benoit
- Department of Pediatrics, University of Alberta, Edmonton Clinic Health Academy, 11405-87 Avenue, Edmonton, Alberta T6G 1C9, Canada
- Faculty of Nursing, University of Alberta, Edmonton Clinic Health Academy, 11405-87 Avenue, Edmonton, Alberta T6G 1C9, Canada
- Lisa Hartling
- Department of Pediatrics, University of Alberta, Edmonton Clinic Health Academy, 11405-87 Avenue, Edmonton, Alberta T6G 1C9, Canada
- Shannon D. Scott
- Faculty of Nursing, University of Alberta, Edmonton Clinic Health Academy, 11405-87 Avenue, Edmonton, Alberta T6G 1C9, Canada
14
Bushuven S, Bentele M, Bentele S, Gerber B, Bansbach J, Ganter J, Trifunovic-Koenig M, Ranisch R. "ChatGPT, Can You Help Me Save My Child's Life?" - Diagnostic Accuracy and Supportive Capabilities to Lay Rescuers by ChatGPT in Prehospital Basic Life Support and Paediatric Advanced Life Support Cases - An In-silico Analysis. J Med Syst 2023; 47:123. [PMID: 37987870 PMCID: PMC10663183 DOI: 10.1007/s10916-023-02019-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/09/2023] [Accepted: 11/13/2023] [Indexed: 11/22/2023]
Abstract
BACKGROUND Paediatric emergencies are challenging for healthcare workers, first aiders, and parents waiting for emergency medical services to arrive. With the expected rise of virtual assistants, people will likely seek help from such digital AI tools, especially in regions lacking emergency medical services. Large language models such as ChatGPT have proved effective in providing health-related information and are competent in medical exams, but their patient safety has been questioned. Currently, there is no information on ChatGPT's performance in supporting parents in paediatric emergencies requiring help from emergency medical services. This study aimed to test 20 paediatric and two basic life support case vignettes to assess the performance and safety of ChatGPT and GPT-4 in children. METHODS We provided the cases three times each to two models, ChatGPT and GPT-4, and assessed the diagnostic accuracy, emergency call advice, and the validity of advice given to parents. RESULTS Both models recognized the emergency in the cases, except for septic shock and pulmonary embolism, and identified the correct diagnosis in 94% of cases. However, ChatGPT/GPT-4 reliably advised calling emergency services in only 12 of 22 cases (54%), gave correct first aid instructions in 9 cases (45%), and incorrectly advised advanced life support techniques to parents in 3 of 22 cases (13.6%). CONCLUSION Considering these results for the recent ChatGPT versions, the validity, reliability, and thus safety of ChatGPT/GPT-4 as an emergency support tool is questionable. However, whether humans would perform better in the same situation is uncertain. Moreover, other studies have shown that human emergency call operators are also inaccurate, partly with worse performance than ChatGPT/GPT-4 in our study.
However, one of the main limitations of the study is that we used prototypical cases, and the management may differ from urban to rural areas and between different countries, indicating the need for further evaluation of the context sensitivity and adaptability of the model. Nevertheless, ChatGPT and the new versions under development may be promising tools for assisting lay first responders, operators, and professionals in diagnosing a paediatric emergency. TRIAL REGISTRATION Not applicable.
Affiliation(s)
- Stefan Bushuven
- Training Center for Emergency Medicine (NOTIS e.V), Breite Strasse 7, Engen, 78234, Germany.
- Department of Anesthesiology and Critical Care, Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany.
- Institute for Medical Education, University Hospital, LMU Munich, Munich, Germany.
- Michael Bentele
- Training Center for Emergency Medicine (NOTIS e.V), Breite Strasse 7, Engen, 78234, Germany
- Stefanie Bentele
- Training Center for Emergency Medicine (NOTIS e.V), Breite Strasse 7, Engen, 78234, Germany
- Bianka Gerber
- Training Center for Emergency Medicine (NOTIS e.V), Breite Strasse 7, Engen, 78234, Germany
- Joachim Bansbach
- Department of Anesthesiology and Critical Care, Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Julian Ganter
- Department of Anesthesiology and Critical Care, Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Robert Ranisch
- Faculty for Health Sciences Brandenburg, University of Potsdam, Potsdam, Germany
15
Ito N, Kadomatsu S, Fujisawa M, Fukaguchi K, Ishizawa R, Kanda N, Kasugai D, Nakajima M, Goto T, Tsugawa Y. The Accuracy and Potential Racial and Ethnic Biases of GPT-4 in the Diagnosis and Triage of Health Conditions: Evaluation Study. JMIR MEDICAL EDUCATION 2023; 9:e47532. [PMID: 37917120 PMCID: PMC10654908 DOI: 10.2196/47532] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/23/2023] [Revised: 07/07/2023] [Accepted: 09/05/2023] [Indexed: 11/03/2023]
Abstract
BACKGROUND Whether GPT-4, the conversational artificial intelligence, can accurately diagnose and triage health conditions and whether it presents racial and ethnic biases in its decisions remain unclear. OBJECTIVE We aim to assess the accuracy of GPT-4 in the diagnosis and triage of health conditions and whether its performance varies by patient race and ethnicity. METHODS We compared the performance of GPT-4 and physicians, using 45 typical clinical vignettes, each with a correct diagnosis and triage level, in February and March 2023. For each of the 45 clinical vignettes, GPT-4 and 3 board-certified physicians provided the most likely primary diagnosis and triage level (emergency, nonemergency, or self-care). Independent reviewers evaluated the diagnoses as "correct" or "incorrect." Physician diagnosis was defined as the consensus of the 3 physicians. We evaluated whether the performance of GPT-4 varies by patient race and ethnicity, by adding the information on patient race and ethnicity to the clinical vignettes. RESULTS The accuracy of diagnosis was comparable between GPT-4 and physicians (the percentage of correct diagnosis was 97.8% (44/45; 95% CI 88.2%-99.9%) for GPT-4 and 91.1% (41/45; 95% CI 78.8%-97.5%) for physicians; P=.38). GPT-4 provided appropriate reasoning for 97.8% (44/45) of the vignettes. The appropriateness of triage was comparable between GPT-4 and physicians (GPT-4: 30/45, 66.7%; 95% CI 51.0%-80.0%; physicians: 30/45, 66.7%; 95% CI 51.0%-80.0%; P=.99). The performance of GPT-4 in diagnosing health conditions did not vary among different races and ethnicities (Black, White, Asian, and Hispanic), with an accuracy of 100% (95% CI 78.2%-100%). P values, compared to the GPT-4 output without incorporating race and ethnicity information, were all .99. The accuracy of triage was not significantly different even if patients' race and ethnicity information was added. 
The accuracy of triage was 62.2% (95% CI 46.5%-76.2%; P=.50) for Black patients; 66.7% (95% CI 51.0%-80.0%; P=.99) for White patients; 66.7% (95% CI 51.0%-80.0%; P=.99) for Asian patients, and 62.2% (95% CI 46.5%-76.2%; P=.69) for Hispanic patients. P values were calculated by comparing the outputs with and without conditioning on race and ethnicity. CONCLUSIONS GPT-4's ability to diagnose and triage typical clinical vignettes was comparable to that of board-certified physicians. The performance of GPT-4 did not vary by patient race and ethnicity. These findings should be informative for health systems looking to introduce conversational artificial intelligence to improve the efficiency of patient diagnosis and triage.
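The 95% CIs quoted above are consistent with exact (Clopper-Pearson) binomial intervals. As a sketch (assuming SciPy is available; the function name is ours, not the study's code), the interval for GPT-4's 44/45 correct diagnoses can be reproduced like this:

```python
from scipy.stats import beta

def clopper_pearson(successes, n, alpha=0.05):
    """Exact (Clopper-Pearson) two-sided binomial confidence interval."""
    lo = beta.ppf(alpha / 2, successes, n - successes + 1) if successes > 0 else 0.0
    hi = beta.ppf(1 - alpha / 2, successes + 1, n - successes) if successes < n else 1.0
    return lo, hi

# GPT-4's diagnostic accuracy in the study above: 44 of 45 vignettes correct.
lo, hi = clopper_pearson(44, 45)
print(f"{44/45:.1%} (95% CI {lo:.1%}-{hi:.1%})")  # 97.8% (95% CI 88.2%-99.9%)
```

The same function applied to the physicians' 41/45 reproduces the 78.8%-97.5% interval reported above.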
Affiliation(s)
- Naoki Ito
- TXP Medical Co Ltd, Tokyo, Japan
- Faculty of Medicine, The University of Tokyo, Tokyo, Japan
- Sakina Kadomatsu
- TXP Medical Co Ltd, Tokyo, Japan
- Faculty of Medicine, International University of Health and Welfare, Chiba, Japan
- Mineto Fujisawa
- TXP Medical Co Ltd, Tokyo, Japan
- Faculty of Medicine, The University of Tokyo, Tokyo, Japan
- Kiyomitsu Fukaguchi
- TXP Medical Co Ltd, Tokyo, Japan
- Department of Emergency Medicine, Shonan Kamakura General Hospital, Kanagawa, Japan
- Ryo Ishizawa
- TXP Medical Co Ltd, Tokyo, Japan
- Department of Emergency and Critical Care Medicine, Tokyo Medical Center National Hospital Organization, Tokyo, Japan
- Naoki Kanda
- TXP Medical Co Ltd, Tokyo, Japan
- Division of General Internal Medicine, Jichi Medical University Hospital, Tochigi, Japan
- Daisuke Kasugai
- TXP Medical Co Ltd, Tokyo, Japan
- Department of Emergency and Critical Care Medicine, Nagoya University Graduate School of Medicine, Aichi, Japan
- Mikio Nakajima
- TXP Medical Co Ltd, Tokyo, Japan
- Emergency Life-Saving Technique Academy of Tokyo Foundation for Ambulance Service Development, Tokyo, Japan
- Yusuke Tsugawa
- Division of General Internal Medicine and Health Services Research, David Geffen School of Medicine, The University of California, Los Angeles, Los Angeles, CA, United States
- Department of Health Policy and Management, UCLA Fielding School of Public Health, Los Angeles, CA, United States
16
Marcin T, Lüthi A, Graf RR, Krummrey G, Schauber SK, Breakey N, Hautz WE, Hautz SC. Is language an issue? Accuracy of the German computerized diagnostic decision support system ISABEL and cross-validation with the English counterpart. Diagnosis (Berl) 2023; 10:398-405. [PMID: 37480571 DOI: 10.1515/dx-2023-0047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/25/2023] [Accepted: 06/16/2023] [Indexed: 07/24/2023]
Abstract
OBJECTIVES Existing computerized diagnostic decision support tools (CDDS) accurately return possible differential diagnoses (DDx) based on the clinical information provided. The German versions of the CDDS tools for clinicians (Isabel Pro) and patients (Isabel Symptom Checker) from ISABEL Healthcare have not yet been validated. METHODS We entered clinical features of 50 patient vignettes taken from an emergency medicine textbook and 50 real cases with a confirmed diagnosis derived from the electronic health record (EHR) of a large academic Swiss emergency room into the German versions of Isabel Pro and Isabel Symptom Checker. We analysed the proportion of DDx lists that included the correct diagnosis. RESULTS Isabel Pro and Symptom Checker provided the correct diagnosis in 82% and 71% of the cases, respectively. With Isabel Pro, the correct diagnosis was ranked within the top 20, top 10, and top 3 of the provided DDx in 71%, 61%, and 37% of the cases, respectively. In general, accuracy was higher with vignettes than with ED cases: the correct diagnosis was listed more often (not statistically significant) and was ranked within the top 20, 10, and 3 significantly more often. On average, Isabel Pro and Symptom Checker provided 38 ± 4.5 DDx. CONCLUSIONS The German versions of Isabel achieved somewhat lower accuracy than previous studies of the English version. Accuracy decreases substantially once the position in the suggested DDx list is taken into account. Whether Isabel Pro is accurate enough to improve diagnostic quality in routine clinical ED practice needs further investigation.
Affiliation(s)
- Thimo Marcin
- Department of Emergency Medicine, Inselspital University Hospital Bern, Bern, Switzerland
- Ailin Lüthi
- Department of Emergency Medicine, Inselspital University Hospital Bern, Bern, Switzerland
- Faculty of Medicine, University of Bern, Bern, Switzerland
- Ronny R Graf
- Department of Emergency Medicine, Inselspital University Hospital Bern, Bern, Switzerland
- Faculty of Medicine, University of Bern, Bern, Switzerland
- Gert Krummrey
- Department of Emergency Medicine, Inselspital University Hospital Bern, Bern, Switzerland
- Stefan K Schauber
- Centre for Educational Measurement, Faculty of Educational Sciences, University of Oslo, Oslo, Norway
- Centre for Health Sciences Education, Faculty of Medicine, University of Oslo, Oslo, Norway
- Neal Breakey
- Department of Medicine, Spital Emmental, Burgdorf, Switzerland
- Wolf E Hautz
- Department of Emergency Medicine, Inselspital University Hospital Bern, Bern, Switzerland
- Stefanie C Hautz
- Department of Emergency Medicine, Inselspital University Hospital Bern, Bern, Switzerland
17
Fraser H, Crossland D, Bacher I, Ranney M, Madsen T, Hilliard R. Comparison of Diagnostic and Triage Accuracy of Ada Health and WebMD Symptom Checkers, ChatGPT, and Physicians for Patients in an Emergency Department: Clinical Data Analysis Study. JMIR Mhealth Uhealth 2023; 11:e49995. [PMID: 37788063 PMCID: PMC10582809 DOI: 10.2196/49995] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/16/2023] [Revised: 08/17/2023] [Accepted: 08/25/2023] [Indexed: 10/04/2023] Open
Abstract
BACKGROUND Diagnosis is a core component of effective health care, but misdiagnosis is common and can put patients at risk. Diagnostic decision support systems can play a role in improving diagnosis by physicians and other health care workers. Symptom checkers (SCs) have been designed to improve diagnosis and triage (ie, which level of care to seek) by patients. OBJECTIVE The aim of this study was to evaluate the performance of the new large language model ChatGPT (versions 3.5 and 4.0), the widely used WebMD SC, and an SC developed by Ada Health in the diagnosis and triage of patients with urgent or emergent clinical problems compared with the final emergency department (ED) diagnoses and physician reviews. METHODS We used previously collected, deidentified, self-report data from 40 patients presenting to an ED for care who used the Ada SC to record their symptoms prior to seeing the ED physician. Deidentified data were entered into ChatGPT versions 3.5 and 4.0 and WebMD by a research assistant blinded to diagnoses and triage. Diagnoses from all 4 systems were compared with the previously abstracted final diagnoses in the ED as well as with diagnoses and triage recommendations from three independent board-certified ED physicians who had blindly reviewed the self-report clinical data from Ada. Diagnostic accuracy was calculated as the proportion of the diagnoses from ChatGPT, Ada SC, WebMD SC, and the independent physicians that matched at least one ED diagnosis (stratified as top 1 or top 3). Triage accuracy was calculated as the number of recommendations from ChatGPT, WebMD, or Ada that agreed with at least 2 of the independent physicians or were rated "unsafe" or "too cautious." RESULTS Overall, 30 and 37 cases had sufficient data for diagnostic and triage analysis, respectively. The rate of top-1 diagnosis matches for Ada, ChatGPT 3.5, ChatGPT 4.0, and WebMD was 9 (30%), 12 (40%), 10 (33%), and 12 (40%), respectively, with a mean rate of 47% for the physicians. 
The rate of top-3 diagnostic matches for Ada, ChatGPT 3.5, ChatGPT 4.0, and WebMD was 19 (63%), 19 (63%), 15 (50%), and 17 (57%), respectively, with a mean rate of 69% for physicians. The distribution of triage results for Ada was 62% (n=23) agree, 14% unsafe (n=5), and 24% (n=9) too cautious; that for ChatGPT 3.5 was 59% (n=22) agree, 41% (n=15) unsafe, and 0% (n=0) too cautious; that for ChatGPT 4.0 was 76% (n=28) agree, 22% (n=8) unsafe, and 3% (n=1) too cautious; and that for WebMD was 70% (n=26) agree, 19% (n=7) unsafe, and 11% (n=4) too cautious. The unsafe triage rate for ChatGPT 3.5 (41%) was significantly higher (P=.009) than that of Ada (14%). CONCLUSIONS ChatGPT 3.5 had high diagnostic accuracy but a high unsafe triage rate. ChatGPT 4.0 had the poorest diagnostic accuracy, but a lower unsafe triage rate and the highest triage agreement with the physicians. The Ada and WebMD SCs performed better overall than ChatGPT. Unsupervised patient use of ChatGPT for diagnosis and triage is not recommended without improvements to triage accuracy and extensive clinical evaluation.
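The top-1/top-3 matching metric used above — a case counts as a hit if any of the first k suggested diagnoses matches at least one final ED diagnosis — is straightforward to compute. The toy cases below are illustrative, not the study's data:

```python
def topk_match_rate(predictions, ed_diagnoses, k):
    """Fraction of cases where any of the top-k suggested diagnoses
    matches at least one final ED diagnosis."""
    hits = 0
    for preds, truths in zip(predictions, ed_diagnoses):
        if any(p in truths for p in preds[:k]):
            hits += 1
    return hits / len(predictions)

# Illustrative toy data: ranked suggestions per case vs. final ED diagnoses.
preds = [
    ["migraine", "tension headache", "sinusitis"],
    ["appendicitis", "gastroenteritis", "ovarian cyst"],
    ["costochondritis", "anxiety", "GERD"],
]
truths = [{"tension headache"}, {"appendicitis"}, {"pulmonary embolism"}]

print(topk_match_rate(preds, truths, k=1))  # 1 of 3 cases hits on the top-1 diagnosis
print(topk_match_rate(preds, truths, k=3))  # 2 of 3 cases hit within the top 3
```

Top-3 rates are always at least as high as top-1 rates, which matches the pattern in the results above (e.g., Ada: 30% top-1 vs. 63% top-3).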
Affiliation(s)
- Hamish Fraser
- Brown Center for Biomedical Informatics, The Warren Alpert Medical School of Brown University, Providence, RI, United States
- Department of Health Services, Policy and Practice, Brown University School of Public Health, Providence, RI, United States
- Daven Crossland
- Brown Center for Biomedical Informatics, The Warren Alpert Medical School of Brown University, Providence, RI, United States
- Department of Epidemiology, Brown University School of Public Health, Providence, RI, United States
- Ian Bacher
- Brown Center for Biomedical Informatics, The Warren Alpert Medical School of Brown University, Providence, RI, United States
- Megan Ranney
- School of Public Health, Yale University, New Haven, CT, United States
- Tracy Madsen
- Department of Epidemiology, Brown University School of Public Health, Providence, RI, United States
- Department of Emergency Medicine, The Warren Alpert Medical School of Brown University, Providence, RI, United States
- Ross Hilliard
- Department of Internal Medicine, Maine Medical Center, Portland, ME, United States
18
Kuroiwa T, Sarcon A, Ibara T, Yamada E, Yamamoto A, Tsukamoto K, Fujita K. The Potential of ChatGPT as a Self-Diagnostic Tool in Common Orthopedic Diseases: Exploratory Study. J Med Internet Res 2023; 25:e47621. [PMID: 37713254 PMCID: PMC10541638 DOI: 10.2196/47621] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/27/2023] [Revised: 05/17/2023] [Accepted: 08/17/2023] [Indexed: 09/16/2023] Open
Abstract
BACKGROUND Artificial intelligence (AI) has gained tremendous popularity recently, especially the use of natural language processing (NLP). ChatGPT is a state-of-the-art chatbot capable of creating natural conversations using NLP. The use of AI in medicine can have a tremendous impact on health care delivery. Although some studies have evaluated ChatGPT's accuracy in self-diagnosis, there is no research regarding its precision and the degree to which it recommends medical consultations. OBJECTIVE The aim of this study was to evaluate ChatGPT's ability to accurately and precisely self-diagnose common orthopedic diseases, as well as the degree of recommendation it provides for medical consultations. METHODS Over a 5-day course, each of the study authors submitted the same questions to ChatGPT. The conditions evaluated were carpal tunnel syndrome (CTS), cervical myelopathy (CM), lumbar spinal stenosis (LSS), knee osteoarthritis (KOA), and hip osteoarthritis (HOA). Answers were categorized as either correct, partially correct, incorrect, or a differential diagnosis. The percentage of correct answers and reproducibility were calculated. The reproducibility between days and raters were calculated using the Fleiss κ coefficient. Answers that recommended that the patient seek medical attention were recategorized according to the strength of the recommendation as defined by the study. RESULTS The ratios of correct answers were 25/25, 1/25, 24/25, 16/25, and 17/25 for CTS, CM, LSS, KOA, and HOA, respectively. The ratios of incorrect answers were 23/25 for CM and 0/25 for all other conditions. The reproducibility between days was 1.0, 0.15, 0.7, 0.6, and 0.6 for CTS, CM, LSS, KOA, and HOA, respectively. The reproducibility between raters was 1.0, 0.1, 0.64, -0.12, and 0.04 for CTS, CM, LSS, KOA, and HOA, respectively. Among the answers recommending medical attention, the phrases "essential," "recommended," "best," and "important" were used. 
Specifically, "essential" occurred in 4 out of 125, "recommended" in 12 out of 125, "best" in 6 out of 125, and "important" in 94 out of 125 answers. Additionally, 7 out of the 125 answers did not include a recommendation to seek medical attention. CONCLUSIONS The accuracy and reproducibility of ChatGPT to self-diagnose five common orthopedic conditions were inconsistent. The accuracy could potentially be improved by adding symptoms that could easily identify a specific location. Only a few answers were accompanied by a strong recommendation to seek medical attention according to our study standards. Although ChatGPT could serve as a potential first step in accessing care, we found variability in accurate self-diagnosis. Given the risk of harm with self-diagnosis without medical follow-up, it would be prudent for an NLP to include clear language alerting patients to seek expert medical opinions. We hope to shed further light on the use of AI in a future clinical study.
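Reproducibility between days and between raters was measured above with the Fleiss κ coefficient. A minimal self-contained implementation, run on an invented toy rating table (not the study's data), looks like this:

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for a subjects x categories count matrix.

    ratings[i][j] = number of raters who assigned subject i to category j;
    every row must sum to the same number of raters n.
    """
    N = len(ratings)
    n = sum(ratings[0])
    total = N * n
    # Proportion of all assignments falling in each category.
    p_j = [sum(row[j] for row in ratings) / total for j in range(len(ratings[0]))]
    # Per-subject agreement: fraction of rater pairs that agree.
    P_i = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in ratings]
    P_bar = sum(P_i) / N
    P_e = sum(p * p for p in p_j)
    if P_e == 1.0:  # degenerate case: only one category ever used
        return 1.0
    return (P_bar - P_e) / (1 - P_e)

# Toy example: 4 answers each judged by 3 raters into 3 categories
# (correct / partially correct / incorrect) -- illustrative counts only.
table = [
    [3, 0, 0],
    [3, 0, 0],
    [0, 3, 0],
    [1, 1, 1],
]
print(round(fleiss_kappa(table), 3))  # 0.538
```

Perfect agreement across raters yields κ = 1.0, as seen for CTS in the results above, while values near zero (as for KOA and HOA between raters) indicate agreement barely above chance.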
Affiliation(s)
- Tomoyuki Kuroiwa
- Department of Orthopaedic and Spinal Surgery, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan
- Division of Orthopedic Surgery Research, Mayo Clinic, Rochester, MN, United States
- Aida Sarcon
- Department of Surgery, Mayo Clinic, Rochester, MN, United States
- Takuya Ibara
- Department of Functional Joint Anatomy, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan
- Eriku Yamada
- Department of Orthopaedic and Spinal Surgery, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan
- Akiko Yamamoto
- Department of Orthopaedic and Spinal Surgery, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan
- Kazuya Tsukamoto
- Department of Orthopaedic and Spinal Surgery, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan
- Koji Fujita
- Department of Functional Joint Anatomy, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan
- Division of Medical Design Innovations, Open Innovation Center, Institute of Research Innovation, Tokyo Medical and Dental University, Tokyo, Japan
19
Wiedermann CJ, Mahlknecht A, Piccoliori G, Engl A. Redesigning Primary Care: The Emergence of Artificial-Intelligence-Driven Symptom Diagnostic Tools. J Pers Med 2023; 13:1379. [PMID: 37763147 PMCID: PMC10532810 DOI: 10.3390/jpm13091379] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/04/2023] [Revised: 09/13/2023] [Accepted: 09/14/2023] [Indexed: 09/29/2023] Open
Abstract
Modern healthcare is facing a juxtaposition of increasing patient demands owing to an aging population and a decreasing general practitioner workforce, leading to strained access to primary care. The coronavirus disease 2019 pandemic has emphasized the potential for alternative consultation methods, highlighting opportunities to minimize unnecessary care. This article discusses the role of artificial-intelligence-driven symptom checkers, particularly their efficiency, utility, and challenges in primary care. Based on a study conducted in Italian general practices, insights from both physicians and patients were gathered regarding this emergent technology, highlighting differences in perceived utility, user satisfaction, and potential challenges. While symptom checkers are seen as potential tools for addressing healthcare challenges, concerns regarding their accuracy and the potential for misdiagnosis persist. Patients generally viewed them positively, valuing their ease of use and the empowerment they provide in managing health. However, some general practitioners perceive these tools as challenges to their expertise. This article proposes that artificial-intelligence-based symptom checkers can optimize medical-history taking for the benefit of both general practitioners and patients, with potential enhancements in complex diagnostic tasks rather than routine diagnoses. It underscores the importance of carefully integrating digital innovations while preserving the essential human touch in healthcare. Symptom checkers offer promising solutions; ensuring their accuracy, reliability, and effective integration into primary care requires rigorous research, clinical guidance, and an understanding of varied user perceptions. Collaboration among technologists, clinicians, and patients is paramount for the successful evolution of digital tools in healthcare.
Affiliation(s)
- Christian J. Wiedermann
- Institute of General Practice and Public Health, Claudiana—College of Health Professions, 39100 Bolzano, Italy
- Department of Public Health, Medical Decision Making and HTA, University of Health Sciences, Medical Informatics and Technology-Tyrol, 6060 Hall, Austria
- Angelika Mahlknecht
- Institute of General Practice and Public Health, Claudiana—College of Health Professions, 39100 Bolzano, Italy
- Giuliano Piccoliori
- Institute of General Practice and Public Health, Claudiana—College of Health Professions, 39100 Bolzano, Italy
- Adolf Engl
- Institute of General Practice and Public Health, Claudiana—College of Health Professions, 39100 Bolzano, Italy
20
Mahlknecht A, Engl A, Piccoliori G, Wiedermann CJ. Supporting primary care through symptom checking artificial intelligence: a study of patient and physician attitudes in Italian general practice. BMC PRIMARY CARE 2023; 24:174. [PMID: 37661285 PMCID: PMC10476397 DOI: 10.1186/s12875-023-02143-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/09/2023] [Accepted: 08/29/2023] [Indexed: 09/05/2023]
Abstract
BACKGROUND Rapid advancements in artificial intelligence (AI) have led to the adoption of AI-driven symptom checkers in primary care. This study aimed to evaluate both patients' and physicians' attitudes towards these tools in Italian general practice settings, focusing on their perceived utility, user satisfaction, and potential challenges. METHODS This feasibility study involved ten general practitioners (GPs) and patients visiting GP offices. Before their medical visit, patients used a chatbot-based symptom checker, completing an anamnestic screening for COVID-19 and a medical-history algorithm concerning their current medical problem. The entered data were forwarded to the GP as a medical-history aid. After the medical visit, physicians and patients each evaluated the symptom checker from their respective perspectives. Additionally, physicians performed a final overall evaluation of the symptom checker after the conclusion of the practice phase. RESULTS Most patients had not used symptom checkers before. Overall, 49% of patients and 27% of physicians reported being rather or very satisfied with the symptom checker. The most frequent patient-reported reasons for satisfaction were ease of use, precise and comprehensive questions, perceived time-saving potential, and encouragement of self-reflection. Every other patient would consider at-home use of the symptom checker for a first appraisal of health problems, to save time, reduce unnecessary visits, and/or serve as an aid for the physician. Patients' attitudes towards the symptom checker were not significantly associated with age, sex, or level of education. Most patients (75%) and physicians (84%) indicated that the symptom checker had no effect on the duration of the medical visit. Only a few participants found the use of the symptom checker to be disruptive to the medical visit or its quality. CONCLUSIONS The findings suggest a positive reception of the symptom checker, albeit with differing focus between patients and physicians.
With the potential to be integrated further into primary care, these tools require meticulous clinical guidance to maximize their benefits. TRIAL REGISTRATION The study was not registered, as it did not include direct medical intervention on human participants.
Affiliation(s)
- Angelika Mahlknecht: Institute of General Practice and Public Health, College of Health Care Professions (Claudiana), Lorenz Böhler Street 13, 39100, Bolzano, Italy
- Adolf Engl: Institute of General Practice and Public Health, College of Health Care Professions (Claudiana), Lorenz Böhler Street 13, 39100, Bolzano, Italy
- Giuliano Piccoliori: Institute of General Practice and Public Health, College of Health Care Professions (Claudiana), Lorenz Böhler Street 13, 39100, Bolzano, Italy
- Christian Josef Wiedermann: Institute of General Practice and Public Health, College of Health Care Professions (Claudiana), Lorenz Böhler Street 13, 39100, Bolzano, Italy; Department of Public Health, Medical Decision Making and HTA, University of Health Sciences, Medical Informatics and Technology, Eduard-Wallnöfer Place 1, 6060, Hall, Austria

21
Kafke SD, Kuhlmey A, Schuster J, Blüher S, Czimmeck C, Zoellick JC, Grosse P. Can clinical decision support systems be an asset in medical education? An experimental approach. BMC Medical Education 2023; 23:570. [PMID: 37568144] [PMCID: PMC10416486] [DOI: 10.1186/s12909-023-04568-8] [Citation(s) in RCA: 0]
Abstract
BACKGROUND Diagnostic accuracy is one of the major cornerstones of appropriate and successful medical decision-making. Clinical decision support systems (CDSSs) have recently been used to facilitate physicians' diagnostic considerations. However, to date, little is known about the potential assets of CDSSs for medical students in an educational setting. The purpose of our study was to explore the usefulness of CDSSs for medical students by assessing their diagnostic performance and the influence of such software on students' trust in their own diagnostic abilities. METHODS Based on paper cases, students had to diagnose two different patients, once using a CDSS and once using conventional methods such as textbooks. Both patients had a common disease: in one case the clinical presentation was typical (tonsillitis), whereas in the other (pulmonary embolism) the patient presented atypically. We used a 2x2x2 between- and within-subjects cluster-randomised controlled trial to assess diagnostic accuracy in medical students, also varying the order of the resources used (CDSS first or second). RESULTS Medical students in their 4th and 5th year performed equally well using conventional methods or the CDSS across the two cases (t(164) = 1.30; p = 0.197). Diagnostic accuracy and trust in the correct diagnosis were higher in the typical presentation condition than in the atypical presentation condition (t(85) = 19.97; p < .0001 and t(150) = 7.67; p < .0001). These results refute our main hypothesis that students diagnose more accurately when using conventional methods compared with the CDSS. CONCLUSIONS Medical students in their 4th and 5th year performed equally well in diagnosing two cases of common diseases with typical or atypical clinical presentations using conventional methods or a CDSS. Students were proficient in diagnosing a common disease with a typical presentation but underestimated their own factual knowledge in this scenario. Also, students were aware of their own diagnostic limitations when presented with a challenging case with an atypical presentation, for which the use of a CDSS seemingly provided no additional insights.
Affiliation(s)
- Sean D Kafke: Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Adelheid Kuhlmey: Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Johanna Schuster: Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Stefan Blüher: Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Constanze Czimmeck: Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Jan C Zoellick: Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Pascal Grosse: Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany

22
Karia J, Mohamed R, Petrushkin H. Patient-targeted mobile applications in healthcare. Br J Hosp Med (Lond) 2023; 84:1-5. [PMID: 37646550] [DOI: 10.12968/hmed.2023.0158] [Citation(s) in RCA: 0]
Abstract
There has been an increase in the number of healthcare-related applications targeted at patients for use on mobile phones. With an increasing proportion of the population using such applications, it is important to understand their limitations, the associated safety concerns, and the challenges of legislation. This article explores the impact of these applications on frontline care and patient wellbeing, evaluating the literature on the benefits and challenges of patient-targeted mobile applications in health care and analysing the limitations of existing research. The proclaimed benefits of such applications are not always evidence based. Furthermore, many healthcare applications are created by laypeople and not validated by healthcare authorities, creating the potential to cause patient harm. Further research is needed to identify long-term effects on both healthcare systems and individuals' psychosocial wellbeing. However, research in this field often lacks a universal perspective and may be influenced by underlying financial motives to promote use of the applications.
Affiliation(s)
- Janvi Karia: Division of Medicine, University College London, London, UK
- Ryian Mohamed: Department of Ophthalmology, Moorfields Eye Hospital, London, UK
- Harry Petrushkin: Department of Ophthalmology, Moorfields Eye Hospital, London, UK; UCL Institute of Ophthalmology, University College London, London, UK; Department of Ophthalmology, Great Ormond Street Hospital, London, UK

23
Sarbay İ, Berikol GB, Özturan İU. Performance of emergency triage prediction of an open access natural language processing based chatbot application (ChatGPT): A preliminary, scenario-based cross-sectional study. Turk J Emerg Med 2023; 23:156-161. [PMID: 37529789] [PMCID: PMC10389099] [DOI: 10.4103/tjem.tjem_79_23] [Citation(s) in RCA: 12]
Abstract
OBJECTIVES Artificial intelligence companies have recently been increasing their initiatives to improve the results of chatbots, software programs that can converse with a human in natural language. The role of chatbots in health care is deemed worthy of research. OpenAI's ChatGPT is a supervised, machine learning-based chatbot. The aim of this study was to determine the performance of ChatGPT in emergency medicine (EM) triage prediction. METHODS This was a preliminary, cross-sectional study conducted with case scenarios generated by the researchers based on the emergency severity index (ESI) handbook v4 cases. Two independent EM specialists who were experts in the ESI triage scale determined the triage categories for each case. A third independent EM specialist was consulted as arbiter, if necessary. The consensus result for each case scenario was taken as the reference triage category. Subsequently, each case scenario was queried with ChatGPT and the answer was recorded as the index triage category. Inconsistent classifications between ChatGPT and the reference category were defined as over-triage (false positive) or under-triage (false negative). RESULTS Fifty case scenarios were assessed in the study. Reliability analysis showed a fair agreement between EM specialists and ChatGPT (Cohen's kappa: 0.341). Eleven cases (22%) were over-triaged and 9 (18%) were under-triaged by ChatGPT. In 9 cases (18%), ChatGPT reported two consecutive triage categories, one of which matched the expert consensus. It had an overall sensitivity of 57.1% (95% confidence interval [CI]: 34-78.2), specificity of 34.5% (95% CI: 17.9-54.3), positive predictive value (PPV) of 38.7% (95% CI: 21.8-57.8), negative predictive value (NPV) of 52.6% (95% CI: 28.9-75.6), and an F1 score of 0.461. In high-acuity cases (ESI-1 and ESI-2), ChatGPT showed a sensitivity of 76.2% (95% CI: 52.8-91.8), specificity of 93.1% (95% CI: 77.2-99.2), PPV of 88.9% (95% CI: 65.3-98.6), NPV of 84.4% (95% CI: 67.2-94.7), and an F1 score of 0.821. The receiver operating characteristic curve showed an area under the curve of 0.846 (95% CI: 0.724-0.969, P < 0.001) for high-acuity cases. CONCLUSION The performance of ChatGPT was best when predicting high-acuity cases (ESI-1 and ESI-2). It may be useful when determining the cases requiring critical care. When trained with more medical knowledge, ChatGPT may be more accurate for other triage category predictions.
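As an illustration of how the diagnostic metrics quoted in this abstract are derived, the following is a minimal Python sketch. The confusion-matrix counts used here are hypothetical values chosen only to be consistent with the reported high-acuity percentages; they are not taken from the study's data.

```python
# Sketch: deriving sensitivity, specificity, PPV, NPV and F1 from a
# 2x2 confusion matrix, as reported in triage-accuracy studies.
# The counts below are HYPOTHETICAL, not the study's actual data.

def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard binary-classification metrics for triage agreement."""
    sensitivity = tp / (tp + fn)   # true-positive rate (recall)
    specificity = tn / (tn + fp)   # true-negative rate
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    npv = tn / (tn + fn)           # negative predictive value
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "ppv": ppv, "npv": npv, "f1": f1}

if __name__ == "__main__":
    # Hypothetical counts with "high acuity" (ESI-1/ESI-2) as the positive class
    metrics = diagnostic_metrics(tp=16, fp=2, fn=5, tn=27)
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}")
```

With these illustrative counts, the formulas yield roughly the percentages reported for the high-acuity subgroup (sensitivity about 76%, specificity about 93%, PPV about 89%, NPV about 84%, F1 about 0.82), which shows how a handful of triage agreements and disagreements translate into the quoted statistics.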
Affiliation(s)
- İbrahim Sarbay: Department of Emergency Medicine, Keşan State Hospital, Edirne, Turkey
- Göksu Bozdereli Berikol: Department of Emergency Medicine, Bakırköy Dr. Sadi Konuk Training and Research Hospital, İstanbul, Turkey
- İbrahim Ulaş Özturan: Department of Emergency Medicine, Kocaeli University, Faculty of Medicine, Kocaeli, Turkey; Department of Medical Education, Acibadem University, Institute of Health Sciences, Istanbul, Turkey

24
Riboli-Sasco E, El-Osta A, Alaa A, Webber I, Karki M, El Asmar ML, Purohit K, Painter A, Hayhoe B. Triage and Diagnostic Accuracy of Online Symptom Checkers: Systematic Review. J Med Internet Res 2023; 25:e43803. [PMID: 37266983] [DOI: 10.2196/43803] [Citation(s) in RCA: 2]
Abstract
BACKGROUND In the context of a deepening global shortage of health workers and, in particular, the COVID-19 pandemic, there is growing international interest in, and use of, online symptom checkers (OSCs). However, the evidence surrounding the triage and diagnostic accuracy of these tools remains inconclusive. OBJECTIVE This systematic review aimed to summarize the existing peer-reviewed literature evaluating the triage accuracy (directing users to appropriate services based on their presenting symptoms) and diagnostic accuracy of OSCs aimed at lay users for general health concerns. METHODS Searches were conducted in MEDLINE, Embase, CINAHL, Health Management Information Consortium (HMIC), and Web of Science, as well as the citations of the studies selected for full-text screening. We included peer-reviewed studies published in English between January 1, 2010, and February 16, 2022, with a controlled and quantitative assessment of either or both triage and diagnostic accuracy of OSCs directed at lay users. We excluded tools supporting health care professionals, as well as disease- or specialty-specific OSCs. Screening and data extraction were carried out independently by 2 reviewers for each study. We performed a descriptive narrative synthesis. RESULTS A total of 21,296 studies were identified, of which 14 (0.07%) were included. The included studies used clinical vignettes, medical records, or direct input by patients. Of the 14 studies, 6 (43%) reported on triage and diagnostic accuracy, 7 (50%) focused on triage accuracy, and 1 (7%) focused on diagnostic accuracy. These outcomes were assessed based on the diagnostic and triage recommendations attached to the vignette in the case of vignette studies or on those provided by nurses or general practitioners, including through face-to-face and telephone consultations. Both diagnostic accuracy and triage accuracy varied greatly among OSCs. 
Overall diagnostic accuracy was deemed to be low and was almost always lower than that of the comparator. Similarly, most of the studies (9/13, 69%) showed suboptimal triage accuracy overall, with a few exceptions (4/13, 31%). The main variables affecting the levels of diagnostic and triage accuracy were the severity and urgency of the condition, the use of artificial intelligence algorithms, and demographic questions. However, the impact of each variable differed across tools and studies, making it difficult to draw any solid conclusions. All included studies had at least one area with unclear risk of bias according to the revised Quality Assessment of Diagnostic Accuracy Studies-2 tool. CONCLUSIONS Although OSCs have the potential to provide accessible and accurate health advice and triage recommendations to users, more research is needed to validate their triage and diagnostic accuracy before widescale adoption in community and health care settings. Future studies should aim to use a common methodology and an agreed standard for evaluation to facilitate objective benchmarking and validation. TRIAL REGISTRATION PROSPERO CRD42020215210; https://tinyurl.com/3949zw83.
Affiliation(s)
- Eva Riboli-Sasco: Self-Care Academic Research Unit (SCARU), Department of Primary Care and Public Health, Imperial College London, London, United Kingdom
- Austen El-Osta: Self-Care Academic Research Unit (SCARU), Department of Primary Care and Public Health, Imperial College London, London, United Kingdom
- Aos Alaa: Self-Care Academic Research Unit (SCARU), Department of Primary Care and Public Health, Imperial College London, London, United Kingdom
- Iman Webber: Self-Care Academic Research Unit (SCARU), Department of Primary Care and Public Health, Imperial College London, London, United Kingdom
- Manisha Karki: Self-Care Academic Research Unit (SCARU), Department of Primary Care and Public Health, Imperial College London, London, United Kingdom
- Marie Line El Asmar: Self-Care Academic Research Unit (SCARU), Department of Primary Care and Public Health, Imperial College London, London, United Kingdom
- Katie Purohit: Self-Care Academic Research Unit (SCARU), Department of Primary Care and Public Health, Imperial College London, London, United Kingdom
- Annabelle Painter: Self-Care Academic Research Unit (SCARU), Department of Primary Care and Public Health, Imperial College London, London, United Kingdom
- Benedict Hayhoe: Self-Care Academic Research Unit (SCARU), Department of Primary Care and Public Health, Imperial College London, London, United Kingdom

25
Turnbull J, MacLellan J, Churruca K, Ellis LA, Prichard J, Browne D, Braithwaite J, Petter E, Chisambi M, Pope C. A multimethod study of NHS 111 online. Health and Social Care Delivery Research 2023; 11:1-104. [PMID: 37464813] [DOI: 10.3310/ytrr9821] [Citation(s) in RCA: 0]
Abstract
Background NHS 111 online offers 24-hour access to health assessment and triage. Objectives This study examined pathways to care, differential access and use, and workforce impacts of NHS 111 online. This study compared NHS 111 with Healthdirect (Haymarket, Australia) virtual triage. Design Interviews with 80 staff and stakeholders in English primary, urgent and emergency care, and 41 staff and stakeholders associated with Healthdirect. A survey of 2754 respondents, of whom 1137 (41.3%) had used NHS 111 online and 1617 (58.7%) had not. Results NHS 111 online is one of several digital health-care technologies and was not differentiated from the NHS 111 telephone service or well understood. There is a similar lack of awareness of Healthdirect virtual triage. NHS 111 and Healthdirect virtual triage are perceived as creating additional work for health-care staff and inappropriate demand for some health services, especially emergency care. One-third of survey respondents reported that they had not used any NHS 111 service (telephone or online). Older people and those with less educational qualifications are less likely to use NHS 111 online. Respondents who had used NHS 111 online reported more use of other urgent care services and make more cumulative use of services than those who had not used NHS 111 online. Users of NHS 111 online had higher levels of self-reported eHealth literacy. There were differences in reported preferences for using NHS 111 online for different symptom presentations. Conclusions Greater clarity about what the NHS 111 online service offers would allow better signposting and reduce confusion. Generic NHS 111 services are perceived as creating additional work in the primary, urgent and emergency care system. There are differences in eHealth literacy between users and those who have not used NHS 111 online, and this suggests that 'digital first' policies may increase health inequalities. 
Limitations This research bridged the pandemic from 2020 to 2021; therefore, findings may change as services adjust going forward. Surveys used a digital platform, so there is probably bias towards some level of eHealth literacy, but this also means that our data may underestimate the digital divide. Future work Further investigation of access to digital services could address concerns about digital exclusion. Research comparing the affordances and cost-benefits of different triage and assessment systems for users and health-care providers is needed. Research about trust in virtual assessments may show how duplication can be reduced. Mixed-methods studies looking at outcomes, impacts on work and costs, and ways to measure eHealth literacy can inform the development of NHS 111 online, and opportunities for further international shared learning could be pursued. Study registration This study is registered at the research registry (UIN 5392). Funding This project was funded by the National Institute for Health and Care Research (NIHR) Health and Social Care Delivery Research Programme and will be published in full in Health and Social Care Delivery Research; Vol. 11, No. 5. See the NIHR Journals Library website for further project information.
Affiliation(s)
- Joanne Turnbull: School of Health Sciences, University of Southampton, Southampton, UK
- Jennifer MacLellan: Nuffield Department of Primary Care Health Sciences, University of Oxford, Oxford, UK
- Kate Churruca: Australian Institute of Health Innovation, Macquarie University, Sydney, NSW, Australia
- Louise A Ellis: Australian Institute of Health Innovation, Macquarie University, Sydney, NSW, Australia
- Jane Prichard: School of Health Sciences, University of Southampton, Southampton, UK
- Jeffrey Braithwaite: Australian Institute of Health Innovation, Macquarie University, Sydney, NSW, Australia
- Emily Petter: NHS Hampshire, Southampton and Isle of Wight Clinical Commissioning Group, Winchester, UK
- Matthew Chisambi: Imperial College Health Partners, Chelsea and Westminster Hospital NHS Foundation Trust, London, UK
- Catherine Pope: Nuffield Department of Primary Care Health Sciences, University of Oxford, Oxford, UK

26
Odisho AY, Liu AW, Maiorano AR, Bigazzi MOA, Medina E, Leard LE, Shah R, Venado A, Perez A, Golden J, Kleinhenz ME, Kolaitis NA, Maheshwari J, Trinh BN, Kukreja J, Greenland J, Calabrese D, Neinstein AB, Singer JP, Hays SR. Design and implementation of a digital health home spirometry intervention for remote monitoring of lung transplant function. J Heart Lung Transplant 2023; 42:828-837. [PMID: 37031033] [DOI: 10.1016/j.healun.2023.01.010] [Citation(s) in RCA: 0]
Abstract
BACKGROUND We developed an automated, chat-based digital health intervention using Bluetooth-enabled home spirometers to monitor for complications of lung transplantation in a real-world application. METHODS A chat-based application prompted patients to perform home spirometry, enter their forced expiratory volume in 1 second (FEV1), and answer symptom queries, and it provided patient education. The program alerted patients and providers to substantial FEV1 decreases and concerning symptoms. Data were integrated into the electronic health record (EHR) system, and dashboards were developed for program monitoring. RESULTS Between May 2020 and December 2021, 544 patients were invited to enroll, of whom 427 were invited remotely and 117 were enrolled in person. 371 (68%) participated by submitting ≥1 FEV1 value. Overall engagement was high, with an average of 197 unique patients submitting FEV1 data per month. In-person enrollees submitted an average of 4.6 FEV1 values per month and responded to 55% of scheduled chats. Home and laboratory FEV1 values correlated closely (rho = 0.93). There was an average of 133 ± 59 FEV1 decline alerts and 59 ± 23 symptom alerts per month. 72% of patients accessed education modules, and the program had a high net promoter score (53) amongst users. CONCLUSIONS We demonstrate that a novel, automated, chat-based, EHR-integrated home spirometry intervention is well accepted, generates reliable assessments of graft function, and can deliver automated feedback and education, resulting in moderately high adherence rates. We found that in-person onboarding yields better engagement and adherence. Future work will aim to demonstrate the impact of remote care monitoring on early detection of lung transplant complications.
Affiliation(s)
- Anobel Y Odisho: Center for Digital Health Innovation, University of California, San Francisco, California; Department of Urology, University of California, San Francisco, California
- Andrew W Liu: Center for Digital Health Innovation, University of California, San Francisco, California
- Ali R Maiorano: Center for Digital Health Innovation, University of California, San Francisco, California
- M Olivia A Bigazzi: Center for Digital Health Innovation, University of California, San Francisco, California
- Eli Medina: Center for Digital Health Innovation, University of California, San Francisco, California
- Lorriana E Leard: Pulmonary, Critical Care, Allergy and Sleep Medicine Division, Department of Medicine, University of California, San Francisco, California
- Rupal Shah: Pulmonary, Critical Care, Allergy and Sleep Medicine Division, Department of Medicine, University of California, San Francisco, California
- Aida Venado: Pulmonary, Critical Care, Allergy and Sleep Medicine Division, Department of Medicine, University of California, San Francisco, California
- Alyssa Perez: Pulmonary, Critical Care, Allergy and Sleep Medicine Division, Department of Medicine, University of California, San Francisco, California
- Jeffrey Golden: Pulmonary, Critical Care, Allergy and Sleep Medicine Division, Department of Medicine, University of California, San Francisco, California
- Mary Ellen Kleinhenz: Pulmonary, Critical Care, Allergy and Sleep Medicine Division, Department of Medicine, University of California, San Francisco, California
- Nicholas A Kolaitis: Pulmonary, Critical Care, Allergy and Sleep Medicine Division, Department of Medicine, University of California, San Francisco, California
- Julia Maheshwari: Pulmonary, Critical Care, Allergy and Sleep Medicine Division, Department of Medicine, University of California, San Francisco, California
- Binh N Trinh: Department of Surgery, University of California, San Francisco, California
- Jasleen Kukreja: Department of Surgery, University of California, San Francisco, California
- John Greenland: Pulmonary, Critical Care, Allergy and Sleep Medicine Division, Department of Medicine, University of California, San Francisco, California
- Daniel Calabrese: Pulmonary, Critical Care, Allergy and Sleep Medicine Division, Department of Medicine, University of California, San Francisco, California
- Aaron B Neinstein: Center for Digital Health Innovation, University of California, San Francisco, California; Endocrinology Division, Department of Medicine, University of California, San Francisco, California
- Jonathan P Singer: Pulmonary, Critical Care, Allergy and Sleep Medicine Division, Department of Medicine, University of California, San Francisco, California
- Steven R Hays: Pulmonary, Critical Care, Allergy and Sleep Medicine Division, Department of Medicine, University of California, San Francisco, California

27
Marcin T, Hautz SC, Singh H, Zwaan L, Schwappach D, Krummrey G, Schauber SK, Nendaz M, Exadaktylos AK, Müller M, Lambrigger C, Sauter TC, Lindner G, Bosbach S, Griesshammer I, Hautz WE. Effects of a computerised diagnostic decision support tool on diagnostic quality in emergency departments: study protocol of the DDx-BRO multicentre cluster randomised cross-over trial. BMJ Open 2023; 13:e072649. [PMID: 36990482] [PMCID: PMC10069571] [DOI: 10.1136/bmjopen-2023-072649] [Citation(s) in RCA: 0]
Abstract
INTRODUCTION Computerised diagnostic decision support systems (CDDS) suggesting differential diagnoses to physicians aim to improve clinical reasoning and diagnostic quality. However, controlled clinical trials investigating their effectiveness and safety are absent, and the consequences of their use in clinical practice are unknown. We aim to investigate the effect of CDDS use in the emergency department (ED) on diagnostic quality, workflow, resource consumption and patient outcomes. METHODS AND ANALYSIS This is a multicentre, outcome-assessor and patient-blinded, cluster-randomised, multiperiod crossover superiority trial. A validated differential diagnosis generator will be implemented in four EDs and randomly allocated to a sequence of six alternating intervention and control periods. During intervention periods, the treating ED physician will be asked to consult the CDDS at least once during diagnostic workup. During control periods, physicians will not have access to the CDDS and diagnostic workup will follow usual clinical care. Key inclusion criteria are presentation to the ED with fever, abdominal pain, syncope or a non-specific complaint as the chief complaint. The primary outcome is a binary diagnostic quality risk score composed of the presence of unscheduled medical care after discharge, a change in diagnosis or death during follow-up, or an unexpected escalation in care within 24 hours of hospital admission. Time of follow-up is 14 days. At least 1184 patients will be included. Secondary outcomes include length of hospital stay, diagnostics and data regarding CDDS usage, physicians' confidence calibration and diagnostic workflow. Statistical analysis will use general linear mixed modelling methods. ETHICS AND DISSEMINATION Approved by the cantonal ethics committee of the canton of Berne (2022-D0002) and by Swissmedic, the Swiss national regulatory authority on medical devices. Study results will be disseminated through peer-reviewed journals, open repositories, the network of investigators, and the expert and patient advisory board. TRIAL REGISTRATION NUMBER NCT05346523.
Affiliation(s)
- Thimo Marcin: Department of Emergency Medicine, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
- Stefanie C Hautz: Department of Emergency Medicine, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
- Hardeep Singh: Center for Innovations in Quality, Effectiveness and Safety (IQuESt), Michael E DeBakey VA Medical Center, Houston, Texas, USA; Department of Medicine, Baylor College of Medicine, Houston, Texas, USA
- Laura Zwaan: Institute of Medical Education Research Rotterdam (iMERR), Erasmus Medical Center, Rotterdam, The Netherlands
- David Schwappach: Institute of Social and Preventive Medicine (ISPM), University of Bern, Bern, Switzerland
- Gert Krummrey: Department of Emergency Medicine, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland; Bern University of Applied Sciences, Biel, Switzerland
- Stefan K Schauber: Center for Educational Measurement and Faculty of Medicine, University of Oslo, Oslo, Norway
- Mathieu Nendaz: Department of Medicine, University of Geneva, Geneve, Switzerland
- Martin Müller: Department of Emergency Medicine, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
- Cornelia Lambrigger: Department of Emergency Medicine, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
- Thomas C Sauter: Department of Emergency Medicine, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
- Gregor Lindner: Department of Internal and Emergency Medicine, Burgerspital Solothurn, Solothurn, Switzerland
- Wolf E Hautz: Department of Emergency Medicine, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland

28
Lloyd ML, Billingslea S, Slama R. Atraumatic Vertebral Artery Dissection in a Patient With a Migraine Headache. Mil Med 2023; 188:e848-e851. [PMID: 33876248] [DOI: 10.1093/milmed/usab135] [Citation(s) in RCA: 0]
Abstract
This case discusses a 34-year-old active-duty male who presented to the emergency department with a persistent headache of 2 weeks' duration. His initial review of symptoms was reassuring until a detailed neurologic examination on his second visit revealed a visual deficit in the left upper quadrant. He complained of intermittent tension headaches over the last several years but had no history of diagnosed migraines until he was seen 4 days prior in the same emergency department, where he received empiric migraine therapy and left without improvement in symptoms. On his return visit, a computed tomography scan with intravenous contrast revealed a left vertebral artery dissection and hematoma. The patient was admitted for medical management and was subsequently found on magnetic resonance imaging to have suffered a small infarction of the right lingual gyrus. This case illustrates the importance of maintaining a broad differential diagnosis and a high index of suspicion in a patient with new focal neurologic findings in order to diagnose a potentially fatal disease.
Affiliation(s)
- Michael L Lloyd
- Department of Emergency Medicine, Naval Medical Center Portsmouth, Portsmouth, VA 23708, USA
- Richard Slama
- Department of Emergency Medicine, Naval Medical Center Portsmouth, Portsmouth, VA 23708, USA

29
Exploratory study: Evaluation of a symptom checker effectiveness for providing a diagnosis and evaluating the situation emergency compared to emergency physicians using simulated and standardized patients. PLoS One 2023; 18:e0277568. [PMID: 36827277] [PMCID: PMC9955603] [DOI: 10.1371/journal.pone.0277568]
Abstract
BACKGROUND The overloading of health care systems is an international problem. In this context, new tools such as symptom checkers (SCs) are emerging to improve patient orientation and triage. These SCs should be rigorously evaluated, and we can take a cue from the way medical students are evaluated, using objective structured clinical examinations (OSCEs) with simulated patients. OBJECTIVE The main objective of this study was to evaluate the efficiency of a symptom checker versus emergency physicians using OSCEs as an assessment method. METHODS We explored a simulation-based method to evaluate the ability to establish a diagnosis and assess the urgency of a situation. A panel of medical experts wrote 220 simulated patient cases. Each situation was played twice by an actor trained to the role: once for the SC, then for an emergency physician. As in a teleconsultation, only the patient's voice was accessible. We performed a prospective non-inferiority study; if the primary analysis failed to detect non-inferiority, a superiority analysis was planned. RESULTS The SC established only 30% of the main diagnoses, whereas the emergency physician found 81% of them. The emergency physician was also superior to the SC in suggesting secondary diagnoses (92% versus 52%). For patient triage (vital emergency or not), the physician likewise performed better (96% versus 71%). The SC was non-inferior to the physician in terms of interview duration. CONCLUSIONS AND RELEVANCE Simulated patients, rather than written clinical cases, should be used to evaluate the effectiveness of SCs.
30
Hirosawa T, Harada Y, Yokose M, Sakamoto T, Kawamura R, Shimizu T. Diagnostic Accuracy of Differential-Diagnosis Lists Generated by Generative Pretrained Transformer 3 Chatbot for Clinical Vignettes with Common Chief Complaints: A Pilot Study. Int J Environ Res Public Health 2023; 20:3378. [PMID: 36834073] [PMCID: PMC9967747] [DOI: 10.3390/ijerph20043378]
Abstract
The diagnostic accuracy of differential diagnoses generated by artificial intelligence (AI) chatbots, including the generative pretrained transformer 3 (GPT-3) chatbot (ChatGPT-3), is unknown. This study evaluated the accuracy of differential-diagnosis lists generated by ChatGPT-3 for clinical vignettes with common chief complaints. General internal medicine physicians created clinical cases, correct diagnoses, and five differential diagnoses for ten common chief complaints. The rate of correct diagnosis by ChatGPT-3 within the ten differential-diagnosis lists was 28/30 (93.3%). The rate of correct diagnosis by physicians was still superior to that by ChatGPT-3 within the five differential-diagnosis lists (98.3% vs. 83.3%, p = 0.03). The rate of correct diagnosis by physicians was also superior to that by ChatGPT-3 for the top diagnosis (93.3% vs. 53.3%, p < 0.001). The rate of consistent differential diagnoses among physicians within the ten differential-diagnosis lists generated by ChatGPT-3 was 62/88 (70.5%). In summary, this study demonstrates the high diagnostic accuracy of differential-diagnosis lists generated by ChatGPT-3 for clinical cases with common chief complaints, suggesting that AI chatbots such as ChatGPT-3 can generate well-differentiated diagnosis lists for common chief complaints, although the ordering of these lists can still be improved.
Affiliation(s)
- Takanobu Hirosawa
- Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, Tochigi 321-0293, Japan

31
Online symptom checkers lack diagnostic accuracy for skin rashes. J Am Acad Dermatol 2023; 88:487-488. [PMID: 36243544] [DOI: 10.1016/j.jaad.2022.06.034]
32
Pairon A, Philips H, Verhoeven V. A scoping review on the use and usefulness of online symptom checkers and triage systems: How to proceed? Front Med (Lausanne) 2023; 9:1040926. [PMID: 36687416] [PMCID: PMC9853165] [DOI: 10.3389/fmed.2022.1040926]
Abstract
Background Patients are increasingly turning to the Internet for health information. Numerous online symptom checkers and digital triage tools are currently available to the general public in an effort to meet this need, simultaneously acting as a demand-management strategy to aid the overburdened health care system. The implementation of these services requires an evidence-based approach, warranting a review of the available literature on this rapidly evolving topic. Objective This scoping review aims to provide an overview of the current state of the art and to identify research gaps through an analysis of the strengths and weaknesses of the presently available literature. Methods A systematic search strategy was formed and applied to six databases: Cochrane Library, NICE, DARE, NIHR, PubMed, and Web of Science. Data extraction was performed by two researchers according to a pre-established data-charting methodology, allowing for a thematic analysis of the results. Results A total of 10,250 articles were identified, and 28 publications were found eligible for inclusion. Users of these tools are often younger, female, more highly educated, and technologically literate, with potential consequences for the digital divide and health equity. Triage algorithms remain risk-averse, which challenges their accuracy. Recent evolutions in algorithms have had varying degrees of success. Results on impact are highly variable, with potential effects on demand, accessibility of care, health literacy, and syndromic surveillance. Both patients and healthcare providers are generally positive about the technology and seem amenable to the advice given, but there are still improvements to be made toward a more patient-centered approach. The significant heterogeneity across studies and triage systems remains the primary challenge for the field, limiting the transferability of findings. Conclusion Current evidence is characterized by significant variability in study design and outcomes, highlighting the significant challenges for future research. An evolution toward more homogeneous methodologies, studies tailored to the intended setting, regulation and standardization of evaluations, and a patient-centered approach could benefit the field.
33
Kopka M, Feufel MA, Berner ES, Schmieding ML. How suitable are clinical vignettes for the evaluation of symptom checker apps? A test theoretical perspective. Digit Health 2023; 9:20552076231194929. [PMID: 37614591] [PMCID: PMC10444026] [DOI: 10.1177/20552076231194929]
Abstract
Objective To evaluate the ability of case vignettes to assess the performance of symptom checker applications and to suggest refinements to the methodology used in case vignette-based audit studies. Methods We re-analyzed the publicly available data of two prominent case vignette-based symptom checker audit studies by calculating common metrics of test theory. Furthermore, we developed a new metric, the Capability Comparison Score (CCS), which compares symptom checker capability while controlling for the difficulty of the set of cases each symptom checker evaluated. We then scrutinized whether applying test theory and the CCS altered the performance ranking of the investigated symptom checkers. Results In both studies, most symptom checkers changed their rank order when adjusting the triage capability for item difficulty (ID) with the CCS. The previously reported triage accuracies commonly overestimated the capability of symptom checkers because they did not account for the fact that symptom checkers tend to selectively appraise easier cases (i.e., with high ID values). Also, many case vignettes in both studies showed insufficient (very low and even negative) values of item-total correlation (ITC), suggesting that individual items or the composition of item sets are of low quality. Conclusions A test-theoretic perspective helps identify previously undetected threats to the validity of case vignette-based symptom checker assessments and provides guidance and specific metrics to improve the quality of case vignettes, in particular by controlling for the difficulty of the vignettes an app was (not) able to evaluate correctly. Such measures might prove more meaningful than accuracy alone for the competitive assessment of symptom checkers. Our approach helps elaborate and standardize the methodology used for appraising symptom checker capability, which, ultimately, may yield more reliable results.
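The test-theoretic quantities this abstract relies on, item difficulty and item-total correlation, are standard psychometric measures and easy to compute. Below is a minimal Python sketch, illustrative only and not the authors' code (the Capability Comparison Score itself is defined in the paper and not reproduced here), treating each symptom checker app as a test-taker and each vignette as an item:

```python
import math
from statistics import mean

def item_difficulty(responses):
    """Share of apps appraising each vignette correctly.
    responses[a][j] is 1 if app a handled vignette j correctly, else 0.
    High values correspond to easy items."""
    n_items = len(responses[0])
    return [mean(app[j] for app in responses) for j in range(n_items)]

def item_total_correlation(responses, j):
    """Pearson correlation between item j and each app's rest-score
    (total correct excluding item j). Low or negative values flag
    vignettes that fail to discriminate between strong and weak apps."""
    item = [app[j] for app in responses]
    rest = [sum(app) - app[j] for app in responses]
    mi, mr = mean(item), mean(rest)
    cov = sum((a - mi) * (b - mr) for a, b in zip(item, rest))
    var_i = sum((a - mi) ** 2 for a in item)
    var_r = sum((b - mr) ** 2 for b in rest)
    if var_i == 0 or var_r == 0:
        return 0.0  # undefined when the item or the rest-scores are constant
    return cov / math.sqrt(var_i * var_r)
```

An item solved only by the apps that also solve everything else yields a high item-total correlation; an item solved mainly by otherwise weak apps yields a negative one, which is the kind of low-quality vignette the authors flag.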
Affiliation(s)
- Marvin Kopka
- Department of Psychology and Ergonomics (IPA), Division of Ergonomics, Technische Universität Berlin, Berlin, Germany
- Institute of Medical Informatics, Charité – Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Markus A Feufel
- Department of Psychology and Ergonomics (IPA), Division of Ergonomics, Technische Universität Berlin, Berlin, Germany
- Eta S Berner
- Department of Health Services Administration, University of Alabama at Birmingham, Birmingham, AL, USA
- Malte L Schmieding
- Institute of Medical Informatics, Charité – Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany

34
Churruca K, Ellis LA, Pope C, MacLellan J, Zurynski Y, Braithwaite J. The place of digital triage in a complex healthcare system: An interview study with key stakeholders in Australia's national provider. Digit Health 2023; 9:20552076231181201. [PMID: 37377561] [PMCID: PMC10291532] [DOI: 10.1177/20552076231181201]
Abstract
Background Digital triage tools such as telephone advice and online symptom checkers are now commonplace in health systems internationally. Research has focused on consumers' adherence to advice, health outcomes, satisfaction, and the degree to which these services manage demand for general practice or emergency departments. Such studies have had mixed findings, leaving equivocal the role of these services in healthcare. Objective We examined stakeholders' perspectives on Healthdirect, Australia's national digital triage provider, focusing on its role in the health system, and barriers to operation, in the context of the COVID-19 pandemic. Methods Key stakeholders took part in semi-structured interviews conducted online in the third quarter of 2021. Transcripts were coded and thematically analysed. Results Participants (n = 41) were Healthdirect staff (n = 13), employees of Primary Health Networks (PHNs; n = 12), clinicians (n = 9), shareholder representatives (n = 4), consumer representatives (n = 2) and other policymakers (n = 1). Eight themes emerged from the analysis: (1) information and guidance in navigating the system, (2) efficiency through appropriate care, (3) value for consumers? (4) the difficulties in triage at a distance, (5) competition and the unfulfilled promise of integration, (6) challenges in promoting Healthdirect, (7) monitoring and evaluating digital triage services and (8) rapid change, challenge and opportunity from COVID-19. Conclusion Stakeholders varied in their views of the purpose of Healthdirect's digital triage services. They identified challenges in lack of integration, competition, and the limited public profile of the services, issues largely reflective of the complexity of the policy and health system landscape. There was acknowledgement of the value of the services during the COVID-19 pandemic, and an expectation of them realising greater potential in the wake of the rapid uptake of telehealth.
Affiliation(s)
- Kate Churruca
- Australian Institute of Health Innovation, Macquarie University, Sydney, Australia
- Louise A Ellis
- Australian Institute of Health Innovation, Macquarie University, Sydney, Australia
- Catherine Pope
- Nuffield Department of Primary Care Health Sciences, University of Oxford, Oxford, UK
- Jennifer MacLellan
- Nuffield Department of Primary Care Health Sciences, University of Oxford, Oxford, UK
- Yvonne Zurynski
- Australian Institute of Health Innovation, Macquarie University, Sydney, Australia
- Jeffrey Braithwaite
- Australian Institute of Health Innovation, Macquarie University, Sydney, Australia

35
35
|
North F, Jensen TB, Stroebel RJ, Nelson EM, Johnson BJ, Thompson MC, Pecina JL, Crum BA. Self-Triage Use, Subsequent Healthcare Utilization, and Diagnoses: A Retrospective Study of Process and Clinical Outcomes Following Self-Triage and Self-Scheduling for Ear or Hearing Symptoms. Health Serv Res Manag Epidemiol 2023; 10:23333928231168121. [PMID: 37101803] [PMCID: PMC10123887] [DOI: 10.1177/23333928231168121]
Abstract
Background Self-triage is becoming more widespread, but little is known about the people who use online self-triage tools and their outcomes. For self-triage researchers, there are significant barriers to capturing subsequent healthcare outcomes. Our integrated healthcare system was able to capture the subsequent healthcare utilization of individuals who used self-triage integrated with self-scheduling of provider visits. Methods We retrospectively examined healthcare utilization and diagnoses after patients had used self-triage and self-scheduling for ear or hearing symptoms. Outcomes and counts of office visits, telemedicine interactions, emergency department visits, and hospitalizations were captured. Diagnosis codes associated with subsequent provider visits were dichotomously categorized as associated with ear or hearing concerns or not. Non-visit care encounters (patient-initiated messages, nurse triage calls, and clinical communications) were also captured. Results Of 2168 self-triage uses, subsequent healthcare encounters within 7 days were captured for 80.5% (1745/2168). Of 1092 subsequent office visits with diagnoses, 83.1% (891/1092) were associated with relevant ear, nose, and throat diagnoses. Only 0.24% (4/1662) of patients with captured outcomes had a hospitalization within 7 days. Self-triage resulted in a self-scheduled office visit in 7.2% (126/1745). Office visits resulting from a self-scheduled visit had significantly fewer combined non-visit care encounters per office visit (fewer combined nurse triage calls, patient messages, and clinical communication messages) than office visits that were not self-scheduled (-0.51; 95% CI, -0.72 to -0.29; P < .0001). Conclusion In an appropriate healthcare setting, self-triage outcomes can be captured for a high percentage of uses, allowing examination of safety, patient adherence to recommendations, and efficiency of self-triage. With the ear or hearing self-triage, most uses had subsequent visit diagnoses relevant to ear or hearing, so most patients appeared to be selecting the appropriate self-triage pathway for their symptoms.
Affiliation(s)
- Frederick North
- Department of Medicine, Division of Community Internal Medicine, Geriatrics, and Palliative Care, Mayo Clinic, Rochester, MN, USA
- Teresa B Jensen
- Department of Family Medicine, Mayo Clinic, Rochester, MN, USA
- Robert J Stroebel
- Department of Medicine, Division of Community Internal Medicine, Geriatrics, and Palliative Care, Mayo Clinic, Rochester, MN, USA
- Elissa M Nelson
- Enterprise Office of Access Management, Mayo Clinic, Rochester, MN, USA
- Brenda J Johnson
- Enterprise Office of Access Management, Mayo Clinic, Rochester, MN, USA
- Brian A Crum
- Department of Neurology, Mayo Clinic, Rochester, MN, USA

36
A clinical decision support system in back pain helps to find the diagnosis: a prospective correlation study. Arch Orthop Trauma Surg 2023; 143:621-625. [PMID: 34347121] [PMCID: PMC9925533] [DOI: 10.1007/s00402-021-04080-y]
Abstract
The aim of this study was to show the concordance between an app-based decision support system and the diagnoses given by spine surgeons in cases of back pain. Eighty-six patients took part within 2 months. They were seen by spine surgeons in the daily routine and then completed an app-based questionnaire that independently produced a diagnosis. The results showed Cramer's V = .711 (p < .001), indicating a strong association between the tool's output and the physician's diagnosis. In 67.4% of cases the diagnoses were concordant. Overestimation of the severity of the diagnosis occurred more often than underestimation (15.1% vs. 7%). The app-based tool is thus a safe aid to support healthcare professionals in back pain diagnosis.
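For context, the Cramer's V statistic reported in this abstract is a standard measure of association derived from the chi-squared statistic of a contingency table (here, app diagnosis versus surgeon diagnosis). A minimal sketch of the standard computation, illustrative only and not the study's analysis code:

```python
import math

def cramers_v(table):
    """Cramer's V for an r x c contingency table given as a list of rows
    of counts: sqrt(chi2 / (n * (min(r, c) - 1))).
    0 indicates independence, 1 a perfect association."""
    n_rows, n_cols = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(n_rows)) for j in range(n_cols)]
    chi2 = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (min(n_rows, n_cols) - 1)))
```

A value of .711, as reported, sits well above the conventional threshold (roughly .5 for tables of this size) for a strong association.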
37
Ilicki J. Challenges in evaluating the accuracy of AI-containing digital triage systems: A systematic review. PLoS One 2022; 17:e0279636. [PMID: 36574438] [PMCID: PMC9794085] [DOI: 10.1371/journal.pone.0279636]
Abstract
INTRODUCTION Patient-operated digital triage systems with AI components are becoming increasingly common. However, previous reviews have found a limited amount of research on such systems' accuracy. This systematic review of the literature aimed to identify the main challenges in determining the accuracy of patient-operated digital AI-based triage systems. METHODS A systematic review was designed and conducted in accordance with PRISMA guidelines in October 2021 using PubMed, Scopus, and Web of Science. Articles were included if they assessed the accuracy of a patient-operated digital triage system that had an AI component and could triage a general primary care population. Limitations and other pertinent data were extracted, synthesized, and analysed. Risk of bias was not analysed, as this review studied the included articles' limitations rather than their results. Results were synthesized qualitatively using a thematic analysis. RESULTS The search generated 76 articles; following exclusion, 8 articles (6 primary articles and 2 reviews) were included in the analysis. The articles' limitations were synthesized into three groups: epistemological, ontological, and methodological. Limitations varied with regard to their tractability and the degree to which they can be addressed through methodological choices. Certain methodological limitations related to testing triage systems using vignettes can be addressed through methodological adjustments, whereas epistemological and ontological limitations require that readers appraise such studies with these limitations in mind. DISCUSSION The reviewed literature highlights recurring limitations and challenges in studying the accuracy of patient-operated digital triage systems with AI components. Some of these challenges can be addressed through methodology, whereas others are intrinsic to the area of inquiry and involve unavoidable trade-offs. Future studies should take these limitations into consideration in order to better address the current knowledge gaps in the literature.
38
Ponce-Blandón JA, Romero-Castillo R, Rodríguez-Leal L, González-Hervías R, Velarde-García JF, Álvarez-Embarba B. A Multicenter Study about the Population Treated in the Respiratory Triage Stations Deployed by the Red Cross during the COVID-19 Pandemic. Int J Environ Res Public Health 2022; 20:313. [PMID: 36612635] [PMCID: PMC9819537] [DOI: 10.3390/ijerph20010313]
Abstract
BACKGROUND Care demand exceeded the availability of human and material resources during the COVID-19 pandemic, which is why triage was fundamental. The objective was to describe the clinical and sociodemographic characteristics of confirmed or suspected COVID-19 cases seen at triage stations in different Ecuadorian provinces. METHODS A multicenter study with a retrospective, descriptive design. The patients included were those who attended the respiratory triage stations deployed by the Ecuadorian Red Cross in eight Ecuadorian provinces during March and April 2021. Triage allows patients who need urgent treatment to be identified and favors the efficient use of health resources. RESULTS The study population comprised 21,120 patients, of whom 43.1% were men and 56.9% were women, with ages ranging from 0 to 98 years. The severity of COVID-19 differed by gender, with mild symptoms predominating in women and severe or critical symptoms in men. A higher incidence of critical cases was observed in patients over 65 years old. Overweight predominated in critical, severe, and moderate cases, while the body mass index of patients with mild symptoms was within the normal range. CONCLUSIONS The Ecuadorian Red Cross units identified suspected COVID-19 cases, facilitating their follow-up and isolation. Fever was the most significant early finding.
Affiliation(s)
- José Antonio Ponce-Blandón
- Red Cross Nursing University Centre, University of Seville, 41009 Seville, Spain
- International Federation of the Red Cross, Ecuador Headquarters, Quito 170403, Ecuador
- Leyre Rodríguez-Leal
- Red Cross Nursing University College, Autonomous University of Madrid, 28003 Madrid, Spain
- Juan Francisco Velarde-García
- Red Cross Nursing University College, Autonomous University of Madrid, 28003 Madrid, Spain
- Research Group of Humanities and Qualitative Research in Health Science (Hum&QRinHS), Universidad Rey Juan Carlos, Avenida Atenas s/n, 28922 Alcorcon, Spain
- Nursing Research Support Unit, Hospital General Universitario Gregorio Maranon, Calle Dr. Esquerdo 46, 28007 Madrid, Spain

39
Vargas Meza X, Koyama S. A social media network analysis of trypophobia communication. Sci Rep 2022; 12:21163. [PMID: 36477698] [PMCID: PMC9729576] [DOI: 10.1038/s41598-022-25301-3]
Abstract
Trypophobia has attracted scientific attention in recent years. Few related studies have recruited participants using online methods, and even less is known about health communication in an environment where trypophobia was first widely discussed (i.e., the Internet). This study describes communication patterns in a Facebook group for trypophobia by detecting frequent topics, top contributors, and their discourses. We identified key commenters and performed word frequency analysis, word co-occurrence analysis, topic modeling, and content analysis. Impactful users posted and replied more often when discussing peer-reviewed science. Triggering content was actively removed by the group administrators. A wide variety of triggers not discussed in trypophobia-related literature were frequently mentioned. However, there was a lack of discussion on peer-reviewed treatments. The combination of a few expert and many supportive amateur gatekeepers willing to understand trypophobia, along with active monitoring by administrators, might contribute to in-group trust and the sharing of peer-reviewed science by top users of the trypophobia Facebook group.
Affiliation(s)
- Xanat Vargas Meza
- Faculty of Library, Information and Media Sciences, University of Tsukuba, Tsukuba, Ibaraki, Japan
- Present address: Global Innovation Research Organization, Ritsumeikan University, Ibaraki, Osaka, Japan
- Shinichi Koyama
- Faculty of Art and Design, University of Tsukuba, Tsukuba, Ibaraki, Japan

40
Müller R, Klemmt M, Ehni HJ, Henking T, Kuhnmünch A, Preiser C, Koch R, Ranisch R. Ethical, legal, and social aspects of symptom checker applications: a scoping review. Med Health Care Philos 2022; 25:737-755. [PMID: 36181620] [PMCID: PMC9613552] [DOI: 10.1007/s11019-022-10114-y]
Abstract
Symptom Checker Applications (SCA) are mobile applications, often designed for the end-user, that assist with symptom assessment and self-triage. SCA are meant to provide users with easily accessible information about their own health conditions. However, SCA raise questions regarding ethical, legal, and social aspects (ELSA), for example regarding fair access to this new technology. The aim of this scoping review was to identify the ELSA of SCA in the scientific literature. Ten databases (e.g., Web of Science and PubMed) were searched. Studies on SCA that address ELSA, written in English or German, were included, and the ELSA of SCA were extracted and synthesized using qualitative content analysis. A total of 25,061 references were identified, of which 39 were included in the analysis. The identified aspects were allotted to three main categories: (1) technology; (2) the individual level; and (3) the healthcare system. The results show that there are controversial debates in the literature on the ethical and social challenges of SCA usage, and that these debates are characterized by the lack of a specific legal perspective and of empirical data. The review provides an overview of the spectrum of ELSA regarding SCA. It offers guidance to stakeholders in the healthcare system, for example patients, healthcare professionals, and insurance providers, and could be used in future empirical research to investigate the perspectives of those affected, such as users.
Affiliation(s)
- Regina Müller
- Institute of Ethics and History of Medicine, University of Tübingen, Gartenstraße 47, 72074 Tübingen, Germany
- Malte Klemmt
- Institute of Applied Social Sciences, University of Applied Sciences Würzburg-Schweinfurt, Münzstraße 12, 97070 Würzburg, Germany
- Hans-Jörg Ehni
- Institute of Ethics and History of Medicine, University of Tübingen, Gartenstraße 47, 72074 Tübingen, Germany
- Tanja Henking
- Institute of Applied Social Sciences, University of Applied Sciences Würzburg-Schweinfurt, Münzstraße 12, 97070 Würzburg, Germany
- Angelina Kuhnmünch
- Institute of Ethics and History of Medicine, University of Tübingen, Gartenstraße 47, 72074 Tübingen, Germany
- Christine Preiser
- Institute of Occupational and Social Medicine and Health Services Research, University Hospital Tübingen, Wilhelmstraße 27, 72074 Tübingen, Germany
- Roland Koch
- Institute for General Practice and Interprofessional Care, University Medicine Tübingen, Osianderstraße 5, 72076 Tübingen, Germany
- Robert Ranisch
- Faculty of Health Sciences Brandenburg, University of Potsdam, Karl-Liebknecht-Str. 24-25, House 16, 14476 Potsdam, Golm, Germany

41
Judson TJ, Pierce L, Tutman A, Mourad M, Neinstein AB, Shuler G, Gonzales R, Odisho AY. Utilization patterns and efficiency gains from use of a fully EHR-integrated COVID-19 self-triage and self-scheduling tool: a retrospective analysis. J Am Med Inform Assoc 2022; 29:2066-2074. [PMID: 36029243] [PMCID: PMC9667153] [DOI: 10.1093/jamia/ocac161]
Abstract
OBJECTIVE Symptom checkers can help address high demand for SARS-CoV-2 (COVID-19) testing and care by providing patients with self-service access to triage recommendations. However, health systems may be hesitant to invest in these tools, as their associated efficiency gains have not been studied. We aimed to quantify the operational efficiency gains associated with use of an online COVID-19 symptom checker as an alternative to a telephone hotline. METHODS In our health system, ambulatory patients can use either an online symptom checker or a telephone hotline to be triaged and connected to COVID-19 care. We performed a retrospective analysis of adults who used either method between October 20, 2021 and January 10, 2022, using call logs, electronic health record data, and local wages to calculate labor costs. RESULTS Of the 15,549 total COVID-19 triage encounters, 1820 (11.7%) used only the telephone hotline and 13,729 (88.3%) used the symptom checker. Only 271 (2%) of the patients who used the symptom checker also called the hotline. Hotline encounters required more clinician time than symptom checker encounters (17.8 vs 0.4 min per encounter), resulting in higher average labor costs ($24.21 vs $0.55 per encounter). The symptom checker resulted in over 4200 clinician labor hours saved. CONCLUSION When given the option, most patients completed COVID-19 triage and visit scheduling online, resulting in substantial efficiency gains. These benefits may encourage health system investment in such tools.
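The efficiency arithmetic behind such estimates is simple minutes-times-wage accounting. A minimal sketch under illustrative assumptions (the helper names are hypothetical, and the paper's exact accounting of hours saved may differ from this naive calculation):

```python
def encounter_labor_cost(minutes_per_encounter: float, hourly_wage: float) -> float:
    """Clinician labor cost of a single triage encounter, in dollars."""
    return minutes_per_encounter / 60 * hourly_wage

def hours_saved(n_encounters: int, minutes_alternative: float,
                minutes_tool: float) -> float:
    """Clinician hours saved by routing encounters through the self-service
    tool rather than the alternative (e.g., telephone hotline) channel."""
    return n_encounters * (minutes_alternative - minutes_tool) / 60
```

With the abstract's per-encounter times, hours_saved(13729, 17.8, 0.4) comes to roughly 3980 hours, the same order of magnitude as the reported 4200 clinician labor hours saved.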
Collapse
Affiliation(s)
- Timothy J Judson
- Department of Medicine, University of California San Francisco, San Francisco, California, USA
- Center for Digital Health Innovation, University of California San Francisco, San Francisco, California, USA
- Office of Population Health, University of California San Francisco, San Francisco, California, USA
| | - Logan Pierce
- Department of Medicine, University of California San Francisco, San Francisco, California, USA
- Center for Digital Health Innovation, University of California San Francisco, San Francisco, California, USA
| | - Avi Tutman
- Office of Population Health, University of California San Francisco, San Francisco, California, USA
| | - Michelle Mourad
- Department of Medicine, University of California San Francisco, San Francisco, California, USA
- Center for Digital Health Innovation, University of California San Francisco, San Francisco, California, USA
| | - Aaron B Neinstein
- Department of Medicine, University of California San Francisco, San Francisco, California, USA
- Center for Digital Health Innovation, University of California San Francisco, San Francisco, California, USA
| | - Gina Shuler
- Office of Population Health, University of California San Francisco, San Francisco, California, USA
| | - Ralph Gonzales
- Department of Medicine, University of California San Francisco, San Francisco, California, USA
- Clinical Innovation Center, University of California San Francisco, San Francisco, California, USA
| | - Anobel Y Odisho
- Center for Digital Health Innovation, University of California San Francisco, San Francisco, California, USA
- Department of Urology, University of California San Francisco, San Francisco, California, USA
| |
Collapse
|
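The efficiency arithmetic reported in the Judson et al. abstract above (17.8 vs 0.4 clinician minutes per encounter; $24.21 vs $0.55) can be sketched as a simple cost model. The blended hourly wage below is an illustrative assumption, not a figure from the paper, and the published 4200-hour savings came from the study's full cost model rather than this back-of-the-envelope version.

```python
def labor_cost_per_encounter(clinician_minutes, hourly_wage):
    """Clinician labor cost, in dollars, for one triage encounter."""
    return clinician_minutes / 60 * hourly_wage

def clinician_hours_saved(n_encounters, minutes_phone, minutes_online):
    """Clinician hours avoided by shifting encounters from phone to online triage."""
    return n_encounters * (minutes_phone - minutes_online) / 60

ASSUMED_WAGE = 82.0  # $/hour -- hypothetical blended wage, not reported by the paper

phone_cost = labor_cost_per_encounter(17.8, ASSUMED_WAGE)   # roughly $24 per encounter
online_cost = labor_cost_per_encounter(0.4, ASSUMED_WAGE)   # roughly $0.55 per encounter
saved = clinician_hours_saved(13_729, 17.8, 0.4)            # roughly 4000 hours
```

At an assumed $82/hour the per-encounter costs land close to the published $24.21 and $0.55, which suggests the study's local wage data implied a blended rate in that range; the gap between the ~4000 hours here and the paper's 4200 reflects its more detailed accounting.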
42
|
Painter A, Hayhoe B, Riboli-Sasco E, El-Osta A. Online Symptom Checkers: Recommendations for a Vignette-Based Clinical Evaluation Standard. J Med Internet Res 2022; 24:e37408. [DOI: 10.2196/37408] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/19/2022] [Revised: 09/15/2022] [Accepted: 10/11/2022] [Indexed: 11/13/2022] Open
Abstract
The use of patient-facing online symptom checkers (OSCs) has expanded in recent years, but their accuracy, safety, and impact on patient behaviors and health care systems remain unclear. The lack of a standardized clinical evaluation process has resulted in significant variation in approaches to OSC validation and evaluation. The aim of this paper is to characterize a congruent set of requirements for a standardized vignette-based clinical evaluation process for OSCs. Discrepancies in the findings of comparative studies to date suggest that different steps in OSC evaluation methodology can significantly influence outcomes. A standardized process with a clear specification for vignette-based clinical evaluation is urgently needed to guide developers and facilitate the objective comparison of OSCs. We propose 15 requirements for an OSC evaluation standard. A third-party evaluation process and protocols for prospective real-world evidence studies should also be prioritized to quality-assure OSC assessment.
Collapse
|
43
|
Sampietro-Colom L, Fernandez-Barcelo C, Abbas I, Valdasquin B, Rabasseda N, García-Lorenzo B, Sanchez M, Sans M, Garcia N, Granados A. WtsWrng Interim Comparative Effectiveness Evaluation and Description of the Challenges to Develop, Assess, and Introduce This Novel Digital Application in a Traditional Health System. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2022; 19:13873. [PMID: 36360756 PMCID: PMC9654177 DOI: 10.3390/ijerph192113873] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 09/12/2022] [Revised: 10/21/2022] [Accepted: 10/22/2022] [Indexed: 06/16/2023]
Abstract
Science and technology have evolved quickly during the first two decades of the 21st century, but healthcare systems remain grounded in the last century's structures and processes. Changes in the way health care is provided are in demand; digital transformation is a key driver in making healthcare systems more accessible, agile, efficient, and citizen-centered. Nevertheless, the way healthcare systems function challenges the development (research and development and regulatory requirements), assessment (weaknesses in methodological guidance), and adoption of digital applications (DAs). WtsWrng (WW), an innovative DA that uses images to interact with citizens for symptom triage and monitoring, is used as an example to show the challenges faced in its development and clinical validation and how these are being overcome. To prove WW's value from inception, novel approaches to evidence generation that allow for agile, patient-centered development have been applied. Early scientific advice from NICE (UK) was sought for the study design, an iterative development with interim analysis was performed, and different statistical parameters (kappa, B statistic) were explored to address development and assessment challenges. WW triage accuracy at the cutoff time ranged from 0.62 to 0.94 for the most frequent symptoms presenting to the Emergency Department (ED), and the observed concordance for the 12 most frequent diagnoses at hospital discharge ranged from 0.40 to 0.97; 8 of the diagnoses had a concordance greater than 0.8. This experience should prompt reflection among DA developers, digital health scientists, regulators, health technology assessors, and payers.
Collapse
Affiliation(s)
- Laura Sampietro-Colom
- Assessment of Innovations and New Technologies Unit, Research and Innovation Directorate, Clínic Barcelona University Hospital, 08036 Barcelona, Spain
- Mangrana Ventures S.L., 08006 Barcelona, Spain
| | - Carla Fernandez-Barcelo
- Assessment of Innovations and New Technologies Unit, Research and Innovation Directorate, Clínic Barcelona University Hospital, 08036 Barcelona, Spain
| | - Ismail Abbas
- Assessment of Innovations and New Technologies Unit, Research and Innovation Directorate, Clínic Barcelona University Hospital, 08036 Barcelona, Spain
| | - Blanca Valdasquin
- Assessment of Innovations and New Technologies Unit, Research and Innovation Directorate, Clínic Barcelona University Hospital, 08036 Barcelona, Spain
| | | | - Borja García-Lorenzo
- Assessment of Innovations and New Technologies Unit, Research and Innovation Directorate, Clínic Barcelona University Hospital, 08036 Barcelona, Spain
- Kronikgune Institute for Health Sciences Research, 48902 Barakaldo, Spain
| | - Miquel Sanchez
- Emergency Department, Clínic Barcelona University Hospital, 08036 Barcelona, Spain
| | - Mireia Sans
- CAP Comte Borrell, Consorci Atenció Primaria Salut Barcelona Esquerra—CAPSBE, 08029 Barcelona, Spain
- Health 2.0 Section of the Col·Legi Oficial de Metges de Barcelona, 08017 Barcelona, Spain
| | - Noemi Garcia
- CAP Comte Borrell, Consorci Atenció Primaria Salut Barcelona Esquerra—CAPSBE, 08029 Barcelona, Spain
| | | |
Collapse
|
44
|
Talukder AK, Schriml L, Ghosh A, Biswas R, Chakrabarti P, Haas RE. Diseasomics: Actionable machine interpretable disease knowledge at the point-of-care. PLOS DIGITAL HEALTH 2022; 1:e0000128. [PMID: 36812614 PMCID: PMC9931276 DOI: 10.1371/journal.pdig.0000128] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 07/28/2021] [Accepted: 09/14/2022] [Indexed: 11/06/2022]
Abstract
Physicians establish a diagnosis by assessing a patient's signs, symptoms, age, sex, laboratory test findings, and disease history. All this must be done in limited time and against the backdrop of an increasing overall workload. In the era of evidence-based medicine, it is of utmost importance for a clinician to be abreast of the latest guidelines and treatment protocols, which change rapidly. In resource-limited settings, updated knowledge often does not reach the point of care. This paper presents an artificial intelligence (AI)-based approach for integrating comprehensive disease knowledge to support physicians and healthcare workers in arriving at accurate diagnoses at the point of care. We integrated different disease-related knowledge bodies to construct a comprehensive, machine-interpretable diseasomics knowledge graph that includes the Disease Ontology, disease symptoms, SNOMED CT, DisGeNET, and PharmGKB data. The resulting disease-symptom network comprises knowledge from the Symptom Ontology, electronic health records (EHR), the human symptom disease network, the Disease Ontology, Wikipedia, PubMed, textbooks, and symptomatology knowledge sources, with 84.56% accuracy. We also integrated spatial and temporal comorbidity knowledge obtained from EHRs for two population data sets, from Spain and Sweden respectively. The knowledge graph is stored in a graph database as a digital twin of the disease knowledge. We use node2vec (node embedding) as a digital triplet for link prediction in disease-symptom networks to identify missing associations. This diseasomics knowledge graph is expected to democratize medical knowledge, empower non-specialist health workers to make evidence-based, informed decisions, and help achieve the goal of universal health coverage (UHC). The machine-interpretable knowledge graphs presented in this paper represent associations between entities and do not imply causation. Our differential diagnostic tool focuses on signs and symptoms and does not include a complete assessment of the patient's lifestyle and health history, which would typically be necessary to rule out conditions and arrive at a final diagnosis. The predicted diseases are ordered according to the specific disease burden in South Asia. The knowledge graphs and tools presented here can be used as a guide.
Collapse
Affiliation(s)
- Asoke K. Talukder
- SRIT India, Bangalore, India
- Computer Science & Engineering, National Institute of Technology Karnataka (NITK), Surathkal, India
| | - Lynn Schriml
- University of Maryland School of Medicine, Maryland, United States of America
| | - Arnab Ghosh
- Indian Institute of Technology Bombay, Mumbai, India
| | - Rakesh Biswas
- Kamineni Institute of Medical Sciences, Narketpalle, Telangana, India
| | - Prantar Chakrabarti
- Vivekananda Institute of Medical Sciences, Kolkata, India
- Cybernetic Care, Bangalore, India
| | - Roland E. Haas
- International Institute of Information Technology Bangalore (IIIT-B), Bangalore, India
| |
Collapse
|
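The link-prediction step the Talukder et al. abstract describes — using node2vec embeddings to surface missing disease-symptom associations — can be illustrated in miniature. As a dependency-free stand-in for node2vec's random-walk embeddings, the sketch below scores a candidate (disease, symptom) edge by the cosine similarity between that disease and diseases already linked to the symptom; the toy graph and names are invented for illustration, not taken from the paper.

```python
from math import sqrt

# Toy disease-symptom bipartite graph (illustrative only, not from the paper).
edges = {
    ("influenza", "fever"), ("influenza", "cough"), ("influenza", "myalgia"),
    ("covid-19", "fever"), ("covid-19", "cough"), ("covid-19", "anosmia"),
    ("migraine", "headache"), ("migraine", "nausea"),
}

diseases = {d for d, _ in edges}
symptoms = {s for _, s in edges}
nodes = sorted(diseases | symptoms)
index = {n: i for i, n in enumerate(nodes)}

# One 0/1 adjacency row per node.
adj = {n: [0] * len(nodes) for n in nodes}
for d, s in edges:
    adj[d][index[s]] = 1
    adj[s][index[d]] = 1

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(x * x for x in v))
    return dot / norm if norm else 0.0

def predict_links(disease, k=3):
    """Rank symptoms not yet linked to `disease`: a candidate symptom scores
    highly when a similar disease (by shared-symptom cosine) already has it."""
    known = {s for d, s in edges if d == disease}
    scores = {}
    for s in symptoms - known:
        holders = [d for d in diseases if (d, s) in edges]
        scores[s] = max((cosine(adj[disease], adj[d]) for d in holders), default=0.0)
    return sorted(scores.items(), key=lambda kv: -kv[1])[:k]
```

Here `predict_links("influenza")` ranks "anosmia" first, because covid-19 shares two of influenza's three symptoms. node2vec itself replaces the explicit adjacency rows with embeddings learned from biased random walks, which scales to graphs far too large for pairwise similarity over raw adjacency.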
45
|
Napierala H, Kopka M, Altendorf MB, Bolanaki M, Schmidt K, Piper SK, Heintze C, Möckel M, Balzer F, Slagman A, Schmieding ML. Examining the impact of a symptom assessment application on patient-physician interaction among self-referred walk-in patients in the emergency department (AKUSYM): study protocol for a multi-center, randomized controlled, parallel-group superiority trial. Trials 2022; 23:791. [PMID: 36127742 PMCID: PMC9490986 DOI: 10.1186/s13063-022-06688-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/18/2022] [Accepted: 08/24/2022] [Indexed: 11/10/2022] Open
Abstract
Background Due to the increasing use of online health information, symptom checkers have been developed to provide an individualized assessment of health complaints, potential diagnoses, and an urgency estimation. They are assumed to support patient empowerment and to have a positive impact on the patient-physician interaction and satisfaction with care. In the emergency department (ED) in particular, symptom checkers could be integrated to bridge waiting times, and patients as well as physicians could take advantage of potential positive effects. Our study therefore aims to assess the impact of using a symptom assessment application (SAA), compared with no SAA use, on the patient-physician interaction among self-referred walk-in patients in the ED. Methods In this multi-center, 1:1 randomized, controlled, parallel-group superiority trial, 440 self-referred adult walk-in patients with a non-urgent triage category will be recruited in three EDs in Berlin. Eligible participants in the intervention group will use a SAA directly after initial triage. The control group receives standard care without a SAA. The primary endpoint is patients’ satisfaction with the patient-physician interaction, assessed by the Patient Satisfaction Questionnaire. Discussion The results of this trial could inform the implementation of SAAs into acute care to improve satisfaction with the patient-physician interaction. Trial registration German Clinical Trials Registry DRKS00028598. Registered on 25.03.2022
Collapse
Affiliation(s)
- Hendrik Napierala
- Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Institute of General Practice and Family Medicine, Charitéplatz 1, 10117, Berlin, Germany
| | - Marvin Kopka
- Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Institute of Medical Informatics, Charitéplatz 1, 10117, Berlin, Germany.,Cognitive Psychology and Ergonomics, Department of Psychology and Ergonomics (IPA), Technische Universität Berlin, Straße des 17. Juni 135, 10623, Berlin, Germany
| | - Maria B Altendorf
- Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Emergency and Acute Medicine and Health Services Research in Emergency Medicine (CVK, CCM), Charitéplatz 1, 10117, Berlin, Germany
| | - Myrto Bolanaki
- Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Emergency and Acute Medicine and Health Services Research in Emergency Medicine (CVK, CCM), Charitéplatz 1, 10117, Berlin, Germany
| | - Konrad Schmidt
- Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Institute of General Practice and Family Medicine, Charitéplatz 1, 10117, Berlin, Germany.,Jena University Hospital, Institute of General Practice and Family Medicine, Bachstr. 18, 07743, Jena, Germany
| | - Sophie K Piper
- Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Institute of Medical Informatics, Charitéplatz 1, 10117, Berlin, Germany.,Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Institute of Biometry and Clinical Epidemiology, Charitéplatz 1, 10117, Berlin, Germany.,Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Charitéplatz 1, 10117, Berlin, Germany
| | - Christoph Heintze
- Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Institute of General Practice and Family Medicine, Charitéplatz 1, 10117, Berlin, Germany
| | - Martin Möckel
- Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Emergency and Acute Medicine and Health Services Research in Emergency Medicine (CVK, CCM), Charitéplatz 1, 10117, Berlin, Germany
| | - Felix Balzer
- Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Institute of Medical Informatics, Charitéplatz 1, 10117, Berlin, Germany
| | - Anna Slagman
- Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Emergency and Acute Medicine and Health Services Research in Emergency Medicine (CVK, CCM), Charitéplatz 1, 10117, Berlin, Germany
| | - Malte L Schmieding
- Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Institute of Medical Informatics, Charitéplatz 1, 10117, Berlin, Germany. .,docport Services GmbH, Tußmannstr. 75, 40477, Düsseldorf, Germany.
| |
Collapse
|
46
|
Fraser HSF, Cohan G, Koehler C, Anderson J, Lawrence A, Pateña J, Bacher I, Ranney ML. Evaluation of Diagnostic and Triage Accuracy and Usability of a Symptom Checker in an Emergency Department: Observational Study. JMIR Mhealth Uhealth 2022; 10:e38364. [PMID: 36121688 PMCID: PMC9531004 DOI: 10.2196/38364] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/02/2022] [Revised: 05/31/2022] [Accepted: 06/10/2022] [Indexed: 11/26/2022] Open
Abstract
Background Symptom checkers are clinical decision support apps for patients, used by tens of millions of people annually. They are designed to provide diagnostic and triage advice and to assist users in seeking the appropriate level of care. Little evidence is available regarding their diagnostic and triage accuracy when used directly by patients for urgent conditions. Objective The aim of this study is to determine the diagnostic and triage accuracy and usability of a symptom checker used by patients presenting to an emergency department (ED). Methods We recruited a convenience sample of English-speaking patients presenting for care in an urban ED. Each consenting patient used a leading symptom checker from Ada Health before the ED evaluation. Diagnostic accuracy was evaluated by comparing the symptom checker’s diagnoses, and those of 3 independent emergency physicians viewing the patient-entered symptom data, with the final diagnoses from the ED evaluation. The Ada diagnoses and triage were also critiqued by the independent physicians. The patients completed a usability survey based on the Technology Acceptance Model. Results A total of 40 (80%) of the 50 participants approached completed the symptom checker assessment and usability survey. Their mean age was 39.3 (SD 15.9; range 18-76) years, and they were 65% (26/40) female, 68% (27/40) White, 48% (19/40) Hispanic or Latino, and 13% (5/40) Black or African American. Some cases had missing data or lacked a clear ED diagnosis; 75% (30/40) were included in the analysis of diagnosis and 93% (37/40) in the analysis of triage. The sensitivity for at least one of the final ED diagnoses by Ada (based on its top 5 diagnoses) was 70% (95% CI 54%-86%), close to the mean sensitivity of 68.9% for the 3 physicians (on their top 3 diagnoses). The physicians fully agreed with 62% (23/37) of the Ada triage decisions and rated 24% (9/37) as safe but too cautious. Triage was rated as unsafe and too risky in 22% (8/37) of cases by at least one physician, in 14% (5/37) of cases by at least two physicians, and in 5% (2/37) of cases by all 3 physicians. Usability was rated highly; participants agreed or strongly agreed with the 7 Technology Acceptance Model usability questions, with a mean score of 84.6%, although “satisfaction” and “enjoyment” were rated low. Conclusions This study provides preliminary evidence that a symptom checker can provide acceptable usability and diagnostic accuracy for patients with various urgent conditions. A total of 14% (5/37) of symptom checker triage recommendations were deemed unsafe and too risky by at least two physicians based on the symptoms recorded, similar to the results of studies on telephone and nurse triage. Larger studies of diagnostic and triage performance with direct patient use in different clinical environments are needed.
Collapse
Affiliation(s)
- Hamish S F Fraser
- Brown Center for Biomedical Informatics, Warren Alpert Medical School, Brown University, Providence, RI, United States
- School of Public Health, Brown University, Providence, RI, United States
| | - Gregory Cohan
- Warren Alpert Medical School, Brown University, Providence, RI, United States
| | - Christopher Koehler
- Department of Emergency Medicine, Brown University, Providence, RI, United States
| | - Jared Anderson
- Department of Emergency Medicine, Brown University, Providence, RI, United States
| | - Alexis Lawrence
- Harvard Medical Faculty Physicians, Department of Emergency Medicine, St Luke's Hospital, New Bedford, MA, United States
| | - John Pateña
- Brown-Lifespan Center for Digital Health, Providence, RI, United States
| | - Ian Bacher
- Brown Center for Biomedical Informatics, Warren Alpert Medical School, Brown University, Providence, RI, United States
| | - Megan L Ranney
- School of Public Health, Brown University, Providence, RI, United States
- Department of Emergency Medicine, Brown University, Providence, RI, United States
- Brown-Lifespan Center for Digital Health, Providence, RI, United States
| |
Collapse
|
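The 95% CI that the Fraser et al. abstract above reports for Ada's top-5 sensitivity (70%, CI 54%-86%) is reproducible with a plain normal-approximation (Wald) interval on the 30 analyzable cases, where 21 of 30 gives 70%. The paper does not state which interval method it used, so treating it as a Wald interval is an assumption.

```python
from math import sqrt

def wald_ci(successes, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a binomial proportion."""
    p = successes / n
    half_width = z * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

low, high = wald_ci(21, 30)  # 21/30 = 70% sensitivity on the 30 included cases
```

Rounded to whole percentages this gives 54%-86%, matching the abstract. At a sample size this small, a Wilson or exact (Clopper-Pearson) interval is often the preferred choice.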
47
|
Patel R, Swanton AR, Gross MS. Online Symptom Checkers are Poor Tools for Diagnosing Men's Health Conditions. Urology 2022; 170:124-131. [PMID: 36115428 DOI: 10.1016/j.urology.2022.08.032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/24/2022] [Revised: 07/24/2022] [Accepted: 08/02/2022] [Indexed: 10/14/2022]
Abstract
OBJECTIVE To analyze the accuracy of the four most commonly used online symptom checkers (OSCs) in diagnosing erectile dysfunction (ED), scrotal pain (SP), Peyronie's disease (PD), and low testosterone (LT). METHODS AND OUTCOMES One hundred sixty artificial vignettes were created by de-identifying recent initial outpatient consults presenting to discuss ED (40), SP (40), PD (40), and LT (40). The vignettes were entered into the 4 most frequently used OSCs (WebMD, MedicineNet, EverydayHealth, and SutterHealth), as determined by web traffic analysis tools. The top 5 conditions listed in each OSC's differential diagnosis were recorded and scored. RESULTS WebMD's accuracy for ED, SP, PD, and LT vignettes was 0%, 22.5%, 0%, and 95%, respectively. EverydayHealth was able to diagnose SP only 20% of the time and failed to diagnose ED, PD, or LT on all occasions. MedicineNet diagnosed ED, PD, SP, and LT in 100%, 98%, 27.5%, and 0% of vignettes, respectively. SutterHealth correctly diagnosed ED, SP, and LT in 100%, 20%, and 80% of patients, respectively. Cumulatively, the OSCs were most accurate in diagnosing ED and least accurate in diagnosing SP when using the top 1 (37.5% vs. 6.9%) and top 5 (50% vs. 24.5%) suggested conditions. CONCLUSIONS No OSC could accurately diagnose all the conditions tested. On average, the OSCs were poor at suggesting precise diagnoses for ED, PD, LT, and SP. Patients and practitioners should be cautioned regarding the accuracy of OSCs.
Collapse
Affiliation(s)
- Rutul Patel
- New York Institute of Technology College of Osteopathic Medicine, Old Westbury, NY, USA
| | | | | |
Collapse
|
48
|
Gräf M, Knitza J, Leipe J, Krusche M, Welcker M, Kuhn S, Mucke J, Hueber AJ, Hornig J, Klemm P, Kleinert S, Aries P, Vuillerme N, Simon D, Kleyer A, Schett G, Callhoff J. Comparison of physician and artificial intelligence-based symptom checker diagnostic accuracy. Rheumatol Int 2022; 42:2167-2176. [PMID: 36087130 PMCID: PMC9548469 DOI: 10.1007/s00296-022-05202-4] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/19/2022] [Accepted: 08/29/2022] [Indexed: 11/29/2022]
Abstract
Symptom checkers are increasingly used to assess new symptoms and navigate the health care system. The aim of this study was to compare the accuracy of an artificial intelligence (AI)-based symptom checker (Ada) and physicians regarding the presence or absence of an inflammatory rheumatic disease (IRD). In this survey study, German-speaking physicians with prior rheumatology working experience were asked to determine IRD presence or absence and suggest diagnoses for 20 different real-world patient vignettes, which included only basic health and symptom-related medical history. The IRD detection rate and suggested diagnoses of participants and Ada were compared to the gold standard, the final rheumatologists’ diagnosis reported on the discharge summary. A total of 132 vignettes were completed by 33 physicians (mean rheumatology working experience 8.8 (SD 7.1) years). Ada’s diagnostic accuracy for IRD was significantly higher than that of physicians (70% vs 54%, p = 0.002) according to the top diagnosis. Ada listed the correct diagnosis more often than physicians did, both as the top diagnosis (54% vs 32%, p < 0.001) and among the top 3 diagnoses (59% vs 42%, p < 0.001). Work experience was not related to suggesting the correct diagnosis or IRD status. When confined to basic health and symptom-related medical history, the diagnostic accuracy of physicians was lower than that of an AI-based symptom checker. These results highlight the potential of using symptom checkers early in the patient journey and the importance of access to complete and sufficient patient information to establish a correct diagnosis.
Collapse
Affiliation(s)
- Markus Gräf
- Department of Internal Medicine 3, Friedrich-Alexander-University Erlangen-Nürnberg and Universitätsklinikum Erlangen, Erlangen, Germany.,Deutsches Zentrum Immuntherapie (DZI), Friedrich-Alexander-University Erlangen-Nürnberg and Universitätsklinikum Erlangen, Erlangen, Germany
| | - Johannes Knitza
- Department of Internal Medicine 3, Friedrich-Alexander-University Erlangen-Nürnberg and Universitätsklinikum Erlangen, Erlangen, Germany. .,Deutsches Zentrum Immuntherapie (DZI), Friedrich-Alexander-University Erlangen-Nürnberg and Universitätsklinikum Erlangen, Erlangen, Germany. .,Université Grenoble Alpes, AGEIS, Grenoble, France.
| | - Jan Leipe
- Division of Rheumatology, Department of Medicine V, Medical Faculty Mannheim of the University, University Hospital Mannheim, Heidelberg, Germany
| | - Martin Krusche
- Division of Rheumatology and Systemic Inflammatory Diseases, University Hospital Hamburg-Eppendorf (UKE), Hamburg, Germany
| | - Martin Welcker
- Medizinisches Versorgungszentrum Für Rheumatologie Dr. M. Welcker GmbH, Planegg, Germany
| | - Sebastian Kuhn
- Department of Digital Medicine, Medical Faculty OWL, Bielefeld University, Bielefeld, Germany
| | - Johanna Mucke
- Policlinic and Hiller Research Unit for Rheumatology, Medical Faculty, University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Axel J Hueber
- Department of Internal Medicine 3, Friedrich-Alexander-University Erlangen-Nürnberg and Universitätsklinikum Erlangen, Erlangen, Germany.,Division of Rheumatology, Klinikum Nürnberg, Paracelsus Medical University, Nuremberg, Germany
| | | | - Philipp Klemm
- Department of Rheumatology, Immunology, Osteology and Physical Medicine, Justus Liebig University Gießen, Campus Kerckhoff, Bad Nauheim, Germany
| | - Stefan Kleinert
- Praxisgemeinschaft Rheumatologie-Nephrologie, Erlangen, Germany
| | | | - Nicolas Vuillerme
- Université Grenoble Alpes, AGEIS, Grenoble, France.,Institut Universitaire de France, Paris, France.,LabCom Telecom4Health, Orange Labs & Univ. Grenoble Alpes, CNRS, Inria, Grenoble INP-UGA, Grenoble, France
| | - David Simon
- Department of Internal Medicine 3, Friedrich-Alexander-University Erlangen-Nürnberg and Universitätsklinikum Erlangen, Erlangen, Germany.,Deutsches Zentrum Immuntherapie (DZI), Friedrich-Alexander-University Erlangen-Nürnberg and Universitätsklinikum Erlangen, Erlangen, Germany
| | - Arnd Kleyer
- Department of Internal Medicine 3, Friedrich-Alexander-University Erlangen-Nürnberg and Universitätsklinikum Erlangen, Erlangen, Germany.,Deutsches Zentrum Immuntherapie (DZI), Friedrich-Alexander-University Erlangen-Nürnberg and Universitätsklinikum Erlangen, Erlangen, Germany
| | - Georg Schett
- Department of Internal Medicine 3, Friedrich-Alexander-University Erlangen-Nürnberg and Universitätsklinikum Erlangen, Erlangen, Germany.,Deutsches Zentrum Immuntherapie (DZI), Friedrich-Alexander-University Erlangen-Nürnberg and Universitätsklinikum Erlangen, Erlangen, Germany
| | - Johanna Callhoff
- Epidemiology Unit, German Rheumatism Research Centre, Berlin, Germany.,Institute for Social Medicine, Epidemiology and Health Economics, Charité Universitätsmedizin, Berlin, Germany
| |
Collapse
|
49
|
Wallace W, Chan C, Chidambaram S, Hanna L, Iqbal FM, Acharya A, Normahani P, Ashrafian H, Markar SR, Sounderajah V, Darzi A. The diagnostic and triage accuracy of digital and online symptom checker tools: a systematic review. NPJ Digit Med 2022; 5:118. [PMID: 35977992 PMCID: PMC9385087 DOI: 10.1038/s41746-022-00667-w] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/03/2022] [Accepted: 07/25/2022] [Indexed: 11/09/2022] Open
Abstract
Digital and online symptom checkers are an increasingly adopted class of health technologies that enable patients to input their symptoms and biodata to produce a set of likely diagnoses and associated triage advice. However, concerns have been raised regarding the accuracy and safety of these symptom checkers. This systematic review evaluates the accuracy of symptom checkers in providing diagnoses and appropriate triage advice. MEDLINE and Web of Science were searched for studies that used either real or simulated patients to evaluate online or digital symptom checkers. The primary outcomes were the diagnostic and triage accuracy of the symptom checkers. The QUADAS-2 tool was used to assess study quality. Of the 177 studies retrieved, 10 met the inclusion criteria. Researchers evaluated the accuracy of symptom checkers across a variety of medical conditions, including ophthalmological conditions, inflammatory arthritides, and HIV. Half of the studies recruited real patients, while the remainder used simulated cases. The diagnostic accuracy of the primary diagnosis was low across the included studies (range: 19%–37.9%) and varied between individual symptom checkers, despite consistent symptom data input. Triage accuracy (range: 48.8%–90.1%) was typically higher than diagnostic accuracy. Overall, the diagnostic and triage accuracy of symptom checkers is variable and generally low. Given the increasing push towards adopting this class of technologies across numerous health systems, this study demonstrates that reliance upon symptom checkers could pose significant patient safety hazards. Large-scale primary studies, based upon real-world data, are warranted to demonstrate performance that is non-inferior to current best practices. Moreover, an urgent assessment of how these systems are regulated and implemented is required.
Collapse
Affiliation(s)
- William Wallace
- Department of Surgery & Cancer, Imperial College London, St. Mary's Hospital, London, W2 1NY, UK
| | - Calvin Chan
- Department of Surgery & Cancer, Imperial College London, St. Mary's Hospital, London, W2 1NY, UK
| | - Swathikan Chidambaram
- Department of Surgery & Cancer, Imperial College London, St. Mary's Hospital, London, W2 1NY, UK
| | - Lydia Hanna
- Department of Surgery & Cancer, Imperial College London, St. Mary's Hospital, London, W2 1NY, UK
| | - Fahad Mujtaba Iqbal
- Department of Surgery & Cancer, Imperial College London, St. Mary's Hospital, London, W2 1NY, UK.,Institute of Global Health Innovation, Imperial College London, South Kensington Campus, London, SW7 2AZ, UK
| | - Amish Acharya
- Department of Surgery & Cancer, Imperial College London, St. Mary's Hospital, London, W2 1NY, UK.,Institute of Global Health Innovation, Imperial College London, South Kensington Campus, London, SW7 2AZ, UK
| | - Pasha Normahani
- Department of Surgery & Cancer, Imperial College London, St. Mary's Hospital, London, W2 1NY, UK
| | - Hutan Ashrafian
- Institute of Global Health Innovation, Imperial College London, South Kensington Campus, London, SW7 2AZ, UK
| | - Sheraz R Markar
- Department of Surgery & Cancer, Imperial College London, St. Mary's Hospital, London, W2 1NY, UK.,Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden.,Nuffield Department of Surgery, Churchill Hospital, University of Oxford, OX3 7LE, Oxford, UK
| | - Viknesh Sounderajah
- Institute of Global Health Innovation, Imperial College London, South Kensington Campus, London, SW7 2AZ, UK.
| | - Ara Darzi
- Department of Surgery & Cancer, Imperial College London, St. Mary's Hospital, London, W2 1NY, UK.,Institute of Global Health Innovation, Imperial College London, South Kensington Campus, London, SW7 2AZ, UK
| |
Collapse
|
50
|
Liu VDM, Kaila M, Koskela T. User-initiated symptom assessment with an electronic symptom checker: study protocol for mixed methods validation. JMIR Res Protoc 2022. [PMID: 37467041 PMCID: PMC10398552 DOI: 10.2196/41423] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/30/2023] Open
Abstract
BACKGROUND The national Omaolo digital social welfare and health care service of Finland provides a symptom checker, Omaolo, which is a CE-marked medical device (risk class IIa) based on the Duodecim Clinical Decision Support EBMEDS software and manufactured by the government-owned DigiFinland Oy. Users of this service can perform self-triage by answering the questions in the symptom checker. On completing the symptom checker, the user receives a recommendation for action and a service assessment with appropriate guidance regarding their health problem, on the basis of the specific symptom selected in the symptom checker. This allows users to be directed to appropriate health care services, regardless of time and place. OBJECTIVE This study describes the protocol for the mixed methods validation of the symptom checker available in the Omaolo digital services. METHODS This is a mixed methods study using quantitative and qualitative methods as part of a clinical validation process conducted in primary health care centers in Finland. To capture an unscreened target population of users, each participating organization provides a space where the study and nurse triage can be carried out. The primary health care units operate a walk-in model, in which no prior phone call or contact is required. For the validation of the Omaolo symptom checker, case vignettes will be incorporated to supplement the assessment of triage accuracy for rare and acute cases that cannot be tested extensively in real-life settings. The vignettes are drawn from a variety of clinical sources, and each tests the symptom checker at a given triage level using a single standardized patient case.
RESULTS Regional research permission was requested from each organization participating in the research, and an ethics committee statement was requested and granted by the Pirkanmaa hospital district's ethics committee, in accordance with the University of Tampere's regulations. Of 964 user-completed clinical symptom checker assessments, 877 cases were fully completed with a triage result and therefore met the requirements for the clinical validation study. The goal of sufficient data has been reached for most of the chief symptoms. Data collection was completed in September 2019, and the first feasibility and patient experience results were published by the end of 2020. Case vignettes have been identified and are to be completed before further testing of the symptom checker. The analysis and reporting are estimated to be finalized in 2024. CONCLUSIONS The primary goals of this mixed methods electronic symptom checker study are to assess safety and to provide crucial information regarding the accuracy and usability of the Omaolo electronic symptom checker. To our knowledge, this will be the first such study to include real-life clinical cases along with case vignettes. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID) DERR1-10.2196/41423.
Collapse
|