1
Chamorro-Delmo J, Lopez-Fernandez O, Villasante-Soriano P, Antonio PPD, Álvarez-García R, Porras-Segovia A, Baca-García E. A feasibility study of a Smart screening tool for people at risk of mental health issues: Response rate, and sociodemographic and clinical factors. J Affect Disord 2024;362:755-761. [PMID: 39029676; DOI: 10.1016/j.jad.2024.07.067]
Abstract
BACKGROUND Empirical data on the effectiveness of Smart Screening tools for identifying mental health problems are scarce. This study aims to explore the response rate of patients to such a tool, describe their sociodemographic and healthcare characteristics, and assess the tool's ability to detect potential mental health diagnoses. METHODS The study employed an online survey delivered through the patient portals of four teaching hospitals in Madrid. The sample included 8749 patients, of whom 66.77% were female and 31.21% were middle-aged adults. RESULTS In total, 60.56% responded to the Smart Screening tool. Respondents were predominantly middle-aged women who had been contacted by mental health services multiple times but had not exhibited suicidal behaviour. These patients demonstrated a higher appointment attendance rate and generated low healthcare costs. The tool most often identified probable low-level depression and mild anxiety (72.16%), and individuals aged 50-65 exhibited higher levels of mental health problems, such as psychosis and suicidality, although not all of these results were significant with respect to previous mental health diagnoses. LIMITATIONS The Smart Screening tool collects anonymous online data through short questionnaires, to which sophisticated algorithms are applied to determine probable mental health diagnoses. CONCLUSIONS The response rate to the Smart Screening tool was higher than in previous studies. The typical respondent was a middle-aged or older woman with moderate mental health problems, although suicidality was also identified. Future research should focus on those who did not respond to the tool and explore the link between previous psychiatric diagnoses and the accuracy of the Smart Screening tool.
Affiliation(s)
- Jaime Chamorro-Delmo
- Department of Psychiatry, University Hospital Jimenez Diaz Foundation, Madrid, Spain
- Olatz Lopez-Fernandez
- Department of Psychiatry, University Hospital Jimenez Diaz Foundation, Madrid, Spain; Department of Personality, Assessment and Clinical Psychology, Faculty of Psychology, Universidad Complutense de Madrid, Madrid, Spain; Faculty of Education and Psychology, Universidad Francisco de Vitoria, Madrid, Spain; Faculty of Psychology, Centro de Enseñanza Superior Cardenal Cisneros, Universidad Complutense de Madrid, Madrid, Spain
- Paula Villasante-Soriano
- Department of Psychiatry, University Hospital Jimenez Diaz Foundation, Madrid, Spain; ROSAN International Consulting & Research, Valencia, Spain
- Alejandro Porras-Segovia
- Department of Psychiatry, University Hospital Rey Juan Carlos, Móstoles, Spain; Translational Psychiatry Research Group, Health Research Institute Jimenez Diaz Foundation, Madrid, Spain
- Enrique Baca-García
- Department of Psychiatry, University Hospital Jimenez Diaz Foundation, Madrid, Spain; Department of Psychiatry, University Hospital Rey Juan Carlos, Móstoles, Spain; Translational Psychiatry Research Group, Health Research Institute Jimenez Diaz Foundation, Madrid, Spain; Department of Psychiatry, University Hospital Infanta Elena, Valdemoro, Spain; CIBERSAM, research group CB/07/09/0025, Madrid, Spain; Nimes University Hospital, Nimes, France; Department of Psychiatry, Universidad Autónoma de Madrid, Madrid, Spain; Department of Psychiatry, Central Hospital de Villalba, Villalba, Spain
2
Hindelang M, Sitaru S, Zink A. Transforming Health Care Through Chatbots for Medical History-Taking and Future Directions: Comprehensive Systematic Review. JMIR Med Inform 2024;12:e56628. [PMID: 39207827; PMCID: PMC11393511; DOI: 10.2196/56628]
Abstract
BACKGROUND The integration of artificial intelligence and chatbot technology in health care has attracted significant attention due to its potential to improve patient care and streamline history-taking. As artificial intelligence-driven conversational agents, chatbots offer the opportunity to revolutionize history-taking, necessitating a comprehensive examination of their impact on medical practice. OBJECTIVE This systematic review aims to assess the role, effectiveness, usability, and patient acceptance of chatbots in medical history-taking. It also examines potential challenges and future opportunities for integration into clinical practice. METHODS A systematic search of PubMed, Embase, MEDLINE (via Ovid), CENTRAL, Scopus, and Open Science covered studies through July 2024. The inclusion and exclusion criteria for the studies reviewed were based on the PICOS (participants, interventions, comparators, outcomes, and study design) framework. The population included individuals using health care chatbots for medical history-taking. Interventions focused on chatbots designed to facilitate medical history-taking. The outcomes of interest were the feasibility, acceptance, and usability of chatbot-based medical history-taking; studies not reporting on these outcomes were excluded. All study designs except conference papers were eligible for inclusion, and only English-language studies were considered. There were no specific restrictions on study duration. Key search terms included "chatbot*," "conversational agent*," "virtual assistant," "artificial intelligence chatbot," "medical history," and "history-taking." The quality of observational studies was classified using the STROBE (Strengthening the Reporting of Observational Studies in Epidemiology) criteria (eg, sample size, design, data collection, and follow-up), and the RoB 2 (Risk of Bias 2) tool was used to assess the risk of bias in randomized controlled trials (RCTs). RESULTS The review included 15 observational studies and 3 RCTs and synthesized evidence from different medical fields and populations. Chatbots systematically collect information through targeted queries and data retrieval, improving patient engagement and satisfaction. The results show that chatbots have great potential for history-taking and that the efficiency and accessibility of the health care system can be improved by 24/7 automated data collection. Bias assessments revealed that of the 15 observational studies, 5 (33%) were of high quality, 5 (33%) of moderate quality, and 5 (33%) of low quality. Of the RCTs, 2 had a low risk of bias, while 1 had a high risk. CONCLUSIONS This systematic review provides critical insights into the potential benefits and challenges of using chatbots for medical history-taking. The included studies showed that chatbots can increase patient engagement, streamline data collection, and improve health care decision-making. For effective integration into clinical practice, it is crucial to design user-friendly interfaces, ensure robust data security, and maintain empathetic patient-physician interactions. Future research should focus on refining chatbot algorithms, improving their emotional intelligence, and extending their application to different health care settings to realize their full potential in modern medicine. TRIAL REGISTRATION PROSPERO CRD42023410312; www.crd.york.ac.uk/prospero.
Affiliation(s)
- Michael Hindelang
- Department of Dermatology and Allergy, TUM School of Medicine and Health, Technical University of Munich, Munich, Germany
- Pettenkofer School of Public Health, Munich, Germany
- Institute for Medical Information Processing, Biometry and Epidemiology (IBE), Faculty of Medicine, Ludwig-Maximilian University, LMU, Munich, Germany
- Sebastian Sitaru
- Department of Dermatology and Allergy, TUM School of Medicine and Health, Technical University of Munich, Munich, Germany
- Alexander Zink
- Department of Dermatology and Allergy, TUM School of Medicine and Health, Technical University of Munich, Munich, Germany
- Division of Dermatology and Venereology, Department of Medicine Solna, Karolinska Institute, Stockholm, Sweden
3
Szumilas D, Ochmann A, Zięba K, Bartoszewicz B, Kubrak A, Makuch S, Agrawal S, Mazur G, Chudek J. Evaluation of AI-Driven LabTest Checker for Diagnostic Accuracy and Safety: Prospective Cohort Study. JMIR Med Inform 2024;12:e57162. [PMID: 39149851; PMCID: PMC11337233; DOI: 10.2196/57162]
Abstract
Background In recent years, the implementation of artificial intelligence (AI) in health care has been progressively transforming medical fields, with clinical decision support systems (CDSSs) as a notable application. Laboratory tests are vital for accurate diagnoses, but the increasing reliance on them presents challenges. The need for effective strategies for managing laboratory test interpretation is evident from the millions of monthly searches on the significance of test results. However, as the potential role of CDSSs in laboratory diagnostics grows, more research is needed to explore this area. Objective The primary objective of our study was to assess the accuracy and safety of LabTest Checker (LTC), a CDSS designed to support medical diagnoses by analyzing both laboratory test results and patients' medical histories. Methods This cohort study used a prospective data collection approach. A total of 101 patients aged ≥18 years, in stable condition, and requiring comprehensive diagnosis were enrolled. A panel of blood laboratory tests was conducted for each participant, and participants used LTC to interpret the results. The accuracy and safety of the tool were assessed by comparing the AI-generated suggestions to the recommendations of an experienced physician (consultant), which were considered the gold standard. Results The system achieved 74.3% accuracy, with 100% sensitivity for emergency cases and 92.3% sensitivity for urgent cases. It potentially reduced unnecessary medical visits by 41.6% (42/101) and achieved 82.9% accuracy in identifying underlying pathologies. Conclusions This study underscores the transformative potential of AI-based CDSSs in laboratory diagnostics, contributing to enhanced patient care, more efficient health care systems, and improved medical outcomes. LTC's performance evaluation highlights the advancing role of AI in laboratory medicine.
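The headline numbers above are simple functions of the agreement table between the tool's advice and the consultant's gold standard. A minimal sketch of the two calculations, using hypothetical counts (the paper's full confusion matrix is not reproduced here):

```python
def sensitivity(tp: int, fn: int) -> float:
    """Share of gold-standard positive cases that the tool flagged."""
    return tp / (tp + fn)

def accuracy(correct: int, total: int) -> float:
    """Share of all cases where the tool agreed with the consultant."""
    return correct / total

# Hypothetical counts for illustration: 12 true emergencies, none missed,
# and 75 of 101 overall recommendations matching the consultant's.
print(sensitivity(tp=12, fn=0))         # 1.0    -> "100% sensitivity"
print(accuracy(correct=75, total=101))  # ~0.743 -> "74.3% accuracy"
```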
Affiliation(s)
- Dawid Szumilas
- Department of Internal Medicine and Oncological Chemotherapy, Medical University of Silesia, Katowice, Poland
- Anna Ochmann
- Department of Internal Medicine and Oncological Chemotherapy, Medical University of Silesia, Katowice, Poland
- Katarzyna Zięba
- Department of Internal Medicine and Oncological Chemotherapy, Medical University of Silesia, Katowice, Poland
- Sebastian Makuch
- Department of Clinical and Experimental Pathology, Wroclaw Medical University, Wroclaw, Poland
- Grzegorz Mazur
- Labplus R&D, Wroclaw, Poland
- Department and Clinic of Internal Medicine, Occupational Diseases, Hypertension and Clinical Oncology, Wroclaw Medical University, Wroclaw, Poland
- Jerzy Chudek
- Department of Internal Medicine and Oncological Chemotherapy, Medical University of Silesia, Katowice, Poland
4
Knauer J, Baumeister H, Schmitt A, Terhorst Y. Acceptance of smart sensing, its determinants, and the efficacy of an acceptance-facilitating intervention in people with diabetes: results from a randomized controlled trial. Front Digit Health 2024;6:1352762. [PMID: 38863954; PMCID: PMC11165071; DOI: 10.3389/fdgth.2024.1352762]
Abstract
Background Mental health problems are prevalent among people with diabetes, yet often under-diagnosed. Smart sensing, utilizing passively collected digital markers through digital devices, is an innovative diagnostic approach that can support mental health screening and intervention. However, the acceptance of this technology remains unclear. Grounded in the Unified Theory of Acceptance and Use of Technology (UTAUT), this study aimed to investigate (1) the acceptance of smart sensing in a diabetes sample, (2) the determinants of acceptance, and (3) the effectiveness of an acceptance-facilitating intervention (AFI). Methods A total of N = 132 participants with diabetes were randomized to an intervention group (IG) or a control group (CG). The IG received a video-based AFI on smart sensing and the CG received an educational video on mindfulness. Acceptance and its potential determinants were assessed through an online questionnaire as a single post-measurement. Self-reported behavioral intention, interest in using a smart sensing application, and installation of such an application were assessed as outcomes. The data were analyzed using latent structural equation modeling (SEM) and t-tests. Results Overall acceptance of smart sensing was moderate (M = 12.64, SD = 4.24), with 27.8% of participants showing low, 40.3% moderate, and 31.9% high acceptance. Performance expectancy (γ = 0.64, p < 0.001), social influence (γ = 0.23, p = .032), and trust (γ = 0.27, p = .040) were identified as potential determinants of acceptance, explaining 84% of the variance. The SEM model fit was acceptable (RMSEA = 0.073, SRMR = 0.059). The intervention had no significant effect on acceptance (γ = 0.25, 95% CI -0.16 to 0.65, p = .233), interest (OR = 0.76, 95% CI 0.38 to 1.52, p = .445), or app installation rates (OR = 1.13, 95% CI 0.47 to 2.73, p = .777). Discussion The high variance in acceptance supports a need for acceptance-facilitating procedures. The analyzed model supported performance expectancy, social influence, and trust as potential determinants of smart sensing acceptance; performance expectancy (ie, perceived benefit) was the most influential factor. The AFI had no significant effect. Future research should further explore factors contributing to smart sensing acceptance and address implementation barriers.
Affiliation(s)
- Johannes Knauer
- Department of Clinical Psychology and Psychotherapy, Institute of Psychology and Education, University Ulm, Ulm, Germany
- Harald Baumeister
- Department of Clinical Psychology and Psychotherapy, Institute of Psychology and Education, University Ulm, Ulm, Germany
- Andreas Schmitt
- Research Institute Diabetes Academy Mergentheim (FIDAM), Bad Mergentheim, Germany
- Yannik Terhorst
- Department of Psychological Methods and Assessment, Ludwig-Maximilian University Munich, Munich, Germany
5
Hammoud M, Douglas S, Darmach M, Alawneh S, Sanyal S, Kanbour Y. Evaluating the Diagnostic Performance of Symptom Checkers: Clinical Vignette Study. JMIR AI 2024;3:e46875. [PMID: 38875676; PMCID: PMC11091811; DOI: 10.2196/46875]
Abstract
BACKGROUND Medical self-diagnostic tools (or symptom checkers) are becoming an integral part of digital health and our daily lives, whereby patients increasingly use them to identify the underlying causes of their symptoms. As such, it is essential to rigorously investigate and comprehensively report the diagnostic performance of symptom checkers using standard clinical and scientific approaches. OBJECTIVE This study aims to evaluate and report the accuracies of several known and new symptom checkers using a standard and transparent methodology, which allows the scientific community to cross-validate and reproduce the reported results, a step much needed in health informatics. METHODS We propose a 4-stage experimentation methodology that capitalizes on the standard clinical vignette approach to evaluate 6 symptom checkers. To this end, we developed and peer-reviewed 400 vignettes, each approved by at least 5 out of 7 independent and experienced primary care physicians. To establish a frame of reference and interpret the results of symptom checkers accordingly, we further compared the best-performing symptom checker against 3 primary care physicians with an average experience of 16.6 (SD 9.42) years. To measure accuracy, we used 7 standard metrics, including M1 as a measure of a symptom checker's or a physician's ability to return a vignette's main diagnosis at the top of their differential list, F1-score as a trade-off measure between recall and precision, and Normalized Discounted Cumulative Gain (NDCG) as a measure of a differential list's ranking quality, among others. RESULTS The diagnostic accuracies of the 6 tested symptom checkers varied significantly. For instance, the differences between the best-performing and worst-performing symptom checkers (ie, the ranges) in M1, F1-score, and NDCG were 65.3%, 39.2%, and 74.2%, respectively. Among the participating human physicians, the corresponding M1, F1-score, and NDCG ranges were 22.8%, 15.3%, and 21.3%. When compared against each other, physicians outperformed the best-performing symptom checker by an average of 1.2% on F1-score, whereas the best-performing symptom checker outperformed physicians by averages of 10.2% and 25.1% on M1 and NDCG, respectively. CONCLUSIONS The performance variation between symptom checkers is substantial, suggesting that symptom checkers cannot be treated as a single entity. On a different note, the best-performing symptom checker was an artificial intelligence (AI)-based one, shedding light on the promise of AI for improving the diagnostic capabilities of symptom checkers, especially as AI continues to advance rapidly.
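Of the metrics named above, M1 and NDCG are the least self-explanatory. The sketch below shows one plausible reading for a single vignette, assuming binary relevance (only the vignette's main diagnosis counts as relevant), since the abstract does not spell out the exact grading scheme:

```python
import math

def m1(differential, main_diagnosis):
    """M1: is the main diagnosis ranked first in the differential list?"""
    return 1.0 if differential and differential[0] == main_diagnosis else 0.0

def ndcg(differential, main_diagnosis, k=5):
    """NDCG with binary relevance: rewards placing the main diagnosis
    near the top of the list; 1.0 means it is ranked first."""
    dcg = sum(
        1.0 / math.log2(rank + 2)  # rank 0 contributes 1/log2(2) = 1.0
        for rank, dx in enumerate(differential[:k])
        if dx == main_diagnosis
    )
    return dcg / (1.0 / math.log2(2))  # ideal DCG: main diagnosis first

# Hypothetical vignette: the checker ranks the true diagnosis second
differential = ["gastroenteritis", "acute appendicitis", "renal colic"]
print(m1(differential, "acute appendicitis"))    # 0.0
print(ndcg(differential, "acute appendicitis"))  # ~0.63
```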
6
Miller NE, North F, Curry EN, Thompson MC, Pecina JL. Recommendation endpoints and safety of an online self-triage for depression symptoms. J Telemed Telecare 2024. [PMID: 38646705; DOI: 10.1177/1357633X241245161]
Abstract
INTRODUCTION Online symptom checkers are a way to address patient concerns and potentially offload a burdened healthcare system. However, the safety outcomes of self-triage are unknown, so we reviewed the triage recommendations and outcomes of our institution's depression symptom checker. METHODS We examined endpoint recommendations and follow-up encounters within seven days between 2 December 2021 and 13 December 2022. For patients with an emergency department visit or hospitalization within seven days of self-triaging, we manually reviewed the electronic health record to determine whether the visit was related to depression, suicidal ideation, or suicide attempt. Charts were also reviewed for deaths within seven days of self-triage. RESULTS There were 287 unique encounters from 263 unique patients. In 86.1% (247/287) of encounters, the endpoint was an instruction to call nurse triage; in 3.1% (9/287), it was to seek emergency care. Only 20.2% (58/287) followed the recommendations given. Of the 229 encounters in which the endpoint recommendations were not followed, 121 (52.8%) had some type of follow-up within seven days. Nearly 11% (31/287) were triaged to endpoints not requiring urgent contact, and 9.1% (26/287) to an endpoint that did not need any healthcare team input. No patients died during the study period. CONCLUSIONS Most patients did not follow the recommendations for follow-up care, although most ultimately received care within seven days. Self-triage appears to appropriately sort patients with depressed mood to emergency care. Online self-triage tools for depression have the potential to safely offload some work from clinic personnel.
Affiliation(s)
- Frederick North
- Division of Community Internal Medicine, Geriatrics, and Palliative Care, Mayo Clinic, Rochester, MN, USA
- Matthew C Thompson
- Mayo Clinic Enterprise Office of Access Management, Mayo Clinic, Rochester, MN, USA
7
Sarkar S, Gaur M, Chen LK, Garg M, Srivastava B. A review of the explainability and safety of conversational agents for mental health to identify avenues for improvement. Front Artif Intell 2023;6:1229805. [PMID: 37899961; PMCID: PMC10601652; DOI: 10.3389/frai.2023.1229805]
Abstract
Virtual Mental Health Assistants (VMHAs) continuously evolve to support the overloaded global healthcare system, which receives approximately 60 million primary care visits and 6 million emergency room visits annually. These systems, developed by clinical psychologists, psychiatrists, and AI researchers, are designed to aid in Cognitive Behavioral Therapy (CBT). The main focus of VMHAs is to provide relevant information to mental health professionals (MHPs) and engage in meaningful conversations to support individuals with mental health conditions. However, certain gaps prevent VMHAs from fully delivering on their promise during active communications. One such gap is their inability to explain their decisions to patients and MHPs, making conversations less trustworthy. VMHAs can also be prone to providing unsafe responses to patient queries, further undermining their reliability. In this review, we assess the current state of VMHAs with respect to user-level explainability and safety, a set of properties desirable for the broader adoption of VMHAs. This includes an examination of ChatGPT, a conversational agent built on the GPT-3.5 and GPT-4 models, which has been proposed for use in providing mental health services. By harnessing the collaborative and impactful contributions of AI, natural language processing, and the MHP community, the review identifies opportunities for technological progress in VMHAs to ensure their capabilities include explainable and safe behaviors. It also emphasizes the importance of measures to guarantee that these advancements align with the promise of fostering trustworthy conversations.
Affiliation(s)
- Surjodeep Sarkar
- Department of Computer Science and Electrical Engineering, University of Maryland, Baltimore County, Baltimore, MD, United States
- Manas Gaur
- Department of Computer Science and Electrical Engineering, University of Maryland, Baltimore County, Baltimore, MD, United States
- Lujie Karen Chen
- Department of Information Systems, University of Maryland, Baltimore County, Baltimore, MD, United States
- Muskan Garg
- Department of AI & Informatics, Mayo Clinic, Rochester, MN, United States
- Biplav Srivastava
- AI Institute, University of South Carolina, Columbia, SC, United States
8
Määttä J, Lindell R, Hayward N, Martikainen S, Honkanen K, Inkala M, Hirvonen P, Martikainen TJ. Diagnostic Performance, Triage Safety, and Usability of a Clinical Decision Support System Within a University Hospital Emergency Department: Algorithm Performance and Usability Study. JMIR Med Inform 2023;11:e46760. [PMID: 37656018; PMCID: PMC10501486; DOI: 10.2196/46760]
Abstract
Background Computerized clinical decision support systems (CDSSs) are increasingly adopted in health care to optimize resources and streamline patient flow. However, they often lack scientific validation against standard medical care. Objective The purpose of this study was to assess the performance, safety, and usability of a CDSS in a university hospital emergency department setting in Kuopio, Finland. Methods Patients entering the emergency department were asked to voluntarily participate in this study. Patients aged 17 years or younger, patients with cognitive impairments, and patients who entered the unit in an ambulance or with the need for immediate care were excluded. Patients completed the CDSS web-based form and usability questionnaire while waiting for the triage nurse's evaluation. The CDSS data were anonymized and did not affect the patients' usual evaluation or treatment. Retrospectively, 2 medical doctors evaluated the urgency of each patient's condition by using the triage nurse's information, and urgent and nonurgent groups were created. The International Statistical Classification of Diseases, Tenth Revision diagnoses were collected from the electronic health records. Usability was assessed by using a positive version of the System Usability Scale questionnaire. Results In total, our analyses included 248 patients. Regarding urgency, the mean sensitivities were 85% for urgent and 19% for nonurgent cases when the CDSS evaluations were compared with those of the physicians; between the two physicians, the mean sensitivities were 85% and 35%, respectively. Our CDSS did not miss any cases that physicians evaluated as emergencies; all such cases were evaluated as either urgent or emergency cases by the CDSS. In differential diagnosis, the CDSS had an exact match accuracy of 45.5% (97/213). Usability was good, with a mean System Usability Scale score of 78.2 (SD 16.8). Conclusions In a university hospital emergency department setting with a large real-world population, our CDSS was equally as sensitive as physicians in urgent patient cases and had acceptable differential diagnosis accuracy, with good usability. These results suggest that this CDSS can be safely assessed further in a real-world setting. A CDSS could accelerate triage by collecting patient-reported data before the initial consultation and categorizing patient cases as urgent or nonurgent upon arrival at the emergency department.
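The usability figure above is a System Usability Scale (SUS) score. As a hedged sketch, assuming the positive-only SUS variant used here is scored like the standard scale (10 items rated 1-5, each contributing 0-4 points, the sum scaled to 0-100):

```python
def sus_score(ratings):
    """SUS score for the all-positively-worded variant: each of the
    10 items, rated 1 (strongly disagree) to 5 (strongly agree),
    contributes (rating - 1) points; the sum is scaled to 0-100."""
    assert len(ratings) == 10 and all(1 <= r <= 5 for r in ratings)
    return sum(r - 1 for r in ratings) * 2.5

# Hypothetical respondent who mostly agrees with the usability statements
print(sus_score([4, 5, 4, 4, 5, 4, 4, 3, 4, 5]))  # 80.0
```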
Affiliation(s)
- Rony Lindell
- Klinik Healthcare Solutions Oy, Helsinki, Finland
- Nick Hayward
- Klinik Healthcare Solutions Oy, Helsinki, Finland
- Susanna Martikainen
- Department of Health and Social Management, University of Eastern Finland, Kuopio, Finland
- Katri Honkanen
- Department of Emergency Care, Kuopio University Hospital, Kuopio, Finland
- Matias Inkala
- Department of Emergency Care, Kuopio University Hospital, Kuopio, Finland
- Tero J Martikainen
- Department of Emergency Care, Kuopio University Hospital, Kuopio, Finland
9
Terhorst Y, Weilbacher N, Suda C, Simon L, Messner EM, Sander LB, Baumeister H. Acceptance of smart sensing: a barrier to implementation-results from a randomized controlled trial. Front Digit Health 2023;5:1075266. [PMID: 37519894; PMCID: PMC10373890; DOI: 10.3389/fdgth.2023.1075266]
Abstract
Background Accurate and timely diagnostics are essential for effective mental healthcare. Given a resource- and time-limited mental healthcare system, novel digital and scalable diagnostic approaches such as smart sensing, which utilizes digital markers collected via sensors from digital devices, are being explored. While the predictive accuracy of smart sensing is promising, its acceptance remains unclear. Based on the unified theory of acceptance and use of technology, the present study investigated (1) the effectiveness of an acceptance facilitating intervention (AFI), (2) the determinants of acceptance, and (3) the acceptance of adults toward smart sensing. Methods The participants (N = 202) were randomly assigned to a control group (CG) or intervention group (IG). The IG received a video AFI on smart sensing, and the CG a video on mindfulness. A reliable online questionnaire was used to assess acceptance, performance expectancy, effort expectancy, facilitating conditions, social influence, and trust. Self-reported interest in using and installation of a smart sensing app were assessed as behavioral outcomes. The intervention effects on acceptance were investigated using t-tests for observed data and latent structural equation modeling (SEM) with full information maximum likelihood to handle missing data. The behavioral outcomes were analyzed with logistic regression, and the determinants of acceptance with SEM. The root mean square error of approximation (RMSEA) and standardized root mean square residual (SRMR) were used to evaluate the model fit. Results The intervention did not significantly affect acceptance (p = 0.357), interest (OR = 0.75, 95% CI 0.42 to 1.32, p = 0.314), or installation rate (OR = 0.29, 95% CI 0.01 to 2.35, p = 0.294). Performance expectancy (γ = 0.45, p < 0.001), trust (γ = 0.24, p = 0.002), and social influence (γ = 0.32, p = 0.008) were identified as the core determinants of acceptance, explaining 68% of its variance. The SEM model fit was excellent (RMSEA = 0.06, SRMR = 0.05). Overall acceptance was M = 10.9 (SD = 3.73), with 35.41% of participants showing low, 47.92% moderate, and 10.41% high acceptance. Discussion The present AFI was not effective. The low to moderate acceptance of smart sensing poses a major barrier to its implementation. Performance expectancy, social influence, and trust should be targeted as the core factors of acceptance. Further studies are needed to identify effective ways to foster the acceptance of smart sensing and to develop successful implementation strategies. Clinical Trial Registration 10.17605/OSF.IO/GJTPH.
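The behavioral outcomes above are odds ratios from logistic regression. A minimal sketch of that style of analysis on hypothetical counts (chosen only to illustrate the mechanics, not the study's data):

```python
# pip install pandas statsmodels
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical trial data: app installation (0/1) by randomized group
# (1 = intervention video, 0 = control video); counts are illustrative.
df = pd.DataFrame({
    "group":     [1] * 101 + [0] * 101,
    "installed": [1] * 8 + [0] * 93 + [1] * 10 + [0] * 91,
})

X = sm.add_constant(df[["group"]])  # intercept + group indicator
fit = sm.Logit(df["installed"], X).fit(disp=False)

or_group = np.exp(fit.params["group"])           # odds ratio for the AFI
ci_low, ci_high = np.exp(fit.conf_int().loc["group"])
print(f"OR = {or_group:.2f}, 95% CI {ci_low:.2f} to {ci_high:.2f}")
```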
Affiliation(s)
- Yannik Terhorst
- Department of Clinical Psychology and Psychotherapy, Institute of Psychology and Education, University Ulm, Ulm, Germany
- Nadine Weilbacher
- Department of Clinical Psychology and Psychotherapy, Institute of Psychology and Education, University Ulm, Ulm, Germany
- Carolin Suda
- Department of Rehabilitation Psychology and Psychotherapy, Institute of Psychology, Albert-Ludwigs University Freiburg, Freiburg, Germany
- Laura Simon
- Department of Clinical Psychology and Psychotherapy, Institute of Psychology and Education, University Ulm, Ulm, Germany
- Eva-Maria Messner
- Department of Clinical Psychology and Psychotherapy, Institute of Psychology and Education, University Ulm, Ulm, Germany
- Lasse Bosse Sander
- Medical Psychology and Medical Sociology, Faculty of Medicine, Albert-Ludwigs University Freiburg, Freiburg, Germany
- Harald Baumeister
- Department of Clinical Psychology and Psychotherapy, Institute of Psychology and Education, University Ulm, Ulm, Germany
10
Painter A, Hayhoe B, Riboli-Sasco E, El-Osta A. Online Symptom Checkers: Recommendations for a Vignette-Based Clinical Evaluation Standard. J Med Internet Res 2022;24:e37408. [DOI: 10.2196/37408]
Abstract
The use of patient-facing online symptom checkers (OSCs) has expanded in recent years, but their accuracy, safety, and impact on patient behaviors and health care systems remain unclear. The lack of a standardized process of clinical evaluation has resulted in significant variation in approaches to OSC validation and evaluation. The aim of this paper is to characterize a set of congruent requirements for a standardized vignette-based clinical evaluation process for OSCs. Discrepancies in the findings of comparative studies to date suggest that differences in OSC evaluation methodology can significantly influence outcomes. A standardized process with a clear specification for vignette-based clinical evaluation is urgently needed to guide developers and facilitate the objective comparison of OSCs. We propose 15 recommended requirements for an OSC evaluation standard. A third-party evaluation process and protocols for prospective real-world evidence studies should also be prioritized to quality-assure OSC assessment.
11
Fraser HSF, Cohan G, Koehler C, Anderson J, Lawrence A, Pateña J, Bacher I, Ranney ML. Evaluation of Diagnostic and Triage Accuracy and Usability of a Symptom Checker in an Emergency Department: Observational Study. JMIR Mhealth Uhealth 2022;10:e38364. [PMID: 36121688; PMCID: PMC9531004; DOI: 10.2196/38364]
Abstract
Background Symptom checkers are clinical decision support apps for patients, used by tens of millions of people annually. They are designed to provide diagnostic and triage advice and assist users in seeking the appropriate level of care. Little evidence is available regarding their diagnostic and triage accuracy when used directly by patients for urgent conditions. Objective The aim of this study is to determine the diagnostic and triage accuracy and usability of a symptom checker used by patients presenting to an emergency department (ED). Methods We recruited a convenience sample of English-speaking patients presenting for care in an urban ED. Each consenting patient used a leading symptom checker from Ada Health before the ED evaluation. Diagnostic accuracy was evaluated by comparing the symptom checker's diagnoses, and those of 3 independent emergency physicians viewing the patient-entered symptom data, with the final diagnoses from the ED evaluation. The Ada diagnoses and triage were also critiqued by the independent physicians. The patients completed a usability survey based on the Technology Acceptance Model. Results A total of 40 (80%) of the 50 participants approached completed the symptom checker assessment and usability survey. Their mean age was 39.3 (SD 15.9; range 18-76) years, and they were 65% (26/40) female, 68% (27/40) White, 48% (19/40) Hispanic or Latino, and 13% (5/40) Black or African American. Some cases had missing data or lacked a clear ED diagnosis; 75% (30/40) were included in the analysis of diagnosis and 93% (37/40) in the analysis of triage. The sensitivity of Ada for at least one of the final ED diagnoses (based on its top 5 diagnoses) was 70% (95% CI 54%-86%), close to the mean sensitivity of 68.9% for the 3 physicians (based on their top 3 diagnoses). The physicians fully agreed with 62% (23/37) of the Ada triage decisions and rated 24% (9/37) as safe but too cautious. The triage advice was rated as unsafe and too risky in 22% (8/37) of cases by at least one physician, in 14% (5/37) by at least two physicians, and in 5% (2/37) by all 3 physicians. Usability was rated highly; participants agreed or strongly agreed with the 7 Technology Acceptance Model usability questions, with a mean score of 84.6%, although “satisfaction” and “enjoyment” were rated low. Conclusions This study provides preliminary evidence that a symptom checker can provide acceptable usability and diagnostic accuracy for patients with various urgent conditions. A total of 14% (5/37) of symptom checker triage recommendations were deemed unsafe and too risky by at least two physicians based on the symptoms recorded, similar to the results of studies on telephone and nurse triage. Larger studies of diagnostic and triage performance with direct patient use in different clinical environments are needed.
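The reported top-5 sensitivity and its confidence interval are easy to reconstruct. This sketch assumes a simple normal-approximation (Wald) interval, which the abstract does not specify but which reproduces the reported 70% (95% CI 54%-86%) if 21 of the 30 analyzable cases were hits:

```python
import math

def top_k_hit(checker_differential, ed_diagnoses, k=5):
    """Per-case hit: any final ED diagnosis appears in the checker's top k."""
    return any(dx in checker_differential[:k] for dx in ed_diagnoses)

def wald_ci(hits, n, z=1.96):
    """Normal-approximation 95% CI for a proportion."""
    p = hits / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half), min(1.0, p + half)

# Hypothetical case: ED diagnosed migraine; the checker ranked it third
print(top_k_hit(["tension headache", "sinusitis", "migraine"], ["migraine"]))  # True

p, lo, hi = wald_ci(21, 30)  # 21 hits in 30 analyzable cases
print(f"top-5 sensitivity {p:.0%} (95% CI {lo:.0%}-{hi:.0%})")  # 70% (54%-86%)
```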
Affiliation(s)
- Hamish S F Fraser
- Brown Center for Biomedical Informatics, Warren Alpert Medical School, Brown University, Providence, RI, United States
- School of Public Health, Brown University, Providence, RI, United States
- Gregory Cohan
- Warren Alpert Medical School, Brown University, Providence, RI, United States
- Christopher Koehler
- Department of Emergency Medicine, Brown University, Providence, RI, United States
- Jared Anderson
- Department of Emergency Medicine, Brown University, Providence, RI, United States
- Alexis Lawrence
- Harvard Medical Faculty Physicians, Department of Emergency Medicine, St Luke's Hospital, New Bedford, MA, United States
- John Pateña
- Brown-Lifespan Center for Digital Health, Providence, RI, United States
- Ian Bacher
- Brown Center for Biomedical Informatics, Warren Alpert Medical School, Brown University, Providence, RI, United States
- Megan L Ranney
- School of Public Health, Brown University, Providence, RI, United States
- Department of Emergency Medicine, Brown University, Providence, RI, United States
- Brown-Lifespan Center for Digital Health, Providence, RI, United States
12
Zielasek J, Reinhardt I, Schmidt L, Gouzoulis-Mayfrank E. Adapting and Implementing Apps for Mental Healthcare. Curr Psychiatry Rep 2022;24:407-417. [PMID: 35835898; PMCID: PMC9283030; DOI: 10.1007/s11920-022-01350-3]
Abstract
PURPOSE OF REVIEW To describe examples of adapting apps for use in mental healthcare and to formulate recommendations for successful adaptation in mental healthcare settings. RECENT FINDINGS There are only a few published examples of adapting apps for use in mental healthcare; international examples are given to explore implementation procedures that address the multitude of challenges involved. From these examples, and from results of implementation science studies in general clinical settings, it can be concluded that the process of adapting apps for mental healthcare needs to address clinician training and information needs; user needs, which include cultural adaptation that goes beyond mere translation; and organizational needs for blending app use into everyday clinical mental healthcare workflows.
Affiliation(s)
- Jürgen Zielasek
- Section of Healthcare Research, LVR-Institute for Research and Education, Cologne, Germany
- Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Isabelle Reinhardt
- Section of Healthcare Research, LVR-Institute for Research and Education, Cologne, Germany
- Laura Schmidt
- Section of Healthcare Research, LVR-Institute for Research and Education, Cologne, Germany
- Euphrosyne Gouzoulis-Mayfrank
- Section of Healthcare Research, LVR-Institute for Research and Education, Cologne, Germany
13
Schmieding ML, Kopka M, Schmidt K, Schulz-Niethammer S, Balzer F, Feufel MA. Triage Accuracy of Symptom Checker Apps: 5-Year Follow-up Evaluation. J Med Internet Res 2022;24:e31810. [PMID: 35536633; PMCID: PMC9131144; DOI: 10.2196/31810]
Abstract
BACKGROUND Symptom checkers are digital tools assisting laypersons in self-assessing the urgency and potential causes of their medical complaints. They are widely used but face concerns from both patients and health care professionals, especially regarding their accuracy. A 2015 landmark study substantiated these concerns using case vignettes to demonstrate that symptom checkers commonly err in their triage assessment. OBJECTIVE This study aims to revisit the landmark index study to investigate whether and how symptom checkers' capabilities have evolved since 2015 and how they currently compare with laypersons' stand-alone triage appraisal. METHODS In early 2020, we searched for smartphone and web-based applications providing triage advice. We evaluated these apps on the same 45 case vignettes as the index study. Using descriptive statistics, we compared our findings with those of the index study and with publicly available data on laypersons' triage capability. RESULTS We retrieved 22 symptom checkers providing triage advice. The median triage accuracy in 2020 (55.8%, IQR 15.1%) was close to that in 2015 (59.1%, IQR 15.5%). The apps in 2020 were less risk averse (odds 1.11:1, the ratio of overtriage errors to undertriage errors) than those in 2015 (odds 2.82:1), missing >40% of emergencies. Few apps outperformed laypersons in either deciding whether emergency care was required or whether self-care was sufficient. No apps outperformed the laypersons on both decisions. CONCLUSIONS Triage performance of symptom checkers has, on average, not improved over the course of 5 years. It decreased in 2 use cases (advice on when emergency care is required and when no health care is needed for the moment). However, triage capability varies widely within the sample of symptom checkers. Whether it is beneficial to seek advice from symptom checkers depends on the app chosen and on the specific question to be answered. Future research should develop resources (eg, case vignette repositories) to audit the capabilities of symptom checkers continuously and independently and provide guidance on when and to whom they should be recommended.
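The risk-aversion odds quoted above are a ratio of error counts: overtriage errors (the app more urgent than the gold standard) to undertriage errors (the app less urgent). A minimal sketch over the three triage tiers used in such vignette studies, with hypothetical counts chosen to reproduce the 1.11:1 figure:

```python
# Ordinal triage tiers, least to most urgent
LEVELS = {"self-care": 0, "non-emergency care": 1, "emergency care": 2}

def triage_error_odds(app_advice, gold_standard):
    """Ratio of overtriage errors (app more urgent than gold) to
    undertriage errors (app less urgent than gold)."""
    over = sum(LEVELS[a] > LEVELS[g] for a, g in zip(app_advice, gold_standard))
    under = sum(LEVELS[a] < LEVELS[g] for a, g in zip(app_advice, gold_standard))
    return over / under

# Hypothetical results on 45 vignettes: 10 overtriage, 9 undertriage errors
app  = ["emergency care"] * 10 + ["self-care"] * 9 + ["self-care"] * 26
gold = ["non-emergency care"] * 19 + ["self-care"] * 26
print(f"{triage_error_odds(app, gold):.2f}:1")  # 1.11:1
```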
Affiliation(s)
- Malte L Schmieding
- Institute of Medical Informatics, Charité - Universitätsmedizin Berlin, Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Marvin Kopka
- Institute of Medical Informatics, Charité - Universitätsmedizin Berlin, Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Cognitive Psychology and Ergonomics, Department of Psychology and Ergonomics, Technische Universität Berlin, Berlin, Germany
- Konrad Schmidt
- Institute of General Practice and Family Medicine, Jena University Hospital, Jena, Germany
- Institute of General Practice and Family Medicine, Charité - Universitätsmedizin Berlin, Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Sven Schulz-Niethammer
- Division of Ergonomics, Department of Psychology and Ergonomics, Technische Universität Berlin, Berlin, Germany
- Felix Balzer
- Institute of Medical Informatics, Charité - Universitätsmedizin Berlin, Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Markus A Feufel
- Division of Ergonomics, Department of Psychology and Ergonomics, Technische Universität Berlin, Berlin, Germany
14
Millen E, Salim N, Azadzoy H, Bane MM, O'Donnell L, Schmude M, Bode P, Tuerk E, Vaidya R, Gilbert SH. Study protocol for a pilot prospective, observational study investigating the condition suggestion and urgency advice accuracy of a symptom assessment app in sub-Saharan Africa: the AFYA-'Health' Study. BMJ Open 2022;12:e055915. [PMID: 35410928; PMCID: PMC9003603; DOI: 10.1136/bmjopen-2021-055915]
Abstract
INTRODUCTION Due to a global shortage of healthcare workers, 4 billion people worldwide lack basic healthcare, with low-income and middle-income countries particularly affected. The utilisation of AI-based healthcare tools such as symptom assessment applications (SAAs) has the potential to reduce the burden on healthcare systems. The purpose of the AFYA Study (AI-based Assessment oF health sYmptoms in TAnzania) is to evaluate the accuracy of the condition suggestions and urgency advice provided to users by a Swahili-language Ada SAA. METHODS AND ANALYSIS This study is designed as an observational prospective clinical study set in the waiting room of a Tanzanian district hospital. It will include patients of various age groups entering the outpatient clinic with various conditions, including children and adolescents. Patients will be asked to use the SAA before proceeding to usual care; after usual care, they will have a consultation with a study-provided physician. Patients and healthcare practitioners will be blinded to the SAA's results. An expert panel will compare the Ada SAA's condition suggestions and urgency advice with the differential diagnoses and triage decisions from usual care and from the study-provided physician. The primary outcome measures are the accuracy and comprehensiveness of the Ada SAA evaluated against the gold-standard differential diagnoses. ETHICS AND DISSEMINATION Ethical approval was received from the ethics committee (EC) of Muhimbili University of Health and Allied Sciences (approval number MUHAS-REC-09-2019-044) and from the National Institute for Medical Research (NIMR/HQ/R.8c/Vol. I/922). All amendments to the protocol are reported and adapted on the basis of the requirements of the EC. The results from this study will be submitted to peer-reviewed journals and local and international stakeholders, and will be communicated in editorials/articles by Ada Health. TRIAL REGISTRATION NUMBER NCT04958577.
Affiliation(s)
- Nahya Salim
- Muhimbili University of Health and Allied Sciences, Dar es Salaam, United Republic of Tanzania
- Mustafa Miraji Bane
- Muhimbili University of Health and Allied Sciences, Dar es Salaam, United Republic of Tanzania
- Stephen Henry Gilbert
- Ada Health GmbH, Berlin, Germany
- EKFZ for Digital Health, Technische Universität Dresden, Dresden, Germany