Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Nateqi J, Lin S, Krobath H, Gruarin S, Lutz T, Dvorak T, Gruschina A, Ortner R. [From symptom to diagnosis-symptom checkers re-evaluated : Are symptom checkers finally sufficient and accurate to use? An update from the ENT perspective]. HNO 2019;67:334-342. [PMID: 30993374 DOI: 10.1007/s00106-019-0666-y] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/27/2022]

For:	Nateqi J, Lin S, Krobath H, Gruarin S, Lutz T, Dvorak T, Gruschina A, Ortner R. [From symptom to diagnosis-symptom checkers re-evaluated : Are symptom checkers finally sufficient and accurate to use? An update from the ENT perspective]. HNO 2019;67:334-342. [PMID: 30993374 DOI: 10.1007/s00106-019-0666-y] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/27/2022]

Number

Cited by Other Article(s)

Hammoud M, Douglas S, Darmach M, Alawneh S, Sanyal S, Kanbour Y. Evaluating the Diagnostic Performance of Symptom Checkers: Clinical Vignette Study. JMIR AI 2024;3:e46875. [PMID: 38875676 PMCID: PMC11091811 DOI: 10.2196/46875] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/28/2023] [Revised: 06/15/2023] [Accepted: 03/02/2024] [Indexed: 06/16/2024]

Abstract

BACKGROUND

Medical self-diagnostic tools (or symptom checkers) are becoming an integral part of digital health and our daily lives, whereby patients are increasingly using them to identify the underlying causes of their symptoms. As such, it is essential to rigorously investigate and comprehensively report the diagnostic performance of symptom checkers using standard clinical and scientific approaches.

OBJECTIVE

This study aims to evaluate and report the accuracies of a few known and new symptom checkers using a standard and transparent methodology, which allows the scientific community to cross-validate and reproduce the reported results, a step much needed in health informatics.

METHODS

We propose a 4-stage experimentation methodology that capitalizes on the standard clinical vignette approach to evaluate 6 symptom checkers. To this end, we developed and peer-reviewed 400 vignettes, each approved by at least 5 out of 7 independent and experienced primary care physicians. To establish a frame of reference and interpret the results of symptom checkers accordingly, we further compared the best-performing symptom checker against 3 primary care physicians with an average experience of 16.6 (SD 9.42) years. To measure accuracy, we used 7 standard metrics, including M1 as a measure of a symptom checker's or a physician's ability to return a vignette's main diagnosis at the top of their differential list, F1-score as a trade-off measure between recall and precision, and Normalized Discounted Cumulative Gain (NDCG) as a measure of a differential list's ranking quality, among others.

RESULTS

The diagnostic accuracies of the 6 tested symptom checkers vary significantly. For instance, the differences in the M1, F1-score, and NDCG results between the best-performing and worst-performing symptom checkers or ranges were 65.3%, 39.2%, and 74.2%, respectively. The same was observed among the participating human physicians, whereby the M1, F1-score, and NDCG ranges were 22.8%, 15.3%, and 21.3%, respectively. When compared against each other, physicians outperformed the best-performing symptom checker by an average of 1.2% using F1-score, whereas the best-performing symptom checker outperformed physicians by averages of 10.2% and 25.1% using M1 and NDCG, respectively.

CONCLUSIONS

The performance variation between symptom checkers is substantial, suggesting that symptom checkers cannot be treated as a single entity. On a different note, the best-performing symptom checker was an artificial intelligence (AI)-based one, shedding light on the promise of AI in improving the diagnostic capabilities of symptom checkers, especially as AI keeps advancing exponentially.

Collapse

Šafran V, Lin S, Nateqi J, Martin AG, Smrke U, Ariöz U, Plohl N, Rojc M, Bēma D, Chávez M, Horvat M, Mlakar I. Multilingual Framework for Risk Assessment and Symptom Tracking (MRAST). SENSORS (BASEL, SWITZERLAND) 2024;24:1101. [PMID: 38400259 PMCID: PMC10892413 DOI: 10.3390/s24041101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/19/2023] [Revised: 02/02/2024] [Accepted: 02/06/2024] [Indexed: 02/25/2024]

Abstract

The importance and value of real-world data in healthcare cannot be overstated because it offers a valuable source of insights into patient experiences. Traditional patient-reported experience and outcomes measures (PREMs/PROMs) often fall short in addressing the complexities of these experiences due to subjectivity and their inability to precisely target the questions asked. In contrast, diary recordings offer a promising solution. They can provide a comprehensive picture of psychological well-being, encompassing both psychological and physiological symptoms. This study explores how using advanced digital technologies, i.e., automatic speech recognition and natural language processing, can efficiently capture patient insights in oncology settings. We introduce the MRAST framework, a simplified way to collect, structure, and understand patient data using questionnaires and diary recordings. The framework was validated in a prospective study with 81 colorectal and 85 breast cancer survivors, of whom 37 were male and 129 were female. Overall, the patients evaluated the solution as well made; they found it easy to use and integrate into their daily routine. The majority (75.3%) of the cancer survivors participating in the study were willing to engage in health monitoring activities using digital wearable devices daily for an extended period. Throughout the study, there was a noticeable increase in the number of participants who perceived the system as having excellent usability. Despite some negative feedback, 44.44% of patients still rated the app's usability as above satisfactory (i.e., 7.9 on 1-10 scale) and the experience with diary recording as above satisfactory (i.e., 7.0 on 1-10 scale). Overall, these findings also underscore the significance of user testing and continuous improvement in enhancing the usability and user acceptance of solutions like the MRAST framework. Overall, the automated extraction of information from diaries represents a pivotal step toward a more patient-centered approach, where healthcare decisions are based on real-world experiences and tailored to individual needs. The potential usefulness of such data is enormous, as it enables better measurement of everyday experiences and opens new avenues for patient-centered care.

Collapse

Sellin J, Pantel JT, Börsch N, Conrad R, Mücke M. [Short paths to diagnosis with artificial intelligence: systematic literature review on diagnostic decision support systems]. Schmerz 2024;38:19-27. [PMID: 38165492 DOI: 10.1007/s00482-023-00777-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 11/24/2023] [Indexed: 01/03/2024]

Kafke SD, Kuhlmey A, Schuster J, Blüher S, Czimmeck C, Zoellick JC, Grosse P. Can clinical decision support systems be an asset in medical education? An experimental approach. BMC MEDICAL EDUCATION 2023;23:570. [PMID: 37568144 PMCID: PMC10416486 DOI: 10.1186/s12909-023-04568-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/31/2023] [Accepted: 08/04/2023] [Indexed: 08/13/2023]

Abstract

BACKGROUND

Diagnostic accuracy is one of the major cornerstones of appropriate and successful medical decision-making. Clinical decision support systems (CDSSs) have recently been used to facilitate physician's diagnostic considerations. However, to date, little is known about the potential assets of CDSS for medical students in an educational setting. The purpose of our study was to explore the usefulness of CDSSs for medical students assessing their diagnostic performances and the influence of such software on students' trust in their own diagnostic abilities.

METHODS

Based on paper cases students had to diagnose two different patients using a CDSS and conventional methods such as e.g. textbooks, respectively. Both patients had a common disease, in one setting the clinical presentation was a typical one (tonsillitis), in the other setting (pulmonary embolism), however, the patient presented atypically. We used a 2x2x2 between- and within-subjects cluster-randomised controlled trial to assess the diagnostic accuracy in medical students, also by changing the order of the used resources (CDSS first or second).

RESULTS

Medical students in their 4th and 5th year performed equally well using conventional methods or the CDSS across the two cases (t(164) = 1,30; p = 0.197). Diagnostic accuracy and trust in the correct diagnosis were higher in the typical presentation condition than in the atypical presentation condition (t(85) = 19.97; p < .0001 and t(150) = 7.67; p < .0001).These results refute our main hypothesis that students diagnose more accurately when using conventional methods compared to the CDSS.

CONCLUSIONS

Medical students in their 4th and 5th year performed equally well in diagnosing two cases of common diseases with typical or atypical clinical presentations using conventional methods or a CDSS. Students were proficient in diagnosing a common disease with a typical presentation but underestimated their own factual knowledge in this scenario. Also, students were aware of their own diagnostic limitations when presented with a challenging case with an atypical presentation for which the use of a CDSS seemingly provided no additional insights.

Collapse

Kopka M, Feufel MA, Berner ES, Schmieding ML. How suitable are clinical vignettes for the evaluation of symptom checker apps? A test theoretical perspective. Digit Health 2023;9:20552076231194929. [PMID: 37614591 PMCID: PMC10444026 DOI: 10.1177/20552076231194929] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/01/2023] [Accepted: 07/28/2023] [Indexed: 08/25/2023] Open

Abstract

Objective

To evaluate the ability of case vignettes to assess the performance of symptom checker applications and to suggest refinements to the methodology used in case vignette-based audit studies.

Methods

We re-analyzed the publicly available data of two prominent case vignette-based symptom checker audit studies by calculating common metrics of test theory. Furthermore, we developed a new metric, the Capability Comparison Score (CCS), which compares symptom checker capability while controlling for the difficulty of the set of cases each symptom checker evaluated. We then scrutinized whether applying test theory and the CCS altered the performance ranking of the investigated symptom checkers.

Results

In both studies, most symptom checkers changed their rank order when adjusting the triage capability for item difficulty (ID) with the CCS. The previously reported triage accuracies commonly overestimated the capability of symptom checkers because they did not account for the fact that symptom checkers tend to selectively appraise easier cases (i.e., with high ID values). Also, many case vignettes in both studies showed insufficient (very low and even negative) values of item-total correlation (ITC), suggesting that individual items or the composition of item sets are of low quality.

Conclusions

A test-theoretic perspective helps identify previously undetected threats to the validity of case vignette-based symptom checker assessments and provides guidance and specific metrics to improve the quality of case vignettes, in particular by controlling for the difficulty of the vignettes an app was (not) able to evaluate correctly. Such measures might prove more meaningful than accuracy alone for the competitive assessment of symptom checkers. Our approach helps elaborate and standardize the methodology used for appraising symptom checker capability, which, ultimately, may yield more reliable results.

Collapse

Lin S, Nateqi J, Weingartner-Ortner R, Gruarin S, Marling H, Pilgram V, Lagler FB, Aigner E, Martin AG. An artificial intelligence-based approach for identifying rare disease patients using retrospective electronic health records applied for Pompe disease. Front Neurol 2023;14:1108222. [PMID: 37153672 PMCID: PMC10160659 DOI: 10.3389/fneur.2023.1108222] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/13/2022] [Accepted: 04/03/2023] [Indexed: 05/10/2023] Open

Abstract

Objective

We retrospectively screened 350,116 electronic health records (EHRs) to identify suspected patients for Pompe disease. Using these suspected patients, we then describe their phenotypical characteristics and estimate the prevalence in the respective population covered by the EHRs.

Methods

We applied Symptoma's Artificial Intelligence-based approach for identifying rare disease patients to retrospective anonymized EHRs provided by the "University Hospital Salzburg" clinic group. Within 1 month, the AI screened 350,116 EHRs reaching back 15 years from five hospitals, and 104 patients were flagged as probable for Pompe disease. Flagged patients were manually reviewed and assessed by generalist and specialist physicians for their likelihood for Pompe disease, from which the performance of the algorithms was evaluated.

Results

Of the 104 patients flagged by the algorithms, generalist physicians found five "diagnosed," 10 "suspected," and seven patients with "reduced suspicion." After feedback from Pompe disease specialist physicians, 19 patients remained clinically plausible for Pompe disease, resulting in a specificity of 18.27% for the AI. Estimating from the remaining plausible patients, the prevalence of Pompe disease for the greater Salzburg region [incl. Bavaria (Germany), Styria (Austria), and Upper Austria (Austria)] was one in every 18,427 people. Phenotypes for patient cohorts with an approximated onset of symptoms above or below 1 year of age were established, which correspond to infantile-onset Pompe disease (IOPD) and late-onset Pompe disease (LOPD), respectively.

Conclusion

Our study shows the feasibility of Symptoma's AI-based approach for identifying rare disease patients using retrospective EHRs. Via the algorithm's screening of an entire EHR population, a physician had only to manually review 5.47 patients on average to find one suspected candidate. This efficiency is crucial as Pompe disease, while rare, is a progressively debilitating but treatable neuromuscular disease. As such, we demonstrated both the efficiency of the approach and the potential of a scalable solution to the systematic identification of rare disease patients. Thus, similar implementation of this methodology should be encouraged to improve care for all rare disease patients.

Collapse

Schmude M, Salim N, Azadzoy H, Bane M, Millen E, O'Donnell L, Bode P, Türk E, Vaidya R, Gilbert S. Investigating the Potential for Clinical Decision Support in Sub-Saharan Africa With AFYA (Artificial Intelligence-Based Assessment of Health Symptoms in Tanzania): Protocol for a Prospective, Observational Pilot Study. JMIR Res Protoc 2022;11:e34298. [PMID: 35671073 PMCID: PMC9214611 DOI: 10.2196/34298] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/20/2021] [Revised: 02/17/2022] [Accepted: 04/30/2022] [Indexed: 11/13/2022] Open

Abstract

BACKGROUND

Low- and middle-income countries face difficulties in providing adequate health care. One of the reasons is a shortage of qualified health workers. Diagnostic decision support systems are designed to aid clinicians in their work and have the potential to mitigate pressure on health care systems.

OBJECTIVE

The Artificial Intelligence-Based Assessment of Health Symptoms in Tanzania (AFYA) study will evaluate the potential of an English-language artificial intelligence-based prototype diagnostic decision support system for mid-level health care practitioners in a low- or middle-income setting.

METHODS

This is an observational, prospective clinical study conducted in a busy Tanzanian district hospital. In addition to usual care visits, study participants will consult a mid-level health care practitioner, who will use a prototype diagnostic decision support system, and a study physician. The accuracy and comprehensiveness of the differential diagnosis provided by the diagnostic decision support system will be evaluated against a gold-standard differential diagnosis provided by an expert panel.

RESULTS

Patient recruitment started in October 2021. Participants were recruited directly in the waiting room of the outpatient clinic at the hospital. Data collection will conclude in May 2022. Data analysis is planned to be finished by the end of June 2022. The results will be published in a peer-reviewed journal.

CONCLUSIONS

Most diagnostic decision support systems have been developed and evaluated in high-income countries, but there is great potential for these systems to improve the delivery of health care in low- and middle-income countries. The findings of this real-patient study will provide insights based on the performance and usability of a prototype diagnostic decision support system in low- or middle-income countries.

TRIAL REGISTRATION

ClinicalTrials.gov NCT04958577; http://clinicaltrials.gov/ct2/show/NCT04958577.

INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID)

DERR1-10.2196/34298.

Collapse

Symptoms associated with a COVID-19 infection among a non-hospitalized cohort in Vienna. Wien Klin Wochenschr 2022;134:344-350. [PMID: 35416543 PMCID: PMC9007045 DOI: 10.1007/s00508-022-02028-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/21/2021] [Accepted: 03/13/2022] [Indexed: 11/16/2022]

Millen E, Salim N, Azadzoy H, Bane MM, O'Donnell L, Schmude M, Bode P, Tuerk E, Vaidya R, Gilbert SH. Study protocol for a pilot prospective, observational study investigating the condition suggestion and urgency advice accuracy of a symptom assessment app in sub-Saharan Africa: the AFYA-'Health' Study. BMJ Open 2022;12:e055915. [PMID: 35410928 PMCID: PMC9003603 DOI: 10.1136/bmjopen-2021-055915] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 01/10/2023] Open

Abstract

INTRODUCTION

Due to a global shortage of healthcare workers, there is a lack of basic healthcare for 4 billion people worldwide, particularly affecting low-income and middle-income countries. The utilisation of AI-based healthcare tools such as symptom assessment applications (SAAs) has the potential to reduce the burden on healthcare systems. The purpose of the AFYA Study (AI-based Assessment oF health sYmptoms in TAnzania) is to evaluate the accuracy of the condition suggestions and urgency advice provided by a user on a Swahili language Ada SAA.

METHODS AND ANALYSIS

This study is designed as an observational prospective clinical study. The setting is a waiting room of a Tanzanian district hospital. It will include patients entering the outpatient clinic with various conditions and age groups, including children and adolescents. Patients will be asked to use the SAA before proceeding to usual care. After usual care, they will have a consultation with a study-provided physician. Patients and healthcare practitioners will be blinded to the SAA's results. An expert panel will compare the Ada SAA's condition suggestions and urgency advice to usual care and study provided differential diagnoses and triage. The primary outcome measures are the accuracy and comprehensiveness of the Ada SAA evaluated against the gold standard differential diagnoses.

ETHICS AND DISSEMINATION

Ethical approval was received by the ethics committee (EC) of Muhimbili University of Health and Allied Sciences with an approval number MUHAS-REC-09-2019-044 and the National Institute for Medical Research, NIMR/HQ/R.8c/Vol. I/922. All amendments to the protocol are reported and adapted on the basis of the requirements of the EC. The results from this study will be submitted to peer-reviewed journals, local and international stakeholders, and will be communicated in editorials/articles by Ada Health.

TRIAL REGISTRATION NUMBER

NCT04958577.

Collapse

Gilbert S, Fenech M, Upadhyay S, Wicks P, Novorol C. Quality of condition suggestions and urgency advice provided by the Ada symptom assessment app evaluated with vignettes optimised for Australia. Aust J Prim Health 2021;27:377-381. [PMID: 34706813 DOI: 10.1071/py21032] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/22/2021] [Accepted: 05/11/2021] [Indexed: 11/23/2022]

Munsch N, Martin A, Gruarin S, Nateqi J, Abdarahmane I, Weingartner-Ortner R, Knapp B. Authors' Reply to: Screening Tools: Their Intended Audiences and Purposes. Comment on "Diagnostic Accuracy of Web-Based COVID-19 Symptom Checkers: Comparison Study". J Med Internet Res 2021;23:e26543. [PMID: 33989162 PMCID: PMC8178729 DOI: 10.2196/26543] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/16/2020] [Accepted: 05/13/2021] [Indexed: 11/24/2022] Open

Plontke SK. Rare Diseases and Otorhinolaryngology, Head and Neck Surgery. Laryngorhinootologie 2021;100:S1-S11. [PMID: 34352898 PMCID: PMC8354574 DOI: 10.1055/a-1397-0842] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]

Gilbert S, Mehl A, Baluch A, Cawley C, Challiner J, Fraser H, Millen E, Montazeri M, Multmeier J, Pick F, Richter C, Türk E, Upadhyay S, Virani V, Vona N, Wicks P, Novorol C. How accurate are digital symptom assessment apps for suggesting conditions and urgency advice? A clinical vignettes comparison to GPs. BMJ Open 2020;10:e040269. [PMID: 33328258 PMCID: PMC7745523 DOI: 10.1136/bmjopen-2020-040269] [Citation(s) in RCA: 58] [Impact Index Per Article: 14.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 01/04/2023] Open

Abstract

OBJECTIVES

To compare breadth of condition coverage, accuracy of suggested conditions and appropriateness of urgency advice of eight popular symptom assessment apps.

DESIGN

Vignettes study.

SETTING

200 primary care vignettes.

INTERVENTION/COMPARATOR

For eight apps and seven general practitioners (GPs): breadth of coverage and condition-suggestion and urgency advice accuracy measured against the vignettes' gold-standard.

PRIMARY OUTCOME MEASURES

(1) Proportion of conditions 'covered' by an app, that is, not excluded because the user was too young/old or pregnant, or not modelled; (2) proportion of vignettes with the correct primary diagnosis among the top 3 conditions suggested; (3) proportion of 'safe' urgency advice (ie, at gold standard level, more conservative, or no more than one level less conservative).

RESULTS

Condition-suggestion coverage was highly variable, with some apps not offering a suggestion for many users: in alphabetical order, Ada: 99.0%; Babylon: 51.5%; Buoy: 88.5%; K Health: 74.5%; Mediktor: 80.5%; Symptomate: 61.5%; Your.MD: 64.5%; WebMD: 93.0%. Top-3 suggestion accuracy was GPs (average): 82.1%±5.2%; Ada: 70.5%; Babylon: 32.0%; Buoy: 43.0%; K Health: 36.0%; Mediktor: 36.0%; Symptomate: 27.5%; WebMD: 35.5%; Your.MD: 23.5%. Some apps excluded certain user demographics or conditions and their performance was generally greater with the exclusion of corresponding vignettes. For safe urgency advice, tested GPs had an average of 97.0%±2.5%. For the vignettes with advice provided, only three apps had safety performance within 1 SD of the GPs-Ada: 97.0%; Babylon: 95.1%; Symptomate: 97.8%. One app had a safety performance within 2 SDs of GPs-Your.MD: 92.6%. Three apps had a safety performance outside 2 SDs of GPs-Buoy: 80.0% (p<0.001); K Health: 81.3% (p<0.001); Mediktor: 87.3% (p=1.3×10^-3).

CONCLUSIONS

The utility of digital symptom assessment apps relies on coverage, accuracy and safety. While no digital tool outperformed GPs, some came close, and the nature of iterative improvements to software offers scalable improvements to care.

Collapse

Martin A, Nateqi J, Gruarin S, Munsch N, Abdarahmane I, Zobel M, Knapp B. An artificial intelligence-based first-line defence against COVID-19: digitally screening citizens for risks via a chatbot. Sci Rep 2020;10:19012. [PMID: 33149198 PMCID: PMC7643065 DOI: 10.1038/s41598-020-75912-x] [Citation(s) in RCA: 35] [Impact Index Per Article: 8.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/26/2020] [Accepted: 10/05/2020] [Indexed: 12/24/2022] Open

Mehl A, Bergey F, Cawley C, Gilsdorf A. Syndromic Surveillance Insights from a Symptom Assessment App Before and During COVID-19 Measures in Germany and the United Kingdom: Results From Repeated Cross-Sectional Analyses. JMIR Mhealth Uhealth 2020;8:e21364. [PMID: 32997640 PMCID: PMC7561445 DOI: 10.2196/21364] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/16/2020] [Revised: 07/15/2020] [Accepted: 09/15/2020] [Indexed: 12/23/2022] Open

Abstract

Background

Unprecedented lockdown measures have been introduced in countries worldwide to mitigate the spread and consequences of COVID-19. Although attention has been focused on the effects of these measures on epidemiological indicators relating directly to the infection, there is increased recognition of their broader health implications. However, assessing these implications in real time is a challenge, due to the limitations of existing syndromic surveillance data and tools.

Objective

The aim of this study is to explore the added value of mobile phone app–based symptom assessment tools as real-time health insight providers to inform public health policy makers.

Methods

A comparative and descriptive analysis of the proportion of all self-reported symptoms entered by users during an assessment within the Ada app in Germany and the United Kingdom was conducted between two periods, namely before and after the implementation of “Phase One” COVID-19 measures. Additional analyses were performed to explore the association between symptom trends and seasonality, and symptom trends and weather. Differences in the proportion of unique symptoms between the periods were analyzed using a Pearson chi-square test and reported as log2 fold changes.

Results

Overall, 48,300-54,900 symptomatic users reported 140,500-170,400 symptoms during the Baseline and Measures periods in Germany. Overall, 34,200-37,400 symptomatic users in the United Kingdom reported 112,100-131,900 symptoms during the Baseline and Measures periods. The majority of symptomatic users were female (Germany: 68,600/103,200, 66.52%; United Kingdom: 51,200/71,600, 72.74%). The majority were aged 10-29 years (Germany: 68,500/100,000, 68.45%; United Kingdom: 50,900/68,800, 73.91%), and about one-quarter were aged 30-59 years (Germany: 26,200/100,000, 26.15%; United Kingdom: 14,900/68,800, 21.65%). Overall, 103 symptoms were reported either more or less frequently (with statistically significant differences) during the Measures period as compared to the Baseline period, and 34 of these were reported in both countries. The following mental health symptoms (log2 fold change, P value) were reported less often during the Measures period: inability to manage constant stress and demands at work (–1.07, P<.001), memory difficulty (–0.56, P<.001), depressed mood (–0.42, P<.001), and impaired concentration (–0.46, P<.001). Diminished sense of taste (2.26, P<.001) and hyposmia (2.20, P<.001) were reported more frequently during the Measures period. None of the 34 symptoms were found to be different between the same dates in 2019. In total, 14 of the 34 symptoms had statistically significant associations with weather variables.

Conclusions

Symptom assessment apps have an important role to play in facilitating improved understanding of the implications of public health policies such as COVID-19 lockdown measures. Not only do they provide the means to complement and cross-validate hypotheses based on data collected through more traditional channels, they can also generate novel insights through a real-time syndromic surveillance system.

Collapse

Munsch N, Martin A, Gruarin S, Nateqi J, Abdarahmane I, Weingartner-Ortner R, Knapp B. Diagnostic Accuracy of Web-Based COVID-19 Symptom Checkers: Comparison Study. J Med Internet Res 2020;22:e21299. [PMID: 33001828 PMCID: PMC7541039 DOI: 10.2196/21299] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/15/2020] [Revised: 07/27/2020] [Accepted: 09/14/2020] [Indexed: 01/06/2023] Open

Abstract

Background

A large number of web-based COVID-19 symptom checkers and chatbots have been developed; however, anecdotal evidence suggests that their conclusions are highly variable. To our knowledge, no study has evaluated the accuracy of COVID-19 symptom checkers in a statistically rigorous manner.

Objective

The aim of this study is to evaluate and compare the diagnostic accuracies of web-based COVID-19 symptom checkers.

Methods

We identified 10 web-based COVID-19 symptom checkers, all of which were included in the study. We evaluated the COVID-19 symptom checkers by assessing 50 COVID-19 case reports alongside 410 non–COVID-19 control cases. A bootstrapping method was used to counter the unbalanced sample sizes and obtain confidence intervals (CIs). Results are reported as sensitivity, specificity, F1 score, and Matthews correlation coefficient (MCC).

Results

The classification task between COVID-19–positive and COVID-19–negative for “high risk” cases among the 460 test cases yielded (sorted by F1 score): Symptoma (F1=0.92, MCC=0.85), Infermedica (F1=0.80, MCC=0.61), US Centers for Disease Control and Prevention (CDC) (F1=0.71, MCC=0.30), Babylon (F1=0.70, MCC=0.29), Cleveland Clinic (F1=0.40, MCC=0.07), Providence (F1=0.40, MCC=0.05), Apple (F1=0.29, MCC=-0.10), Docyet (F1=0.27, MCC=0.29), Ada (F1=0.24, MCC=0.27) and Your.MD (F1=0.24, MCC=0.27). For “high risk” and “medium risk” combined the performance was: Symptoma (F1=0.91, MCC=0.83) Infermedica (F1=0.80, MCC=0.61), Cleveland Clinic (F1=0.76, MCC=0.47), Providence (F1=0.75, MCC=0.45), Your.MD (F1=0.72, MCC=0.33), CDC (F1=0.71, MCC=0.30), Babylon (F1=0.70, MCC=0.29), Apple (F1=0.70, MCC=0.25), Ada (F1=0.42, MCC=0.03), and Docyet (F1=0.27, MCC=0.29).

Conclusions

We found that the number of correctly assessed COVID-19 and control cases varies considerably between symptom checkers, with different symptom checkers showing different strengths with respect to sensitivity and specificity. A good balance between sensitivity and specificity was only achieved by two symptom checkers.

Collapse

Rebitschek FG, Gigerenzer G. [Assessing the quality of digital health services: How can informed decisions be promoted?]. Bundesgesundheitsblatt Gesundheitsforschung Gesundheitsschutz 2020;63:665-673. [PMID: 32424555 DOI: 10.1007/s00103-020-03146-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/01/2022]