1
Hammoud M, Douglas S, Darmach M, Alawneh S, Sanyal S, Kanbour Y. Evaluating the Diagnostic Performance of Symptom Checkers: Clinical Vignette Study. JMIR AI 2024; 3:e46875. [PMID: 38875676 PMCID: PMC11091811 DOI: 10.2196/46875] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2023] [Revised: 06/15/2023] [Accepted: 03/02/2024] [Indexed: 06/16/2024]
Abstract
BACKGROUND Medical self-diagnostic tools (or symptom checkers) are becoming an integral part of digital health and our daily lives, whereby patients are increasingly using them to identify the underlying causes of their symptoms. As such, it is essential to rigorously investigate and comprehensively report the diagnostic performance of symptom checkers using standard clinical and scientific approaches. OBJECTIVE This study aims to evaluate and report the accuracies of a few known and new symptom checkers using a standard and transparent methodology, which allows the scientific community to cross-validate and reproduce the reported results, a step much needed in health informatics. METHODS We propose a 4-stage experimentation methodology that capitalizes on the standard clinical vignette approach to evaluate 6 symptom checkers. To this end, we developed and peer-reviewed 400 vignettes, each approved by at least 5 out of 7 independent and experienced primary care physicians. To establish a frame of reference and interpret the results of symptom checkers accordingly, we further compared the best-performing symptom checker against 3 primary care physicians with an average experience of 16.6 (SD 9.42) years. To measure accuracy, we used 7 standard metrics, including M1 as a measure of a symptom checker's or a physician's ability to return a vignette's main diagnosis at the top of their differential list, F1-score as a trade-off measure between recall and precision, and Normalized Discounted Cumulative Gain (NDCG) as a measure of a differential list's ranking quality, among others. RESULTS The diagnostic accuracies of the 6 tested symptom checkers vary significantly. For instance, the differences in the M1, F1-score, and NDCG results between the best-performing and worst-performing symptom checkers or ranges were 65.3%, 39.2%, and 74.2%, respectively. 
The same was observed among the participating human physicians, whereby the M1, F1-score, and NDCG ranges were 22.8%, 15.3%, and 21.3%, respectively. When compared against each other, physicians outperformed the best-performing symptom checker by an average of 1.2% using F1-score, whereas the best-performing symptom checker outperformed physicians by averages of 10.2% and 25.1% using M1 and NDCG, respectively. CONCLUSIONS The performance variation between symptom checkers is substantial, suggesting that symptom checkers cannot be treated as a single entity. On a different note, the best-performing symptom checker was an artificial intelligence (AI)-based one, shedding light on the promise of AI in improving the diagnostic capabilities of symptom checkers, especially as AI keeps advancing exponentially.
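The three metrics named in this abstract can be sketched for a single vignette as follows. This is a minimal illustration, not the study's implementation: the function names, the binary relevance used for NDCG, and the toy diagnoses are all assumptions, and the differential list is assumed to contain no duplicates.

```python
import math

def m1(differential, main_diagnosis):
    """Return 1.0 if the main diagnosis is ranked first, else 0.0."""
    return 1.0 if differential and differential[0] == main_diagnosis else 0.0

def f1_score(differential, gold):
    """Harmonic mean of precision and recall of a differential against a gold set."""
    gold = set(gold)
    hits = len(set(differential) & gold)
    if hits == 0:
        return 0.0
    precision = hits / len(set(differential))
    recall = hits / len(gold)
    return 2 * precision * recall / (precision + recall)

def ndcg(differential, gold):
    """NDCG with binary relevance (assumption): 1 if a listed diagnosis is in the gold set."""
    gold = set(gold)
    dcg = sum(1.0 / math.log2(rank + 2)
              for rank, dx in enumerate(differential) if dx in gold)
    ideal_hits = min(len(gold), len(differential))
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

# Hypothetical vignette: two acceptable diagnoses, one of them the main diagnosis.
gold = ["migraine", "tension headache"]
differential = ["migraine", "sinusitis", "tension headache"]

print(m1(differential, "migraine"))
print(f1_score(differential, gold))
print(ndcg(differential, gold))
```

Averaging each function over all vignettes would give one score per symptom checker or physician; a graded relevance scheme, if the study used one, would change only the numerator terms of the DCG sum.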
2
Šafran V, Lin S, Nateqi J, Martin AG, Smrke U, Ariöz U, Plohl N, Rojc M, Bēma D, Chávez M, Horvat M, Mlakar I. Multilingual Framework for Risk Assessment and Symptom Tracking (MRAST). Sensors (Basel, Switzerland) 2024; 24:1101. [PMID: 38400259 PMCID: PMC10892413 DOI: 10.3390/s24041101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2023] [Revised: 02/02/2024] [Accepted: 02/06/2024] [Indexed: 02/25/2024]
Abstract
The importance and value of real-world data in healthcare cannot be overstated because it offers a valuable source of insights into patient experiences. Traditional patient-reported experience and outcomes measures (PREMs/PROMs) often fall short in addressing the complexities of these experiences due to subjectivity and their inability to precisely target the questions asked. In contrast, diary recordings offer a promising solution. They can provide a comprehensive picture of psychological well-being, encompassing both psychological and physiological symptoms. This study explores how using advanced digital technologies, i.e., automatic speech recognition and natural language processing, can efficiently capture patient insights in oncology settings. We introduce the MRAST framework, a simplified way to collect, structure, and understand patient data using questionnaires and diary recordings. The framework was validated in a prospective study with 81 colorectal and 85 breast cancer survivors, of whom 37 were male and 129 were female. Overall, the patients evaluated the solution as well made; they found it easy to use and integrate into their daily routine. The majority (75.3%) of the cancer survivors participating in the study were willing to engage in health monitoring activities using digital wearable devices daily for an extended period. Throughout the study, there was a noticeable increase in the number of participants who perceived the system as having excellent usability. Despite some negative feedback, 44.44% of patients still rated the app's usability as above satisfactory (i.e., 7.9 on 1-10 scale) and the experience with diary recording as above satisfactory (i.e., 7.0 on 1-10 scale). Overall, these findings also underscore the significance of user testing and continuous improvement in enhancing the usability and user acceptance of solutions like the MRAST framework. 
Overall, the automated extraction of information from diaries represents a pivotal step toward a more patient-centered approach, where healthcare decisions are based on real-world experiences and tailored to individual needs. The potential usefulness of such data is enormous, as it enables better measurement of everyday experiences and opens new avenues for patient-centered care.
Affiliation(s)
- Valentino Šafran
  - Faculty of Electrical Engineering and Computer Science, University of Maribor, 2000 Maribor, Slovenia
- Simon Lin
  - Science Department, Symptoma GmbH, 1030 Vienna, Austria (A.G.M.)
  - Department of Internal Medicine, Paracelsus Medical University, 5020 Salzburg, Austria
- Jama Nateqi
  - Science Department, Symptoma GmbH, 1030 Vienna, Austria (A.G.M.)
  - Department of Internal Medicine, Paracelsus Medical University, 5020 Salzburg, Austria
- Urška Smrke
  - Faculty of Electrical Engineering and Computer Science, University of Maribor, 2000 Maribor, Slovenia
- Umut Ariöz
  - Faculty of Electrical Engineering and Computer Science, University of Maribor, 2000 Maribor, Slovenia
- Nejc Plohl
  - Department of Psychology, Faculty of Arts, University of Maribor, 2000 Maribor, Slovenia
- Matej Rojc
  - Faculty of Electrical Engineering and Computer Science, University of Maribor, 2000 Maribor, Slovenia
- Dina Bēma
  - Institute of Clinical and Preventive Medicine, University of Latvia, LV-1586 Riga, Latvia
- Marcela Chávez
  - Department of Information System Management, Centre Hospitalier Universitaire de Liège, 4000 Liège, Belgium
- Matej Horvat
  - Department of Oncology, University Medical Centre Maribor, 2000 Maribor, Slovenia
- Izidor Mlakar
  - Faculty of Electrical Engineering and Computer Science, University of Maribor, 2000 Maribor, Slovenia
3
Sellin J, Pantel JT, Börsch N, Conrad R, Mücke M. [Short paths to diagnosis with artificial intelligence: systematic literature review on diagnostic decision support systems]. Schmerz 2024; 38:19-27. [PMID: 38165492 DOI: 10.1007/s00482-023-00777-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 11/24/2023] [Indexed: 01/03/2024]
Abstract
BACKGROUND Rare diseases are often recognized late. Their diagnosis is particularly challenging due to the diversity, complexity and heterogeneity of clinical symptoms. Computer-aided diagnostic aids, often referred to as diagnostic decision support systems (DDSS), are promising tools for shortening the time to diagnosis. Despite initial positive evaluations, DDSS are not yet widely used, partly due to a lack of integration with existing clinical or practice information systems. OBJECTIVE This article provides an insight into currently existing diagnostic support systems that function without access to electronic patient records and only require information that is easily obtainable. MATERIALS AND METHODS A systematic literature search identified eight articles on DDSS that can assist in the diagnosis of rare diseases with no need for access to electronic patient records or other information systems in practices and hospitals. The main advantages and disadvantages of the identified rare disease diagnostic support systems were extracted and summarized. RESULTS Symptom checkers and DDSS based on portrait photos and pain drawings already exist. The degree of maturity of these applications varies. CONCLUSION DDSS currently still face a number of challenges, such as concerns about data protection and accuracy, and acceptance and awareness continue to be rather low. On the other hand, there is great potential for faster diagnosis, especially for rare diseases, which are easily overlooked due to their large number and the low awareness of them. The use of DDSS should therefore be carefully considered by doctors on a case-by-case basis.
Affiliation(s)
- Julia Sellin
  - Institut für Digitale Allgemeinmedizin, Universitätsklinikum RWTH Aachen, Aachen, Germany
  - Zentrum für Seltene Erkrankungen Aachen (ZSEA), Universitätsklinikum RWTH Aachen, Aachen, Germany
- Jean Tori Pantel
  - Institut für Digitale Allgemeinmedizin, Universitätsklinikum RWTH Aachen, Aachen, Germany
  - Zentrum für Seltene Erkrankungen Aachen (ZSEA), Universitätsklinikum RWTH Aachen, Aachen, Germany
- Natalie Börsch
  - Institut für Digitale Allgemeinmedizin, Universitätsklinikum RWTH Aachen, Aachen, Germany
  - Zentrum für Seltene Erkrankungen Aachen (ZSEA), Universitätsklinikum RWTH Aachen, Aachen, Germany
- Rupert Conrad
  - Klinik für Psychosomatische Medizin und Psychotherapie, Universitätsklinikum Münster, Münster, Germany
- Martin Mücke
  - Institut für Digitale Allgemeinmedizin, Universitätsklinikum RWTH Aachen, Aachen, Germany
  - Zentrum für Seltene Erkrankungen Aachen (ZSEA), Universitätsklinikum RWTH Aachen, Aachen, Germany
4
Kafke SD, Kuhlmey A, Schuster J, Blüher S, Czimmeck C, Zoellick JC, Grosse P. Can clinical decision support systems be an asset in medical education? An experimental approach. BMC Medical Education 2023; 23:570. [PMID: 37568144 PMCID: PMC10416486 DOI: 10.1186/s12909-023-04568-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2023] [Accepted: 08/04/2023] [Indexed: 08/13/2023]
Abstract
BACKGROUND Diagnostic accuracy is one of the major cornerstones of appropriate and successful medical decision-making. Clinical decision support systems (CDSSs) have recently been used to facilitate physicians' diagnostic considerations. However, to date, little is known about the potential assets of CDSSs for medical students in an educational setting. The purpose of our study was to explore the usefulness of CDSSs for medical students assessing their diagnostic performances and the influence of such software on students' trust in their own diagnostic abilities. METHODS Based on paper cases, students had to diagnose two different patients using a CDSS and conventional methods such as textbooks, respectively. Both patients had a common disease; in one setting the clinical presentation was a typical one (tonsillitis), whereas in the other setting (pulmonary embolism) the patient presented atypically. We used a 2x2x2 between- and within-subjects cluster-randomised controlled trial to assess the diagnostic accuracy in medical students, also by changing the order of the used resources (CDSS first or second). RESULTS Medical students in their 4th and 5th year performed equally well using conventional methods or the CDSS across the two cases (t(164) = 1.30; p = 0.197). Diagnostic accuracy and trust in the correct diagnosis were higher in the typical presentation condition than in the atypical presentation condition (t(85) = 19.97; p < .0001 and t(150) = 7.67; p < .0001). These results refute our main hypothesis that students diagnose more accurately when using conventional methods compared to the CDSS. CONCLUSIONS Medical students in their 4th and 5th year performed equally well in diagnosing two cases of common diseases with typical or atypical clinical presentations using conventional methods or a CDSS. Students were proficient in diagnosing a common disease with a typical presentation but underestimated their own factual knowledge in this scenario. 
Also, students were aware of their own diagnostic limitations when presented with a challenging case with an atypical presentation for which the use of a CDSS seemingly provided no additional insights.
Affiliation(s)
- Sean D Kafke
  - Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Adelheid Kuhlmey
  - Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Johanna Schuster
  - Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Stefan Blüher
  - Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Constanze Czimmeck
  - Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Jan C Zoellick
  - Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Pascal Grosse
  - Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
5
Kopka M, Feufel MA, Berner ES, Schmieding ML. How suitable are clinical vignettes for the evaluation of symptom checker apps? A test theoretical perspective. Digit Health 2023; 9:20552076231194929. [PMID: 37614591 PMCID: PMC10444026 DOI: 10.1177/20552076231194929] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/01/2023] [Accepted: 07/28/2023] [Indexed: 08/25/2023] Open
Abstract
Objective To evaluate the ability of case vignettes to assess the performance of symptom checker applications and to suggest refinements to the methodology used in case vignette-based audit studies. Methods We re-analyzed the publicly available data of two prominent case vignette-based symptom checker audit studies by calculating common metrics of test theory. Furthermore, we developed a new metric, the Capability Comparison Score (CCS), which compares symptom checker capability while controlling for the difficulty of the set of cases each symptom checker evaluated. We then scrutinized whether applying test theory and the CCS altered the performance ranking of the investigated symptom checkers. Results In both studies, most symptom checkers changed their rank order when adjusting the triage capability for item difficulty (ID) with the CCS. The previously reported triage accuracies commonly overestimated the capability of symptom checkers because they did not account for the fact that symptom checkers tend to selectively appraise easier cases (i.e., with high ID values). Also, many case vignettes in both studies showed insufficient (very low and even negative) values of item-total correlation (ITC), suggesting that individual items or the composition of item sets are of low quality. Conclusions A test-theoretic perspective helps identify previously undetected threats to the validity of case vignette-based symptom checker assessments and provides guidance and specific metrics to improve the quality of case vignettes, in particular by controlling for the difficulty of the vignettes an app was (not) able to evaluate correctly. Such measures might prove more meaningful than accuracy alone for the competitive assessment of symptom checkers. Our approach helps elaborate and standardize the methodology used for appraising symptom checker capability, which, ultimately, may yield more reliable results.
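Two of the test-theory quantities mentioned in this abstract, item difficulty (ID) and item-total correlation (ITC), can be sketched on a small response matrix. This is an illustrative sketch only: the data are invented, every app is assumed to have evaluated every vignette (which the abstract says is not the case in practice), and the corrected form of ITC (item versus the total of the remaining items) is an assumption. The CCS is not reproduced because the abstract does not define it.

```python
from statistics import mean, pstdev

def item_difficulty(responses):
    """Share of apps solving each vignette; responses[app][item] is 0 or 1.

    A high value marks an easy item, matching the abstract's reading of ID.
    """
    n_items = len(responses[0])
    return [mean(row[i] for row in responses) for i in range(n_items)]

def pearson(x, y):
    """Pearson correlation; returns NaN when either variable is constant."""
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return float("nan")
    mx, my = mean(x), mean(y)
    cov = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sx * sy)

def item_total_correlation(responses, item):
    """Correlate one item with the total score over the remaining items."""
    item_scores = [row[item] for row in responses]
    rest_totals = [sum(row) - row[item] for row in responses]
    return pearson(item_scores, rest_totals)

# Hypothetical data: 4 apps (rows) x 4 vignettes (columns), 1 = solved correctly.
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
]

print(item_difficulty(responses))
print([round(item_total_correlation(responses, i), 3) for i in range(4)])
```

In this toy matrix the last vignette is solved mainly by the weaker apps, so its ITC is negative, which is the kind of item the abstract flags as low quality.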
Affiliation(s)
- Marvin Kopka
  - Department of Psychology and Ergonomics (IPA), Division of Ergonomics, Technische Universität Berlin, Berlin, Germany
  - Institute of Medical Informatics, Charité – Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Markus A Feufel
  - Department of Psychology and Ergonomics (IPA), Division of Ergonomics, Technische Universität Berlin, Berlin, Germany
- Eta S Berner
  - Department of Health Services Administration, University of Alabama at Birmingham, Birmingham, AL, USA
- Malte L Schmieding
  - Institute of Medical Informatics, Charité – Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
6
Lin S, Nateqi J, Weingartner-Ortner R, Gruarin S, Marling H, Pilgram V, Lagler FB, Aigner E, Martin AG. An artificial intelligence-based approach for identifying rare disease patients using retrospective electronic health records applied for Pompe disease. Front Neurol 2023; 14:1108222. [PMID: 37153672 PMCID: PMC10160659 DOI: 10.3389/fneur.2023.1108222] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 12/13/2022] [Accepted: 04/03/2023] [Indexed: 05/10/2023] Open
Abstract
Objective We retrospectively screened 350,116 electronic health records (EHRs) to identify patients with suspected Pompe disease. Using these suspected patients, we then describe their phenotypical characteristics and estimate the prevalence in the respective population covered by the EHRs. Methods We applied Symptoma's Artificial Intelligence-based approach for identifying rare disease patients to retrospective anonymized EHRs provided by the "University Hospital Salzburg" clinic group. Within 1 month, the AI screened 350,116 EHRs reaching back 15 years from five hospitals, and 104 patients were flagged as probable for Pompe disease. Flagged patients were manually reviewed and assessed by generalist and specialist physicians for their likelihood for Pompe disease, from which the performance of the algorithms was evaluated. Results Of the 104 patients flagged by the algorithms, generalist physicians found five "diagnosed," 10 "suspected," and seven patients with "reduced suspicion." After feedback from Pompe disease specialist physicians, 19 patients remained clinically plausible for Pompe disease, resulting in a specificity of 18.27% for the AI. Estimating from the remaining plausible patients, the prevalence of Pompe disease for the greater Salzburg region [incl. Bavaria (Germany), Styria (Austria), and Upper Austria (Austria)] was one in every 18,427 people. Phenotypes for patient cohorts with an approximated onset of symptoms below or above 1 year of age were established, which correspond to infantile-onset Pompe disease (IOPD) and late-onset Pompe disease (LOPD), respectively. Conclusion Our study shows the feasibility of Symptoma's AI-based approach for identifying rare disease patients using retrospective EHRs. Via the algorithm's screening of an entire EHR population, a physician had only to manually review 5.47 patients on average to find one suspected candidate. 
This efficiency is crucial as Pompe disease, while rare, is a progressively debilitating but treatable neuromuscular disease. As such, we demonstrated both the efficiency of the approach and the potential of a scalable solution to the systematic identification of rare disease patients. Thus, similar implementation of this methodology should be encouraged to improve care for all rare disease patients.
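The headline figures in this abstract follow from three counts it reports: 350,116 screened EHRs, 104 flagged patients, and 19 patients who remained clinically plausible. The sketch below reproduces the arithmetic; the variable names are illustrative, and whether the authors derived the prevalence in exactly this way is an assumption. Note that 19/104 is the share of flagged patients who remained plausible, which is conventionally called the positive predictive value, although the abstract labels it specificity.

```python
# Counts taken from the abstract above.
screened_ehrs = 350_116
flagged = 104
plausible = 19

# Share of flagged patients that remained clinically plausible.
share_plausible = plausible / flagged

# Flagged patients a physician reviews, on average, per plausible candidate.
reviews_per_candidate = flagged / plausible

# Screened records per plausible patient (assumed basis of the prevalence estimate).
people_per_case = screened_ehrs / plausible

print(f"{share_plausible:.2%}")            # 18.27%
print(f"{reviews_per_candidate:.2f}")      # 5.47
print(f"1 in {round(people_per_case):,}")  # 1 in 18,427
```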
Affiliation(s)
- Simon Lin
  - Science Department, Symptoma GmbH, Vienna, Austria
  - Department of Internal Medicine, Paracelsus Medical University, Salzburg, Austria
- Jama Nateqi
  - Science Department, Symptoma GmbH, Vienna, Austria
  - Department of Internal Medicine, Paracelsus Medical University, Salzburg, Austria
- Vinzenz Pilgram
  - Medical and Information Technology - MIT, University Hospital Salzburg (SALK), Salzburg, Austria
- Florian B. Lagler
  - Medical and Information Technology - MIT, University Hospital Salzburg (SALK), Salzburg, Austria
  - Department of Pediatrics and Institute for Inherited Metabolic Diseases, Paracelsus Medical University, Salzburg, Austria
- Elmar Aigner
  - Department of Internal Medicine, Paracelsus Medical University, Salzburg, Austria
  - Medical and Information Technology - MIT, University Hospital Salzburg (SALK), Salzburg, Austria
- Alistair G. Martin
  - Science Department, Symptoma GmbH, Vienna, Austria
  - Correspondence: Alistair G. Martin
7
Schmude M, Salim N, Azadzoy H, Bane M, Millen E, O'Donnell L, Bode P, Türk E, Vaidya R, Gilbert S. Investigating the Potential for Clinical Decision Support in Sub-Saharan Africa With AFYA (Artificial Intelligence-Based Assessment of Health Symptoms in Tanzania): Protocol for a Prospective, Observational Pilot Study. JMIR Res Protoc 2022; 11:e34298. [PMID: 35671073 PMCID: PMC9214611 DOI: 10.2196/34298] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2021] [Revised: 02/17/2022] [Accepted: 04/30/2022] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND Low- and middle-income countries face difficulties in providing adequate health care. One of the reasons is a shortage of qualified health workers. Diagnostic decision support systems are designed to aid clinicians in their work and have the potential to mitigate pressure on health care systems. OBJECTIVE The Artificial Intelligence-Based Assessment of Health Symptoms in Tanzania (AFYA) study will evaluate the potential of an English-language artificial intelligence-based prototype diagnostic decision support system for mid-level health care practitioners in a low- or middle-income setting. METHODS This is an observational, prospective clinical study conducted in a busy Tanzanian district hospital. In addition to usual care visits, study participants will consult a mid-level health care practitioner, who will use a prototype diagnostic decision support system, and a study physician. The accuracy and comprehensiveness of the differential diagnosis provided by the diagnostic decision support system will be evaluated against a gold-standard differential diagnosis provided by an expert panel. RESULTS Patient recruitment started in October 2021. Participants were recruited directly in the waiting room of the outpatient clinic at the hospital. Data collection will conclude in May 2022. Data analysis is planned to be finished by the end of June 2022. The results will be published in a peer-reviewed journal. CONCLUSIONS Most diagnostic decision support systems have been developed and evaluated in high-income countries, but there is great potential for these systems to improve the delivery of health care in low- and middle-income countries. The findings of this real-patient study will provide insights based on the performance and usability of a prototype diagnostic decision support system in low- or middle-income countries. TRIAL REGISTRATION ClinicalTrials.gov NCT04958577; http://clinicaltrials.gov/ct2/show/NCT04958577. 
INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID) DERR1-10.2196/34298.
Affiliation(s)
- Nahya Salim
  - Muhimbili University of Health and Allied Sciences, Dar es Salaam, United Republic of Tanzania
- Mustafa Bane
  - Muhimbili University of Health and Allied Sciences, Dar es Salaam, United Republic of Tanzania
- Stephen Gilbert
  - Ada Health GmbH, Berlin, Germany
  - Else Kröner Fresenius Center for Digital Health, University Hospital Carl Gustav Carus Dresden, Technische Universität Dresden, Dresden, Germany
8
Symptoms associated with a COVID-19 infection among a non-hospitalized cohort in Vienna. Wien Klin Wochenschr 2022; 134:344-350. [PMID: 35416543 PMCID: PMC9007045 DOI: 10.1007/s00508-022-02028-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2021] [Accepted: 03/13/2022] [Indexed: 11/16/2022]
Abstract
Background Most clinical studies report the symptoms experienced by those infected with coronavirus disease 2019 (COVID-19) via patients already hospitalized. Here we analyzed the symptoms experienced outside of a hospital setting. Methods The Vienna Social Fund (FSW; Vienna, Austria), the Public Health Services of the City of Vienna (MA15) and the private company Symptoma collaborated to implement Vienna’s official online COVID-19 symptom checker. Users answered 12 yes/no questions about symptoms to assess their risk for COVID-19. They could also specify their age and sex, and whether they had contact with someone who tested positive for COVID-19. Depending on the assessed risk of COVID-19 positivity, a SARS-CoV‑2 nucleic acid amplification test (NAAT) was performed. In this publication, we analyzed which factors (symptoms, sex or age) are associated with COVID-19 positivity. We also trained a classifier to correctly predict COVID-19 positivity from the collected data. Results Between 2 November 2020 and 18 November 2021, 9133 people experiencing COVID-19-like symptoms were assessed as high risk by the chatbot and were subsequently tested by a NAAT. Symptoms significantly associated with a positive COVID-19 test were malaise, fatigue, headache, cough, fever, dysgeusia and hyposmia. Our classifier could successfully predict COVID-19 positivity with an area under the curve (AUC) of 0.74. Conclusion This study provides reliable COVID-19 symptom statistics based on the general population verified by NAATs. Supplementary Information The online version of this article (10.1007/s00508-022-02028-9) contains supplementary material, which is available to authorized users.
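The reported AUC of 0.74 can be read as the probability that a randomly chosen NAAT-positive user receives a higher classifier score than a randomly chosen NAAT-negative user. The sketch below computes AUC through that pairwise definition; the function name, labels, and scores are invented for illustration and are not the study's classifier or data.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the share of (positive, negative) pairs in which the positive
    case has the higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical test results (1 = positive NAAT) and classifier scores.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.5, 0.3, 0.2]

print(auc(labels, scores))  # 0.75
```

The pairwise form is quadratic in the number of cases; for a cohort of several thousand tests, a rank-based or library implementation would be the more practical choice.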
9
Millen E, Salim N, Azadzoy H, Bane MM, O'Donnell L, Schmude M, Bode P, Tuerk E, Vaidya R, Gilbert SH. Study protocol for a pilot prospective, observational study investigating the condition suggestion and urgency advice accuracy of a symptom assessment app in sub-Saharan Africa: the AFYA-'Health' Study. BMJ Open 2022; 12:e055915. [PMID: 35410928 PMCID: PMC9003603 DOI: 10.1136/bmjopen-2021-055915] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/10/2023] Open
Abstract
INTRODUCTION Due to a global shortage of healthcare workers, there is a lack of basic healthcare for 4 billion people worldwide, particularly affecting low-income and middle-income countries. The utilisation of AI-based healthcare tools such as symptom assessment applications (SAAs) has the potential to reduce the burden on healthcare systems. The purpose of the AFYA Study (AI-based Assessment oF health sYmptoms in TAnzania) is to evaluate the accuracy of the condition suggestions and urgency advice provided by a user on a Swahili language Ada SAA. METHODS AND ANALYSIS This study is designed as an observational prospective clinical study. The setting is a waiting room of a Tanzanian district hospital. It will include patients entering the outpatient clinic with various conditions and age groups, including children and adolescents. Patients will be asked to use the SAA before proceeding to usual care. After usual care, they will have a consultation with a study-provided physician. Patients and healthcare practitioners will be blinded to the SAA's results. An expert panel will compare the Ada SAA's condition suggestions and urgency advice to usual care and study provided differential diagnoses and triage. The primary outcome measures are the accuracy and comprehensiveness of the Ada SAA evaluated against the gold standard differential diagnoses. ETHICS AND DISSEMINATION Ethical approval was received by the ethics committee (EC) of Muhimbili University of Health and Allied Sciences with an approval number MUHAS-REC-09-2019-044 and the National Institute for Medical Research, NIMR/HQ/R.8c/Vol. I/922. All amendments to the protocol are reported and adapted on the basis of the requirements of the EC. The results from this study will be submitted to peer-reviewed journals, local and international stakeholders, and will be communicated in editorials/articles by Ada Health. TRIAL REGISTRATION NUMBER NCT04958577.
Affiliation(s)
- Nahya Salim
  - Muhimbili University of Health and Allied Sciences, Dar es Salaam, United Republic of Tanzania
- Mustafa Miraji Bane
  - Muhimbili University of Health and Allied Sciences, Dar es Salaam, United Republic of Tanzania
- Stephen Henry Gilbert
  - Ada Health GmbH, Berlin, Germany
  - EKFZ for Digital Health, Technische Universität Dresden, Dresden, Germany
10
Gilbert S, Fenech M, Upadhyay S, Wicks P, Novorol C. Quality of condition suggestions and urgency advice provided by the Ada symptom assessment app evaluated with vignettes optimised for Australia. Aust J Prim Health 2021; 27:377-381. [PMID: 34706813 DOI: 10.1071/py21032] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/22/2021] [Accepted: 05/11/2021] [Indexed: 11/23/2022]
Abstract
When people face a health problem, they often first ask, 'Is there an app for that?'. We investigated the quality of advice provided by the Ada symptom assessment application to address the question, 'How do I know the app on my phone is safe and provides good advice?'. The app was tested with 48 independently created vignettes developed for a previous study, including 18 specifically developed for the Australian setting, using an independently developed methodology to evaluate the accuracy of condition suggestions and urgency advice. The correct condition was listed first in 65% of vignettes, and in the Top 3 results in 83% of vignettes. The urgency advice in the app exactly matched the gold standard in 63% of vignettes. The app's accuracy of condition suggestion and urgency advice is higher than that of the best-performing symptom assessment app reported in a previous study (61%, 77% and 52% for conditions suggested in the Top 1, Top 3 and exactly matching urgency advice, respectively). These results are relevant to the application of symptom assessment in primary and community health, where medical quality and safety should determine app choice.
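The three rates reported in this abstract (correct condition in the Top 1, in the Top 3, and exactly matching urgency advice) can be computed as sketched below. The record layout, field names, and the four toy vignettes are hypothetical and do not come from the study.

```python
def top_k_rate(results, k):
    """Share of vignettes whose gold condition appears in the first k suggestions."""
    hits = sum(1 for r in results if r["gold_condition"] in r["suggestions"][:k])
    return hits / len(results)

def urgency_match_rate(results):
    """Share of vignettes whose urgency advice exactly equals the gold standard."""
    hits = sum(1 for r in results if r["urgency"] == r["gold_urgency"])
    return hits / len(results)

# Hypothetical app output for four vignettes.
results = [
    {"suggestions": ["migraine", "sinusitis", "tension headache"],
     "gold_condition": "migraine",
     "urgency": "self-care", "gold_urgency": "self-care"},
    {"suggestions": ["gastritis", "appendicitis", "constipation"],
     "gold_condition": "appendicitis",
     "urgency": "see GP", "gold_urgency": "emergency"},
    {"suggestions": ["asthma", "bronchitis", "anxiety", "pulmonary embolism"],
     "gold_condition": "pulmonary embolism",
     "urgency": "emergency", "gold_urgency": "emergency"},
    {"suggestions": ["tonsillitis"],
     "gold_condition": "tonsillitis",
     "urgency": "see GP", "gold_urgency": "see GP"},
]

print(top_k_rate(results, 1))       # 0.5
print(top_k_rate(results, 3))       # 0.75
print(urgency_match_rate(results))  # 0.75
```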
Affiliation(s)
- Stephen Gilbert
- Ada Health GmbH, Karl-Liebknecht-Straße 1, 10178 Berlin, Germany; and EKFZ for Digital Health, University Hospital Carl Gustav Carus Dresden, Technische Universität Dresden, Dresden, Germany; and Corresponding author.
- Matthew Fenech
- Ada Health GmbH, Karl-Liebknecht-Straße 1, 10178 Berlin, Germany
- Paul Wicks
- Ada Health GmbH, Karl-Liebknecht-Straße 1, 10178 Berlin, Germany
- Claire Novorol
- Ada Health GmbH, Karl-Liebknecht-Straße 1, 10178 Berlin, Germany
11
Munsch N, Martin A, Gruarin S, Nateqi J, Abdarahmane I, Weingartner-Ortner R, Knapp B. Authors' Reply to: Screening Tools: Their Intended Audiences and Purposes. Comment on "Diagnostic Accuracy of Web-Based COVID-19 Symptom Checkers: Comparison Study". J Med Internet Res 2021; 23:e26543. [PMID: 33989162 PMCID: PMC8178729 DOI: 10.2196/26543] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2020] [Accepted: 05/13/2021] [Indexed: 11/24/2022]
Affiliation(s)
- Jama Nateqi
- Medical Department, Symptoma, Attersee, Austria; Department of Internal Medicine, Paracelsus Medical University, Salzburg, Austria
- Bernhard Knapp
- Data Science Department, Symptoma, Vienna, Austria; Department of Computer Science, University of Applied Sciences - Technikum Wien, Vienna, Austria
12
Plontke SK. Rare Diseases and Otorhinolaryngology, Head and Neck Surgery. Laryngorhinootologie 2021; 100:S1-S11. [PMID: 34352898 PMCID: PMC8354574 DOI: 10.1055/a-1397-0842] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
Abstract
Rare diseases pose multiple challenges for patients, relatives, physicians, nursing staff, and therapists. Their rarity impedes research and treatment for medical and economic reasons. Many diseases in the field of otorhinolaryngology, head and neck surgery are rare diseases due to their low prevalence. Initiating the right management processes requires knowledge of diagnostics; of resources such as centers, networks, and registries; of the specifics of the physician-patient relationship; and of follow-up care, including communication with family doctors and the role of self-help groups. Of special interest for university hospitals and our scientific society are the specific aspects of research, including European networks and research funding, information management, public relations, education, training, financing, and regulations such as those on orphan drugs and clinical trials in small populations.
13
Gilbert S, Mehl A, Baluch A, Cawley C, Challiner J, Fraser H, Millen E, Montazeri M, Multmeier J, Pick F, Richter C, Türk E, Upadhyay S, Virani V, Vona N, Wicks P, Novorol C. How accurate are digital symptom assessment apps for suggesting conditions and urgency advice? A clinical vignettes comparison to GPs. BMJ Open 2020; 10:e040269. [PMID: 33328258 PMCID: PMC7745523 DOI: 10.1136/bmjopen-2020-040269] [Citation(s) in RCA: 58] [Impact Index Per Article: 14.5] [Indexed: 01/04/2023]
Abstract
OBJECTIVES To compare breadth of condition coverage, accuracy of suggested conditions and appropriateness of urgency advice of eight popular symptom assessment apps. DESIGN Vignettes study. SETTING 200 primary care vignettes. INTERVENTION/COMPARATOR For eight apps and seven general practitioners (GPs): breadth of coverage and condition-suggestion and urgency advice accuracy measured against the vignettes' gold-standard. PRIMARY OUTCOME MEASURES (1) Proportion of conditions 'covered' by an app, that is, not excluded because the user was too young/old or pregnant, or not modelled; (2) proportion of vignettes with the correct primary diagnosis among the top 3 conditions suggested; (3) proportion of 'safe' urgency advice (ie, at gold standard level, more conservative, or no more than one level less conservative). RESULTS Condition-suggestion coverage was highly variable, with some apps not offering a suggestion for many users: in alphabetical order, Ada: 99.0%; Babylon: 51.5%; Buoy: 88.5%; K Health: 74.5%; Mediktor: 80.5%; Symptomate: 61.5%; Your.MD: 64.5%; WebMD: 93.0%. Top-3 suggestion accuracy was GPs (average): 82.1%±5.2%; Ada: 70.5%; Babylon: 32.0%; Buoy: 43.0%; K Health: 36.0%; Mediktor: 36.0%; Symptomate: 27.5%; WebMD: 35.5%; Your.MD: 23.5%. Some apps excluded certain user demographics or conditions and their performance was generally greater with the exclusion of corresponding vignettes. For safe urgency advice, tested GPs had an average of 97.0%±2.5%. For the vignettes with advice provided, only three apps had safety performance within 1 SD of the GPs (Ada: 97.0%; Babylon: 95.1%; Symptomate: 97.8%). One app had a safety performance within 2 SDs of GPs (Your.MD: 92.6%). Three apps had a safety performance outside 2 SDs of GPs (Buoy: 80.0%, p<0.001; K Health: 81.3%, p<0.001; Mediktor: 87.3%, p=1.3×10⁻³). CONCLUSIONS The utility of digital symptom assessment apps relies on coverage, accuracy and safety. While no digital tool outperformed GPs, some came close, and the nature of iterative improvements to software offers scalable improvements to care.
Affiliation(s)
- Hamish Fraser
- Brown Center for Biomedical Informatics, Brown University, Rhode Island, USA
14
Martin A, Nateqi J, Gruarin S, Munsch N, Abdarahmane I, Zobel M, Knapp B. An artificial intelligence-based first-line defence against COVID-19: digitally screening citizens for risks via a chatbot. Sci Rep 2020; 10:19012. [PMID: 33149198 PMCID: PMC7643065 DOI: 10.1038/s41598-020-75912-x] [Citation(s) in RCA: 35] [Impact Index Per Article: 8.8] [Received: 03/26/2020] [Accepted: 10/05/2020] [Indexed: 12/24/2022]
Abstract
To combat the pandemic of the coronavirus disease 2019 (COVID-19), numerous governments have established phone hotlines to prescreen potential cases. These hotlines have struggled with the volume of callers, leading to wait times of hours or, even, an inability to contact health authorities. Symptoma is a symptom-to-disease digital health assistant that can differentiate more than 20,000 diseases with an accuracy of more than 90%. We tested the accuracy of Symptoma to identify COVID-19 using a set of diverse clinical cases combined with case reports of COVID-19. We showed that Symptoma can accurately distinguish COVID-19 in 96.32% of clinical cases. When considering only COVID-19 symptoms and risk factors, Symptoma identified 100% of those infected when presented with only three signs. Lastly, we showed that Symptoma’s accuracy far exceeds that of simple “yes–no” questionnaires widely available online. In summary, Symptoma provides unparalleled accuracy in systematically identifying cases of COVID-19 while also considering over 20,000 other diseases. Furthermore, Symptoma allows free text input, furthered with disease-specific follow up questions, in 36 languages. Combined, these results and accessibility give Symptoma the potential to be a key tool in the global fight against COVID-19. The Symptoma predictor is freely available online at https://www.symptoma.com.
Affiliation(s)
- Jama Nateqi
- Medical Department, Symptoma, Attersee, Austria; Department of Internal Medicine, Paracelsus Medical University, Salzburg, Austria
- Marc Zobel
- Data Science Department, Symptoma, Vienna, Austria
15
Mehl A, Bergey F, Cawley C, Gilsdorf A. Syndromic Surveillance Insights from a Symptom Assessment App Before and During COVID-19 Measures in Germany and the United Kingdom: Results From Repeated Cross-Sectional Analyses. JMIR Mhealth Uhealth 2020; 8:e21364. [PMID: 32997640 PMCID: PMC7561445 DOI: 10.2196/21364] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 06/16/2020] [Revised: 07/15/2020] [Accepted: 09/15/2020] [Indexed: 12/23/2022]
Abstract
Background Unprecedented lockdown measures have been introduced in countries worldwide to mitigate the spread and consequences of COVID-19. Although attention has been focused on the effects of these measures on epidemiological indicators relating directly to the infection, there is increased recognition of their broader health implications. However, assessing these implications in real time is a challenge, due to the limitations of existing syndromic surveillance data and tools. Objective The aim of this study is to explore the added value of mobile phone app–based symptom assessment tools as real-time health insight providers to inform public health policy makers. Methods A comparative and descriptive analysis of the proportion of all self-reported symptoms entered by users during an assessment within the Ada app in Germany and the United Kingdom was conducted between two periods, namely before and after the implementation of “Phase One” COVID-19 measures. Additional analyses were performed to explore the association between symptom trends and seasonality, and symptom trends and weather. Differences in the proportion of unique symptoms between the periods were analyzed using a Pearson chi-square test and reported as log2 fold changes. Results Overall, 48,300-54,900 symptomatic users reported 140,500-170,400 symptoms during the Baseline and Measures periods in Germany. Overall, 34,200-37,400 symptomatic users in the United Kingdom reported 112,100-131,900 symptoms during the Baseline and Measures periods. The majority of symptomatic users were female (Germany: 68,600/103,200, 66.52%; United Kingdom: 51,200/71,600, 72.74%). The majority were aged 10-29 years (Germany: 68,500/100,000, 68.45%; United Kingdom: 50,900/68,800, 73.91%), and about one-quarter were aged 30-59 years (Germany: 26,200/100,000, 26.15%; United Kingdom: 14,900/68,800, 21.65%). 
Overall, 103 symptoms were reported either more or less frequently (with statistically significant differences) during the Measures period as compared to the Baseline period, and 34 of these were reported in both countries. The following mental health symptoms (log2 fold change, P value) were reported less often during the Measures period: inability to manage constant stress and demands at work (–1.07, P<.001), memory difficulty (–0.56, P<.001), depressed mood (–0.42, P<.001), and impaired concentration (–0.46, P<.001). Diminished sense of taste (2.26, P<.001) and hyposmia (2.20, P<.001) were reported more frequently during the Measures period. None of the 34 symptoms were found to be different between the same dates in 2019. In total, 14 of the 34 symptoms had statistically significant associations with weather variables. Conclusions Symptom assessment apps have an important role to play in facilitating improved understanding of the implications of public health policies such as COVID-19 lockdown measures. Not only do they provide the means to complement and cross-validate hypotheses based on data collected through more traditional channels, they can also generate novel insights through a real-time syndromic surveillance system.
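The abstract reports differences in symptom proportions between periods as log2 fold changes, tested with a Pearson chi-square test. The sketch below shows how such a comparison could be computed for one symptom. The counts are hypothetical, the function names are illustrative, and the choice of a 2×2 table without continuity correction is an assumption rather than the study's documented procedure.

```python
import math


def log2_fold_change(count_base: int, total_base: int,
                     count_meas: int, total_meas: int) -> float:
    """log2 of (proportion in Measures period / proportion in Baseline period)."""
    p_base = count_base / total_base
    p_meas = count_meas / total_meas
    return math.log2(p_meas / p_base)


def pearson_chi2_2x2(count_base: int, total_base: int,
                     count_meas: int, total_meas: int) -> tuple[float, float]:
    """Pearson chi-square statistic and P value for a 2x2 table (1 degree of
    freedom, no continuity correction; an assumption for this sketch)."""
    a, b = count_base, total_base - count_base   # Baseline: symptom / other
    c, d = count_meas, total_meas - count_meas   # Measures: symptom / other
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: reports of one symptom out of all reported symptoms.
lfc = log2_fold_change(200, 140_500, 1_000, 170_400)
chi2, p = pearson_chi2_2x2(200, 140_500, 1_000, 170_400)
print(f"log2 fold change = {lfc:.2f}, chi2 = {chi2:.1f}, P = {p:.3g}")
```

A positive log2 fold change means the symptom made up a larger share of reports in the Measures period; a value of 1 corresponds to a doubling of the proportion.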
Affiliation(s)
- Alicia Mehl
- Department of Epidemiology & Public Health, Ada Health GmbH, Berlin, Germany
- Francois Bergey
- Department of Epidemiology & Public Health, Ada Health GmbH, Berlin, Germany
- Caoimhe Cawley
- Department of Epidemiology & Public Health, Ada Health GmbH, Berlin, Germany
- Andreas Gilsdorf
- Department of Epidemiology & Public Health, Ada Health GmbH, Berlin, Germany
16
Munsch N, Martin A, Gruarin S, Nateqi J, Abdarahmane I, Weingartner-Ortner R, Knapp B. Diagnostic Accuracy of Web-Based COVID-19 Symptom Checkers: Comparison Study. J Med Internet Res 2020; 22:e21299. [PMID: 33001828 PMCID: PMC7541039 DOI: 10.2196/21299] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Received: 06/15/2020] [Revised: 07/27/2020] [Accepted: 09/14/2020] [Indexed: 01/06/2023]
Abstract
Background A large number of web-based COVID-19 symptom checkers and chatbots have been developed; however, anecdotal evidence suggests that their conclusions are highly variable. To our knowledge, no study has evaluated the accuracy of COVID-19 symptom checkers in a statistically rigorous manner. Objective The aim of this study is to evaluate and compare the diagnostic accuracies of web-based COVID-19 symptom checkers. Methods We identified 10 web-based COVID-19 symptom checkers, all of which were included in the study. We evaluated the COVID-19 symptom checkers by assessing 50 COVID-19 case reports alongside 410 non–COVID-19 control cases. A bootstrapping method was used to counter the unbalanced sample sizes and obtain confidence intervals (CIs). Results are reported as sensitivity, specificity, F1 score, and Matthews correlation coefficient (MCC). Results The classification task between COVID-19–positive and COVID-19–negative for “high risk” cases among the 460 test cases yielded (sorted by F1 score): Symptoma (F1=0.92, MCC=0.85), Infermedica (F1=0.80, MCC=0.61), US Centers for Disease Control and Prevention (CDC) (F1=0.71, MCC=0.30), Babylon (F1=0.70, MCC=0.29), Cleveland Clinic (F1=0.40, MCC=0.07), Providence (F1=0.40, MCC=0.05), Apple (F1=0.29, MCC=-0.10), Docyet (F1=0.27, MCC=0.29), Ada (F1=0.24, MCC=0.27) and Your.MD (F1=0.24, MCC=0.27). For “high risk” and “medium risk” combined the performance was: Symptoma (F1=0.91, MCC=0.83) Infermedica (F1=0.80, MCC=0.61), Cleveland Clinic (F1=0.76, MCC=0.47), Providence (F1=0.75, MCC=0.45), Your.MD (F1=0.72, MCC=0.33), CDC (F1=0.71, MCC=0.30), Babylon (F1=0.70, MCC=0.29), Apple (F1=0.70, MCC=0.25), Ada (F1=0.42, MCC=0.03), and Docyet (F1=0.27, MCC=0.29). Conclusions We found that the number of correctly assessed COVID-19 and control cases varies considerably between symptom checkers, with different symptom checkers showing different strengths with respect to sensitivity and specificity. 
A good balance between sensitivity and specificity was only achieved by two symptom checkers.
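The abstract names F1 score and Matthews correlation coefficient (MCC) as outcome measures and mentions bootstrapping to counter the 50 versus 410 case imbalance. The sketch below illustrates these quantities under stated assumptions: the predictions are hypothetical, and the class-balanced resampling scheme is one plausible reading of "bootstrapping to counter unbalanced sample sizes", not the authors' documented procedure.

```python
import math
import random


def confusion(y_true, y_pred):
    """Return (tp, tn, fp, fn) for boolean labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    return tp, tn, fp, fn


def f1_score(tp, tn, fp, fn):
    """Harmonic mean of precision and recall; tn is unused by definition."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient, in [-1, 1]."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def balanced_bootstrap_ci(y_true, y_pred, metric,
                          n_resamples=1000, alpha=0.05, seed=0):
    """Percentile CI from resamples with equal numbers of positive and
    negative cases (assumed scheme, for illustration only)."""
    rng = random.Random(seed)
    pos = [i for i, t in enumerate(y_true) if t]
    neg = [i for i, t in enumerate(y_true) if not t]
    m = min(len(pos), len(neg))
    scores = []
    for _ in range(n_resamples):
        idx = rng.choices(pos, k=m) + rng.choices(neg, k=m)
        counts = confusion([y_true[i] for i in idx], [y_pred[i] for i in idx])
        scores.append(metric(*counts))
    scores.sort()
    lower = scores[int((alpha / 2) * n_resamples)]
    upper = scores[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Case counts follow the abstract (50 COVID-19, 410 controls);
# the predictions themselves are hypothetical.
y_true = [True] * 50 + [False] * 410
y_pred = [True] * 45 + [False] * 5 + [True] * 41 + [False] * 369

counts = confusion(y_true, y_pred)
f1 = f1_score(*counts)
m = mcc(*counts)
f1_ci = balanced_bootstrap_ci(y_true, y_pred, f1_score)
print(f"F1 = {f1:.2f}, MCC = {m:.2f}, balanced-bootstrap F1 95% CI = {f1_ci}")
```

Because F1 ignores true negatives while MCC uses all four cells of the confusion matrix, the two measures can rank checkers differently on an imbalanced test set, which is consistent with the divergent F1 and MCC values listed in the abstract.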
Affiliation(s)
- Jama Nateqi
- Medical Department, Symptoma, Attersee, Austria; Department of Internal Medicine, Paracelsus Medical University, Salzburg, Austria
17
Rebitschek FG, Gigerenzer G. [Assessing the quality of digital health services: How can informed decisions be promoted?]. Bundesgesundheitsblatt Gesundheitsforschung Gesundheitsschutz 2020; 63:665-673. [PMID: 32424555 DOI: 10.1007/s00103-020-03146-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 12/01/2022]
Abstract
An important prerequisite for the success of the digitisation of the healthcare system is risk-literate users. Risk literacy means the ability to weigh potential benefits and harms of digital technologies and information, to use digital services critically, and to understand statistical evidence. How do people find reliable and comprehensible health information on the Internet? How can they better assess the quality of algorithmic decision systems? This narrative contribution describes two approaches that show how the competence to make informed decisions can be promoted. Evidence-based and reliable health information exists on the Internet but must be distinguished from a large amount of unreliable information. Various institutions in the German-speaking world have therefore provided guidance to help laypersons make informed decisions. The Harding Center for Risk Literacy in Potsdam, for example, has developed a decision tree ("fast-and-frugal tree"). When dealing with algorithms, natural frequency trees (NFTs) can help to assess the quality and fairness of an algorithmic decision system. Independent of reliable and comprehensible digital health services, further tools for laypersons to assess information and algorithms should be developed and provided. These tools can also be included in institutional training programmes for the promotion of digital literacy. This would be an important step towards the success of digitisation in prevention and health promotion.
Affiliation(s)
- Felix G Rebitschek
- Max-Planck-Institut für Bildungsforschung, Lentzeallee 94, 14195, Berlin, Germany; Fakultät für Gesundheitswissenschaften Brandenburg, Harding-Zentrum für Risikokompetenz, Universität Potsdam, Virchowstr. 2, 14482, Potsdam, Germany
- Gerd Gigerenzer
- Max-Planck-Institut für Bildungsforschung, Lentzeallee 94, 14195, Berlin, Germany; Fakultät für Gesundheitswissenschaften Brandenburg, Harding-Zentrum für Risikokompetenz, Universität Potsdam, Virchowstr. 2, 14482, Potsdam, Germany