1
Aissaoui Ferhi L, Ben Amar M, Choubani F, Bouallegue R. Enhancing diagnostic accuracy in symptom-based health checkers: a comprehensive machine learning approach with clinical vignettes and benchmarking. Front Artif Intell 2024;7:1397388. PMID: 39421435; PMCID: PMC11483353; DOI: 10.3389/frai.2024.1397388.
Abstract
Introduction: The development of machine learning models for symptom-based health checkers is a rapidly evolving area with significant implications for healthcare. Accurate and efficient diagnostic tools can enhance patient outcomes and optimize healthcare resources. This study focuses on evaluating and optimizing machine learning models using a dataset of 10 diseases and 9,572 samples.

Methods: The dataset was divided into training and testing sets to facilitate model training and evaluation. The following models were selected and optimized: Decision Tree, Random Forest, Naive Bayes, Logistic Regression, and K-Nearest Neighbors. Evaluation metrics included accuracy, F1 scores, and 10-fold cross-validation. ROC-AUC and precision-recall curves were also used to assess model performance, particularly in scenarios with imbalanced datasets. Clinical vignettes were employed to gauge the real-world applicability of the models.

Results: ROC-AUC curves showed that model performance improved with increasing model complexity. Precision-recall curves were particularly useful for evaluating model sensitivity on imbalanced datasets. Clinical vignettes demonstrated the robustness of the models in providing accurate diagnoses.

Discussion: The study underscores the importance of comprehensive model evaluation techniques. Clinical vignette testing and analysis of ROC-AUC and precision-recall curves are crucial to ensuring the reliability and sensitivity of symptom-based health checkers. These techniques provide a more nuanced understanding of model performance and highlight areas for further improvement.

Conclusion: This study highlights the significance of employing diverse evaluation metrics and methods to ensure the robustness and accuracy of machine learning models in symptom-based health checkers. The integration of clinical vignettes and the analysis of ROC-AUC and precision-recall curves are essential steps in developing reliable and sensitive diagnostic tools.
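The evaluation pipeline described in this abstract (train/test split, 10-fold cross-validation, one-vs-rest ROC-AUC, and per-class precision-recall curves) can be sketched as follows. This is a minimal illustration on a synthetic dataset, not the authors' code; the models and settings are assumptions for demonstration only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_recall_curve, roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a symptom matrix with 10 disease classes.
X, y = make_classification(n_samples=1500, n_features=30, n_informative=15,
                           n_classes=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

models = {"decision_tree": DecisionTreeClassifier(random_state=0),
          "random_forest": RandomForestClassifier(random_state=0)}
for name, model in models.items():
    cv_acc = cross_val_score(model, X_tr, y_tr, cv=10).mean()  # 10-fold CV
    proba = model.fit(X_tr, y_tr).predict_proba(X_te)
    auc = roc_auc_score(y_te, proba, multi_class="ovr")  # one-vs-rest macro AUC
    print(f"{name}: cv_accuracy={cv_acc:.3f}, ovr_roc_auc={auc:.3f}")

# Precision-recall curve for a single class, useful under class imbalance.
proba_rf = models["random_forest"].predict_proba(X_te)
precision, recall, _ = precision_recall_curve((y_te == 0).astype(int),
                                              proba_rf[:, 0])
```

One-vs-rest AUC and per-class precision-recall curves are the standard way to extend these binary metrics to a multiclass diagnostic task like the one studied here.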
Affiliation(s)
- Leila Aissaoui Ferhi
- Virtual University of Tunis, Tunis, Tunisia
- Innov’Com Laboratory at SUPCOM, University of Carthage, Carthage, Tunisia
- Manel Ben Amar
- Virtual University of Tunis, Tunis, Tunisia
- Innov’Com Laboratory at SUPCOM, University of Carthage, Carthage, Tunisia
- Faculty of Dental Medicine of Monastir, University of Monastir, Monastir, Tunisia
- Fethi Choubani
- Innov’Com Laboratory at SUPCOM, University of Carthage, Carthage, Tunisia
- Ridha Bouallegue
- Innov’Com Laboratory at SUPCOM, University of Carthage, Carthage, Tunisia
2
Liu V, Kaila M, Koskela T. Triage Accuracy and the Safety of User-Initiated Symptom Assessment With an Electronic Symptom Checker in a Real-Life Setting: Instrument Validation Study. JMIR Hum Factors 2024;11:e55099. PMID: 39326038; PMCID: PMC11467609; DOI: 10.2196/55099.
Abstract
BACKGROUND: Previous studies have evaluated the diagnostic and triage accuracy of electronic symptom checkers (ESCs) using clinical case vignettes. The national Omaolo digital services (Omaolo) in Finland include an ESC for various symptoms. Omaolo is a CE-marked medical device (risk class IIa) based on the Duodecim Clinical Decision Support, EBMEDS.

OBJECTIVE: This study investigates how well triage performed by the ESC matches nurse triage within the chief symptom list available in Omaolo (anal region symptoms; cough; diarrhea; discharge from the eye or watery or reddish eye; headache; heartburn; knee symptom or injury; lower back pain or injury; oral health; painful or blocked ear; respiratory tract infection; sexually transmitted disease; shoulder pain, stiffness, or injury; sore throat or throat symptom; and urinary tract infection). In addition, the accuracy, specificity, sensitivity, and safety of the Omaolo ESC were assessed.

METHODS: This clinical validation study was performed in a real-life setting at multiple primary health care (PHC) centers across Finland. The included units followed the walk-in model of primary care, with no prior phone call or contact required. Upon arriving at the PHC center, users (patients) answered the ESC questions and received a triage recommendation; a nurse then assessed their triage. Findings on 877 patients were analyzed by matching the ESC recommendations with the triage nurse's assessment.

RESULTS: Safe assessments by the ESC accounted for 97.6% (856/877; 95% CI 95.6%-98.0%) of all assessments made. The mean exact match across all symptom assessments was 53.7% (471/877; 95% CI 49.2%-55.9%). The mean of exact or overly conservative but suitable matches (the ESC's assessment was one triage level higher than the nurse's) was 66.6% (584/877; 95% CI 63.4%-69.7%). When the nurse concluded that urgent treatment was needed, the ESC's exact-match accuracy was 70.9% (244/344; 95% CI 65.8%-75.7%). Sensitivity for the Omaolo ESC was 62.6% and specificity 69.2%. A total of 21 critical assessments were identified for further analysis; there was no indication of compromised patient safety.

CONCLUSIONS: The primary objectives of this study were to evaluate the safety and to explore the accuracy, specificity, and sensitivity of the Omaolo ESC. The results indicate that the ESC is safe in a real-life setting when appraised against assessments by triage nurses. Furthermore, the Omaolo ESC shows potential to guide patients to appropriate triage destinations effectively, helping them receive timely and suitable care.

INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID): RR2-10.2196/41423.
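As a back-of-the-envelope check (not the study's code), the exact-match rate can be recomputed from the reported counts together with an approximate confidence interval. This sketch uses a simple Wald (normal-approximation) interval, so it will not exactly reproduce the CIs quoted above, whose computation method is not stated.

```python
import math

def rate_with_ci(successes: int, total: int, z: float = 1.96):
    """Proportion with a normal-approximation (Wald) 95% CI."""
    p = successes / total
    half = z * math.sqrt(p * (1 - p) / total)
    return p, (p - half, p + half)

# Reported exact matches between ESC and nurse triage: 471 of 877.
exact, (lo, hi) = rate_with_ci(471, 877)
print(f"exact match: {exact:.1%} (Wald 95% CI {lo:.1%}-{hi:.1%})")
```

For proportions near 0 or 1, a Wilson or bootstrap interval would be preferable to the Wald formula used here.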
Affiliation(s)
- Ville Liu
- Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Minna Kaila
- Public Health Medicine, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Tuomas Koskela
- Department of General Practice, Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- The Wellbeing Services County of Pirkanmaa, Tampere, Finland
3
Jindal A, Brandao-de-Resende C, Neo YN, Melo M, Day AC. Enhancing Ophthalmic Triage: identification of new clinical features to support healthcare professionals in triage. Eye (Lond) 2024;38:2536-2544. PMID: 38627545; PMCID: PMC11385555; DOI: 10.1038/s41433-024-03070-9.
Abstract
OBJECTIVE: To investigate which features from a patient's history are high or low risk and could support healthcare professionals in ophthalmic emergency triage.

METHODS: In this prospective study, 12,584 visits from 11,733 adult patients attending the Accident and Emergency department of a single tertiary centre were analysed. Data were collected by ophthalmic triage nurses using an online form from August 2021 to April 2022. Multivariate analysis (MVA) was conducted to identify which features from the patients' history were associated with emergency care.

RESULTS: This study found that 45.5% of patient visits (PV; 5731) required a same-day eye emergency examination (SDEE), 11.3% (1416 PV) needed urgent care, and 43.2% (5437 PV) were appropriate for elective consultations with a GP or optometrist. The top ten statistically significant features (p < 0.05) in the MVA warranting SDEE, with odds ratios (95% CI), were: bilateral eye injury 36.5 [15.6-85.5], unilateral eye injury 25.8 [20.9-31.7], vision loss 4.8 [2.9-7.8], post-operative ophthalmic (<4 weeks) 4.6 [3.8-5.7], contact lens wearer 3.9 [3.3-4.7], history of uveitis 3.9 [3.3-4.7], photophobia 2.9 [2.4-3.6], unilateral dark shadow/curtain in vision 2.4 [1.8-3.0], unilateral injected red eye 2.0 [1.8-2.2], and rapid change in visual acuity 1.8 [1.5-2.2].

CONCLUSION: This study characterises presenting features covering almost 100 acute ophthalmic presentations commonly seen in emergency and elective care. This information could supplement current red flag indicators and support healthcare professionals in ophthalmic triage. Further research is required to evaluate the cost-effectiveness and safety of our findings for triaging acute presentations.
Affiliation(s)
- Anish Jindal
- Moorfields Eye Hospital NHS Foundation Trust, London, United Kingdom.
- Department of Brain Sciences, Institute of Ophthalmology, University College London, London, UK.
- Camilo Brandao-de-Resende
- Department of Brain Sciences, Institute of Ophthalmology, University College London, London, UK
- NIHR Moorfields Clinical Research Facility, Moorfields Eye Hospital, London, United Kingdom
- Yan Ning Neo
- Moorfields Eye Hospital NHS Foundation Trust, London, United Kingdom
- Mariane Melo
- NIHR Moorfields Clinical Research Facility, Moorfields Eye Hospital, London, United Kingdom
- Alexander C Day
- Moorfields Eye Hospital NHS Foundation Trust, London, United Kingdom
- Department of Brain Sciences, Institute of Ophthalmology, University College London, London, UK
- NIHR Moorfields Clinical Research Facility, Moorfields Eye Hospital, London, United Kingdom
4
Meczner A, Cohen N, Qureshi A, Reza M, Sutaria S, Blount E, Bagyura Z, Malak T. Controlling Inputter Variability in Vignette Studies Assessing Web-Based Symptom Checkers: Evaluation of Current Practice and Recommendations for Isolated Accuracy Metrics. JMIR Form Res 2024;8:e49907. PMID: 38820578; PMCID: PMC11179013; DOI: 10.2196/49907.
Abstract
BACKGROUND: The rapid growth of web-based symptom checkers (SCs) is not matched by advances in quality assurance. Currently, there are no widely accepted criteria for assessing SC performance. Vignette studies, which measure the accuracy of outcome, are widely used to evaluate SCs. Accuracy behaves as a composite metric, as it is affected by a number of individual SC- and tester-dependent factors. In contrast to clinical studies, vignette studies have a small number of testers; hence, measuring accuracy alone may not provide a reliable assessment of performance due to tester variability.

OBJECTIVE: This study aims to investigate the impact of tester variability on the accuracy of outcome of SCs, using clinical vignettes. It further aims to investigate the feasibility of measuring isolated aspects of performance.

METHODS: Healthily's SC was assessed using 114 vignettes by 3 groups of 3 testers who processed vignettes under different instructions: free interpretation of vignettes (free testers), specified chief complaints (partially free testers), and specified chief complaints with strict instructions for answering additional symptoms (restricted testers). κ statistics were calculated to assess agreement on the top outcome condition and recommended triage. Crude and adjusted accuracy were measured against a gold standard. Adjusted accuracy was calculated using only the results of consultations identical to the vignette, following a review and selection process. A feasibility study for assessing the symptom comprehension of SCs was performed using different variations of 51 chief complaints across 3 SCs.

RESULTS: Intertester agreement on the most likely condition and triage was, respectively, 0.49 and 0.51 for the free tester group, 0.66 and 0.66 for the partially free group, and 0.72 and 0.71 for the restricted group. For the restricted group, accuracy ranged from 43.9% to 57% for individual testers, averaging 50.6% (SD 5.35%). Adjusted accuracy was 56.1%. Assessing symptom comprehension was feasible for all 3 SCs; comprehension scores ranged from 52.9% to 68%.

CONCLUSIONS: We demonstrated that improving the standardization of the vignette testing process significantly improves agreement of outcome between testers. However, significant variability remained due to uncontrollable tester-dependent factors, reflected in varying outcome accuracy. Tester-dependent factors, combined with a small number of testers, limit the reliability and generalizability of outcome accuracy when used as a composite measure in vignette studies. Measuring and reporting different aspects of SC performance in isolation provides a more reliable assessment. We developed an adjusted accuracy measure using a review and selection process to assess data algorithm quality. In addition, we demonstrated that symptom comprehension with different input methods can feasibly be compared. Future studies reporting accuracy should apply vignette testing standardization and isolated metrics.
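The intertester agreement figures above are κ statistics, which correct raw agreement for chance. A minimal illustration with made-up condition labels (not the study's data):

```python
from sklearn.metrics import cohen_kappa_score

# Top condition returned by two testers for the same 8 vignettes (synthetic).
tester_a = ["flu", "uti", "migraine", "flu", "uti", "gerd", "flu", "uti"]
tester_b = ["flu", "uti", "migraine", "gerd", "uti", "gerd", "flu", "flu"]

kappa = cohen_kappa_score(tester_a, tester_b)  # chance-corrected agreement
print(f"kappa = {kappa:.2f}")
```

Here raw agreement is 6/8 = 0.75, but κ is lower because some of that agreement is expected by chance given each tester's label frequencies.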
Affiliation(s)
- András Meczner
- Healthily, London, United Kingdom
- Institute for Clinical Data Management, Semmelweis University, Budapest, Hungary
- Zsolt Bagyura
- Institute for Clinical Data Management, Semmelweis University, Budapest, Hungary
5
Hammoud M, Douglas S, Darmach M, Alawneh S, Sanyal S, Kanbour Y. Evaluating the Diagnostic Performance of Symptom Checkers: Clinical Vignette Study. JMIR AI 2024;3:e46875. PMID: 38875676; PMCID: PMC11091811; DOI: 10.2196/46875.
Abstract
BACKGROUND: Medical self-diagnostic tools (or symptom checkers) are becoming an integral part of digital health and our daily lives, with patients increasingly using them to identify the underlying causes of their symptoms. As such, it is essential to rigorously investigate and comprehensively report the diagnostic performance of symptom checkers using standard clinical and scientific approaches.

OBJECTIVE: This study aims to evaluate and report the accuracies of several known and new symptom checkers using a standard and transparent methodology that allows the scientific community to cross-validate and reproduce the reported results, a step much needed in health informatics.

METHODS: We propose a 4-stage experimentation methodology that capitalizes on the standard clinical vignette approach to evaluate 6 symptom checkers. To this end, we developed and peer-reviewed 400 vignettes, each approved by at least 5 of 7 independent and experienced primary care physicians. To establish a frame of reference and interpret the results accordingly, we further compared the best-performing symptom checker against 3 primary care physicians with an average of 16.6 (SD 9.42) years of experience. To measure accuracy, we used 7 standard metrics, including M1, a measure of a symptom checker's or physician's ability to return a vignette's main diagnosis at the top of the differential list; F1-score, a trade-off measure between recall and precision; and Normalized Discounted Cumulative Gain (NDCG), a measure of a differential list's ranking quality, among others.

RESULTS: The diagnostic accuracies of the 6 tested symptom checkers vary significantly. For instance, the ranges (differences between the best- and worst-performing symptom checkers) in M1, F1-score, and NDCG were 65.3%, 39.2%, and 74.2%, respectively. The same was observed among the participating human physicians, for whom the M1, F1-score, and NDCG ranges were 22.8%, 15.3%, and 21.3%, respectively. When compared against each other, physicians outperformed the best-performing symptom checker by an average of 1.2% on F1-score, whereas the best-performing symptom checker outperformed physicians by averages of 10.2% and 25.1% on M1 and NDCG, respectively.

CONCLUSIONS: The performance variation between symptom checkers is substantial, suggesting that symptom checkers cannot be treated as a single entity. Notably, the best-performing symptom checker was an artificial intelligence (AI)-based one, shedding light on the promise of AI for improving the diagnostic capabilities of symptom checkers, especially as AI continues to advance.
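NDCG as used above scores how high the correct answer sits in the returned differential list. When exactly one item is relevant (the vignette's main diagnosis), NDCG reduces to 1/log2(rank + 1) at the rank where that diagnosis appears. A sketch under that single-relevant-item assumption, with hypothetical condition names (not the study's implementation):

```python
import math

def ndcg_single_relevant(differential, true_dx, k=None):
    """NDCG@k when exactly one item (the main diagnosis) is relevant.

    The ideal DCG is 1.0 (relevant item at rank 1), so NDCG is the
    discounted gain at the rank where the true diagnosis appears, or
    0 if it is missing from the list.
    """
    ranked = differential[:k] if k is not None else differential
    for rank, dx in enumerate(ranked, start=1):
        if dx == true_dx:
            return 1.0 / math.log2(rank + 1)
    return 0.0

print(ndcg_single_relevant(["gerd", "angina", "mi"], "angina"))
```

The logarithmic discount is what makes NDCG sensitive to ranking quality, unlike M1, which only checks the top position.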
6
Peven K, Wickham AP, Wilks O, Kaplan YC, Marhol A, Ahmed S, Bamford R, Cunningham AC, Prentice C, Meczner A, Fenech M, Gilbert S, Klepchukova A, Ponzo S, Zhaunova L. Assessment of a Digital Symptom Checker Tool's Accuracy in Suggesting Reproductive Health Conditions: Clinical Vignettes Study. JMIR Mhealth Uhealth 2023;11:e46718. PMID: 38051574; PMCID: PMC10731551; DOI: 10.2196/46718.
Abstract
BACKGROUND: Reproductive health conditions such as endometriosis, uterine fibroids, and polycystic ovary syndrome (PCOS) affect a large proportion of women and people who menstruate worldwide; prevalence estimates for these conditions range from 5% to 40% of women of reproductive age. Long diagnostic delays, up to 12 years, are common and contribute to health complications and increased health care costs. Symptom checker apps provide users with information and tools to better understand their symptoms and thus have the potential to reduce time to diagnosis for reproductive health conditions.

OBJECTIVE: This study aimed to evaluate agreement between clinicians and 3 symptom checkers (developed by Flo Health UK Limited) in assessing symptoms of endometriosis, uterine fibroids, and PCOS using vignettes. We also aimed to present a robust example of vignette case creation, review, and classification in the context of predeployment testing and validation of digital health symptom checker tools.

METHODS: Independent general practitioners were recruited to create clinical case vignettes of simulated users for the purpose of testing each condition's symptom checker; vignettes created for each condition contained a mixture of condition-positive and condition-negative outcomes. A second panel of general practitioners then reviewed, approved, and modified (if necessary) each vignette. A third group of general practitioners reviewed each vignette case and designated a final classification. Vignettes were then entered into the symptom checkers by a fourth, different group of general practitioners. The outcome of each symptom checker was compared with the final classification of each vignette to produce accuracy metrics including percent agreement, sensitivity, specificity, positive predictive value, and negative predictive value.

RESULTS: A total of 24 cases were created per condition. Overall, exact matches between the vignette general practitioner classification and the symptom checker outcome were 83% (n=20) for endometriosis, 83% (n=20) for uterine fibroids, and 88% (n=21) for PCOS. Sensitivity was 81.8% for endometriosis, 84.6% for uterine fibroids, and 100% for PCOS; specificity was 84.6% for endometriosis, 81.8% for uterine fibroids, and 75% for PCOS; positive predictive value was 81.8% for endometriosis, 84.6% for uterine fibroids, and 80% for PCOS; and negative predictive value was 84.6% for endometriosis, 81.8% for uterine fibroids, and 100% for PCOS.

CONCLUSIONS: The single-condition symptom checkers showed high agreement with general practitioner classification for endometriosis, uterine fibroids, and PCOS. Given the long diagnostic delays for many reproductive health conditions, which lead to increased medical costs and potential health complications for individuals and health care providers, innovative health apps and symptom checkers hold potential to improve care pathways.
Affiliation(s)
- Stephen Gilbert
- Else Kröner Fresenius Center for Digital Health, TUD Dresden University of Technology, Dresden, Germany
- Sonia Ponzo
- Flo Health UK Limited, London, United Kingdom
7
Chen J, Wu X, Li M, Liu L, Zhong L, Xiao J, Lou B, Zhong X, Chen Y, Huang W, Meng X, Gui Y, Chen M, Wang D, Dongye M, Zhang X, Cheung CY, Lai IF, Yan H, Lin X, Zheng Y, Lin H. EE-Explorer: A Multimodal Artificial Intelligence System for Eye Emergency Triage and Primary Diagnosis. Am J Ophthalmol 2023;252:253-264. PMID: 37142171; DOI: 10.1016/j.ajo.2023.04.007.
Abstract
PURPOSE: To develop a multimodal artificial intelligence (AI) system, EE-Explorer, to triage eye emergencies and assist in primary diagnosis using metadata and ocular images.

DESIGN: A diagnostic, cross-sectional, validity and reliability study.

METHODS: EE-Explorer consists of 2 models. The triage model was developed from metadata (events, symptoms, and medical history) and smartphone ocular surface images of 2038 patients presenting to Zhongshan Ophthalmic Center (ZOC) and outputs 3 classifications: urgent, semiurgent, and nonurgent. The primary diagnostic model was developed from paired metadata and slitlamp images of 2405 patients from ZOC. Both models were externally tested on 103 participants from 4 other hospitals. A pilot test was conducted in Guangzhou to evaluate a hierarchical referral service pattern, assisted by EE-Explorer, for unspecialized health care facilities.

RESULTS: The triage model achieved high overall accuracy, with an area under the receiver operating characteristic curve (AUC) of 0.982 (95% CI, 0.966-0.998), outperforming the triage nurses (P < .001). For the primary diagnostic model, the diagnostic classification accuracy (CA) and Hamming loss (HL) in internal testing were 0.808 (95% CI 0.776-0.840) and 0.016 (95% CI 0.006-0.026), respectively. In external testing, model performance was robust for both triage (average AUC 0.988, 95% CI 0.967-1.000) and primary diagnosis (CA 0.718, 95% CI 0.644-0.792; HL 0.023, 95% CI 0.000-0.048). In the pilot test in the hierarchical referral settings, EE-Explorer demonstrated consistently robust performance and broad participant acceptance.

CONCLUSION: The EE-Explorer system showed robust performance in both triage and primary diagnosis for ophthalmic emergency patients. EE-Explorer can provide patients with acute ophthalmic symptoms access to remote self-triage and can assist in primary diagnosis in unspecialized health care facilities, supporting rapid and effective treatment strategies.
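The two diagnostic-model metrics above, classification accuracy (CA) and Hamming loss (HL), are standard multilabel measures: CA requires the whole predicted label set to match, while HL counts individual label errors. A toy illustration with synthetic label matrices (not EE-Explorer outputs):

```python
import numpy as np
from sklearn.metrics import accuracy_score, hamming_loss

# Rows = patients, columns = candidate diagnoses (multilabel indicator matrix).
y_true = np.array([[1, 0, 0], [0, 1, 1], [1, 0, 1], [0, 0, 1]])
y_pred = np.array([[1, 0, 0], [0, 1, 0], [1, 0, 1], [0, 0, 1]])

ca = accuracy_score(y_true, y_pred)  # subset accuracy: whole row must match
hl = hamming_loss(y_true, y_pred)    # fraction of individual labels wrong
print(f"CA = {ca:.3f}, HL = {hl:.3f}")
```

One patient with one wrong label out of three drops CA to 3/4 but costs only 1/12 in Hamming loss, which is why HL is much smaller than 1 - CA in the results above.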
Affiliation(s)
- Juan Chen, Xiaohang Wu, Mingyuan Li, Lixue Liu, Liuxueying Zhong, Jun Xiao, Bingsheng Lou, Dongni Wang, Meimei Dongye, Xulin Zhang, Xiaofeng Lin, Yongxin Zheng
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou, Guangdong
- Xingwu Zhong
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, Guangdong; Hainan Eye Hospital and Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Haikou, Hainan
- Yanting Chen, Wenbin Huang
- Hainan Eye Hospital and Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Haikou, Hainan
- Xiangda Meng, Hua Yan
- Tianjin Medical University General Hospital, Tianjin
- Yufei Gui
- First Affiliated Hospital of Kunming Medical University, Kunming, Yunnan
- Meizhen Chen
- Guangzhou Aier Eye Hospital, Guangzhou, Guangdong
- Carol Y Cheung
- Department of Ophthalmology and Visual Sciences, The Chinese University of Hong Kong, Hong Kong
- Iat Fan Lai
- Ophthalmic Center, Kiang Wu Hospital, Macao SAR, Macao
- Haotian Lin
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, Guangdong; Hainan Eye Hospital and Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Haikou, Hainan; Center for Precision Medicine and Department of Genetics and Biomedical Informatics, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, Guangdong, China
8
Polevikov S. Advancing AI in healthcare: A comprehensive review of best practices. Clin Chim Acta 2023;548:117519. PMID: 37595864; DOI: 10.1016/j.cca.2023.117519.
Abstract
Artificial Intelligence (AI) and Machine Learning (ML) are powerful tools shaping the healthcare sector. This review considers twelve key aspects of AI in clinical practice: 1) Ethical AI; 2) Explainable AI; 3) Health Equity and Bias in AI; 4) Sponsorship Bias; 5) Data Privacy; 6) Genomics and Privacy; 7) Insufficient Sample Size and Self-Serving Bias; 8) Bridging the Gap Between Training Datasets and Real-World Scenarios; 9) Open Source and Collaborative Development; 10) Dataset Bias and Synthetic Data; 11) Measurement Bias; 12) Reproducibility in AI Research. These categories represent both the challenges and opportunities of AI implementation in healthcare. While AI holds significant potential for improving patient care, it also presents risks and challenges, such as ensuring privacy, combating bias, and maintaining transparency and ethics. The review underscores the necessity of developing comprehensive best practices for healthcare organizations and fostering a diverse dialogue involving data scientists, clinicians, patient advocates, ethicists, economists, and policymakers. We are on the cusp of a significant transformation in healthcare powered by AI. By continuing to reassess and refine our approach, we can ensure that AI is implemented responsibly and ethically, maximizing its benefit to patient care and public health.
9
Meer E, Ramakrishnan MS, Whitehead G, Leri D, Rosin R, VanderBeek B. Validation of an Automated Symptom-Based Triage Tool in Ophthalmology. Appl Clin Inform 2023; 14:448-454. [PMID: 36990454] [PMCID: PMC10247304] [DOI: 10.1055/a-2065-4613]
Abstract
OBJECTIVES Acute care ophthalmic clinics often suffer from inefficient triage, leading to suboptimal patient access and resource utilization. This study reports the preliminary results of a novel, symptom-based, patient-directed, online triage tool developed to address the most common acute ophthalmic diagnoses and associated presenting symptoms. METHODS A retrospective chart review of patients who presented to a tertiary academic medical center's urgent eye clinic after being referred for an urgent, semi-urgent, or nonurgent visit by the ophthalmic triage tool between January 1, 2021 and January 1, 2022 was performed. Concordance between triage category and severity of diagnosis on the subsequent clinic visit was assessed. RESULTS The online triage tool was utilized 1,370 and 95 times by the call center administrators (phone triage group) and patients directly (web triage group), respectively. Of all patients triaged with the tool, 8.5% were deemed urgent, 59.2% semi-urgent, and 32.3% nonurgent. At the subsequent clinic visit, the history of present illness had significant agreement with symptoms reported to the triage tool (99.3% agreement, weighted kappa = 0.980, p < 0.001). The triage algorithm also had significant agreement with the severity of the physician diagnosis (97.0% agreement, weighted kappa = 0.912, p < 0.001). No patient was found to have a diagnosis on exam that should have corresponded to a higher urgency level on the triage tool. CONCLUSION The automated ophthalmic triage algorithm was able to safely and effectively triage patients based on symptoms. Future work should focus on the utility of this tool to reduce nonurgent patient load in urgent clinical settings and to improve access for patients who require urgent medical care.
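The weighted kappa values reported above measure chance-corrected agreement between ordinal triage ratings. As a minimal sketch (pure Python, with made-up triage labels rather than the study's data), the standard quadratic-weighted form penalizes disagreements by the squared distance between ordinal categories, normalized by the chance-expected penalty:

```python
from collections import Counter

def quadratic_weighted_kappa(rater_a, rater_b, categories):
    """Cohen's kappa with quadratic weights for ordinal labels.

    rater_a, rater_b: sequences of labels drawn from `categories`,
    which must be ordered from least to most urgent.
    """
    idx = {c: i for i, c in enumerate(categories)}
    k = len(categories)
    n = len(rater_a)
    # Observed joint counts and per-rater marginal counts.
    observed = Counter((idx[a], idx[b]) for a, b in zip(rater_a, rater_b))
    marg_a = Counter(idx[a] for a in rater_a)
    marg_b = Counter(idx[b] for b in rater_b)

    def w(i, j):
        # Quadratic disagreement weight: 0 on the diagonal, 1 at maximum distance.
        return ((i - j) ** 2) / ((k - 1) ** 2)

    # Observed vs. chance-expected weighted disagreement.
    num = sum(w(i, j) * observed[(i, j)] / n for i in range(k) for j in range(k))
    den = sum(w(i, j) * marg_a[i] * marg_b[j] / (n * n)
              for i in range(k) for j in range(k))
    return 1.0 - num / den

# Hypothetical labels: triage-tool output vs. physician-assigned severity.
levels = ["nonurgent", "semi-urgent", "urgent"]
tool = ["urgent", "semi-urgent", "nonurgent", "semi-urgent", "nonurgent"]
doc = ["urgent", "semi-urgent", "nonurgent", "urgent", "nonurgent"]
print(round(quadratic_weighted_kappa(tool, doc, levels), 3))  # → 0.857
```

Perfect agreement yields 1.0, and the quadratic weights mean a nonurgent/urgent confusion is penalized four times as heavily as an adjacent-category confusion.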
Affiliation(s)
- Elana Meer
- Department of Ophthalmology, Scheie Eye Institute, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, United States
- Department of Ophthalmology, University of California San Francisco, San Francisco, California, United States
- Meera S. Ramakrishnan
- Department of Ophthalmology, Scheie Eye Institute, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, United States
- Gideon Whitehead
- Department of Ophthalmology, Scheie Eye Institute, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, United States
- Damien Leri
- Center for Health Incentives and Behavioral Economics, University of Pennsylvania, Philadelphia, Pennsylvania, United States
- Penn Medicine Center for Health Care Innovation, University of Pennsylvania Health System, Philadelphia, Pennsylvania, United States
- Roy Rosin
- Center for Health Incentives and Behavioral Economics, University of Pennsylvania, Philadelphia, Pennsylvania, United States
- Penn Medicine Center for Health Care Innovation, University of Pennsylvania Health System, Philadelphia, Pennsylvania, United States
- Brian VanderBeek
- Department of Ophthalmology, Scheie Eye Institute, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, United States
10
Schmuter G, North VS, Kazim M, Tran AQ. Medical Accuracy of Patient Discussions in Oculoplastic Surgery on Social Media. Ophthalmic Plast Reconstr Surg 2023; 39:132-135. [PMID: 35943417] [DOI: 10.1097/iop.0000000000002257]
Abstract
PURPOSE The aim of this study was to characterize major topics of discussion in oculoplastic surgery on a social media forum and to evaluate the medical accuracy of the content discussed on these platforms. METHODS A cross-sectional analysis of oculoplastics key search terms was performed on 2 active forums (r/PlasticSurgery and r/CosmeticSurgery) on Reddit. The content analysis involved the top posts in Reddit's history from 2008 to 2022. Medical accuracy was determined by actively practicing, board-certified, and fellowship-trained oculoplastic surgeons. RESULTS The most common topics of patient discussions involved inquiring for advice regarding a procedure (44%) and sharing before-and-after photos (34%). The most common topics among responses included providing support, encouragement, or sympathy for a patient (80%) and the cost of a procedure (62%). Misunderstanding of the medical pathophysiology of the patient's condition was seen in 68% of discussions on this social media platform. Medically inaccurate information was seen in 31% of all analyzed statements. When the type of physician performing a given procedure was disclosed, half reported that an oculoplastic surgeon had performed the surgery. CONCLUSIONS The social media platform Reddit is a popular source of advice and information for current and prospective oculoplastic surgery patients. Such social media forums should be used as a source of psychosocial and psychological support rather than as a primary source of medical information.
Affiliation(s)
- Gabriella Schmuter
- Department of Ophthalmology, Edward Harkness Eye Institute, Columbia University Irving Medical Center, New York, New York
- Department of Ophthalmology, Weill Cornell Medicine, New York, New York
- Victoria S North
- Department of Ophthalmology, Edward Harkness Eye Institute, Columbia University Irving Medical Center, New York, New York
- Michael Kazim
- Department of Ophthalmology, Edward Harkness Eye Institute, Columbia University Irving Medical Center, New York, New York
- Ann Q Tran
- Department of Ophthalmology, Edward Harkness Eye Institute, Columbia University Irving Medical Center, New York, New York
- Department of Ophthalmology, University of Illinois Eye and Ear Infirmary, Chicago, Illinois
11
Online symptom checkers lack diagnostic accuracy for skin rashes. J Am Acad Dermatol 2023; 88:487-488. [PMID: 36243544] [DOI: 10.1016/j.jaad.2022.06.034]
12
Kopka M, Feufel MA, Berner ES, Schmieding ML. How suitable are clinical vignettes for the evaluation of symptom checker apps? A test theoretical perspective. Digit Health 2023; 9:20552076231194929. [PMID: 37614591] [PMCID: PMC10444026] [DOI: 10.1177/20552076231194929]
Abstract
Objective To evaluate the ability of case vignettes to assess the performance of symptom checker applications and to suggest refinements to the methodology used in case vignette-based audit studies. Methods We re-analyzed the publicly available data of two prominent case vignette-based symptom checker audit studies by calculating common metrics of test theory. Furthermore, we developed a new metric, the Capability Comparison Score (CCS), which compares symptom checker capability while controlling for the difficulty of the set of cases each symptom checker evaluated. We then scrutinized whether applying test theory and the CCS altered the performance ranking of the investigated symptom checkers. Results In both studies, most symptom checkers changed their rank order when adjusting the triage capability for item difficulty (ID) with the CCS. The previously reported triage accuracies commonly overestimated the capability of symptom checkers because they did not account for the fact that symptom checkers tend to selectively appraise easier cases (i.e., with high ID values). Also, many case vignettes in both studies showed insufficient (very low and even negative) values of item-total correlation (ITC), suggesting that individual items or the composition of item sets are of low quality. Conclusions A test-theoretic perspective helps identify previously undetected threats to the validity of case vignette-based symptom checker assessments and provides guidance and specific metrics to improve the quality of case vignettes, in particular by controlling for the difficulty of the vignettes an app was (not) able to evaluate correctly. Such measures might prove more meaningful than accuracy alone for the competitive assessment of symptom checkers. Our approach helps elaborate and standardize the methodology used for appraising symptom checker capability, which, ultimately, may yield more reliable results.
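The two conventional test-theory quantities this abstract builds on can be illustrated on a small hypothetical 0/1 results matrix (rows = symptom checkers, columns = case vignettes): item difficulty (the proportion of checkers that solve a vignette, so high values mean easy cases) and corrected item-total correlation (how well solving one vignette tracks overall capability, with the item excluded from the total). The authors' CCS itself is their own metric and is not reproduced here; this is only a sketch of the standard ID and ITC calculations, with invented data.

```python
import math
import statistics

def item_difficulty(results, item):
    """Proportion of test-takers (apps) that solved the item; high = easy."""
    col = [row[item] for row in results]
    return sum(col) / len(col)

def item_total_correlation(results, item):
    """Pearson correlation between an item's 0/1 scores and the
    corrected total score (total excluding the item itself)."""
    col = [row[item] for row in results]
    rest = [sum(row) - row[item] for row in results]
    mx, my = statistics.mean(col), statistics.mean(rest)
    cov = sum((x - mx) * (y - my) for x, y in zip(col, rest))
    sx = math.sqrt(sum((x - mx) ** 2 for x in col))
    sy = math.sqrt(sum((y - my) ** 2 for y in rest))
    return cov / (sx * sy)

# Hypothetical matrix: rows = symptom checkers, columns = case vignettes.
results = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
]
print([round(item_difficulty(results, j), 2) for j in range(4)])  # → [0.75, 0.5, 0.25, 1.0]
print(round(item_total_correlation(results, 0), 2))  # → 0.52
```

A vignette every checker solves (column 3, ID = 1.0) carries no discriminating information, and a low or negative ITC flags an item whose outcome runs against overall capability, the quality problem the review identifies.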
Affiliation(s)
- Marvin Kopka
- Department of Psychology and Ergonomics (IPA), Division of Ergonomics, Technische Universität Berlin, Berlin, Germany
- Institute of Medical Informatics, Charité – Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Markus A Feufel
- Department of Psychology and Ergonomics (IPA), Division of Ergonomics, Technische Universität Berlin, Berlin, Germany
- Eta S Berner
- Department of Health Services Administration, University of Alabama at Birmingham, Birmingham, AL, USA
- Malte L Schmieding
- Institute of Medical Informatics, Charité – Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
13
North F, Jensen TB, Stroebel RJ, Nelson EM, Johnson BJ, Thompson MC, Pecina JL, Crum BA. Self-Triage Use, Subsequent Healthcare Utilization, and Diagnoses: A Retrospective Study of Process and Clinical Outcomes Following Self-Triage and Self-Scheduling for Ear or Hearing Symptoms. Health Serv Res Manag Epidemiol 2023; 10:23333928231168121. [PMID: 37101803] [PMCID: PMC10123887] [DOI: 10.1177/23333928231168121]
Abstract
Background Self-triage is becoming more widespread, but little is known about the people who are using online self-triage tools and their outcomes. For self-triage researchers, there are significant barriers to capturing subsequent healthcare outcomes. Our integrated healthcare system was able to capture subsequent healthcare utilization of individuals who used self-triage integrated with self-scheduling of provider visits. Methods We retrospectively examined healthcare utilization and diagnoses after patients had used self-triage and self-scheduling for ear or hearing symptoms. Outcomes and counts of office visits, telemedicine interactions, emergency department visits, and hospitalizations were captured. Diagnosis codes associated with subsequent provider visits were dichotomously categorized as being associated with ear or hearing concerns or not. Nonvisit care encounters of patient-initiated messages, nurse triage calls, and clinical communications were also captured. Results For 2168 self-triage uses, we were able to capture subsequent healthcare encounters within 7 days of the self-triage for 80.5% (1745/2168). In subsequent 1092 office visits with diagnoses, 83.1% (891/1092) of the uses were associated with relevant ear, nose and throat diagnoses. Only 0.24% (4/1662) of patients with captured outcomes were associated with a hospitalization within 7 days. Self-triage resulted in a self-scheduled office visit in 7.2% (126/1745). Office visits resulting from a self-scheduled visit had significantly fewer combined non-visit care encounters per office visit (fewer combined nurse triage calls, patient messages, and clinical communication messages) than office visits that were not self-scheduled (-0.51; 95% CI, -0.72 to -0.29; P < .0001). Conclusion In an appropriate healthcare setting, self-triage outcomes can be captured in a high percentage of uses to examine for safety, patient adherence to recommendations, and efficiency of self-triage. 
With the ear or hearing self-triage, most uses had subsequent visit diagnoses relevant to ear or hearing, so most patients appeared to be selecting the appropriate self-triage pathway for their symptoms.
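The comparison above reports a mean difference in non-visit care encounters per office visit with a 95% CI. As a rough illustration only (hypothetical per-visit counts and a normal approximation with Welch's standard error, not the study's actual model or data), such an interval can be sketched as:

```python
import math
import statistics

def diff_ci95(a, b):
    """Approximate 95% CI for mean(a) - mean(b), using Welch's
    standard error and a normal critical value (large-sample sketch)."""
    va, vb = statistics.variance(a), statistics.variance(b)
    se = math.sqrt(va / len(a) + vb / len(b))
    d = statistics.mean(a) - statistics.mean(b)
    return d - 1.96 * se, d + 1.96 * se

# Hypothetical counts of non-visit care encounters (triage calls,
# patient messages, clinical communications) per office visit.
self_scheduled = [0, 1, 0, 0, 2, 0, 1, 0]
not_self_scheduled = [1, 2, 0, 1, 3, 1, 0, 2]
lo, hi = diff_ci95(self_scheduled, not_self_scheduled)
print(round(lo, 2), round(hi, 2))
```

An interval lying entirely below zero, as in the study's reported (-0.72, -0.29), would indicate fewer non-visit encounters for self-scheduled visits; with these tiny made-up samples the interval is wide and inconclusive.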
Affiliation(s)
- Frederick North
- Department of Medicine, Division of Community Internal Medicine, Geriatrics, and Palliative Care, Mayo Clinic, Rochester, MN, USA
- Teresa B Jensen
- Department of Family Medicine, Mayo Clinic, Rochester, MN, USA
- Robert J Stroebel
- Department of Medicine, Division of Community Internal Medicine, Geriatrics, and Palliative Care, Mayo Clinic, Rochester, MN, USA
- Elissa M Nelson
- Enterprise Office of Access Management, Mayo Clinic, Rochester, MN, USA
- Brenda J Johnson
- Enterprise Office of Access Management, Mayo Clinic, Rochester, MN, USA
- Brian A Crum
- Department of Neurology, Mayo Clinic, Rochester, MN, USA
14
Müller R, Klemmt M, Ehni HJ, Henking T, Kuhnmünch A, Preiser C, Koch R, Ranisch R. Ethical, legal, and social aspects of symptom checker applications: a scoping review. Med Health Care Philos 2022; 25:737-755. [PMID: 36181620] [PMCID: PMC9613552] [DOI: 10.1007/s11019-022-10114-y]
Abstract
Symptom Checker Applications (SCA) are mobile applications often designed for the end-user to assist with symptom assessment and self-triage. SCA are meant to provide the user with easily accessible information about their own health conditions. However, SCA raise questions regarding ethical, legal, and social aspects (ELSA), for example, regarding fair access to this new technology. The aim of this scoping review is to identify the ELSA of SCA in the scientific literature. A scoping review was conducted to identify the ELSA of SCA. Ten databases (e.g., Web of Science and PubMed) were used. Studies on SCA that address ELSA, written in English or German, were included in the review. The ELSA of SCA were extracted and synthesized using qualitative content analysis. A total of 25,061 references were identified, of which 39 were included in the analysis. The identified aspects were allotted to three main categories: (1) Technology; (2) Individual Level; and (3) Healthcare system. The results show that there are controversial debates in the literature on the ethical and social challenges of SCA usage. Furthermore, the debates are characterised by a lack of a specific legal perspective and empirical data. The review provides an overview on the spectrum of ELSA regarding SCA. It offers guidance to stakeholders in the healthcare system, for example, patients, healthcare professionals, and insurance providers and could be used in future empirical research to investigate the perspectives of those affected, such as users.
Affiliation(s)
- Regina Müller
- Institute of Ethics and History of Medicine, University of Tübingen, Gartenstraße 47, 72074 Tübingen, Germany
- Malte Klemmt
- Institute of Applied Social Sciences, University of Applied Sciences Würzburg-Schweinfurt, Münzstraße 12, 97070 Würzburg, Germany
- Hans-Jörg Ehni
- Institute of Ethics and History of Medicine, University of Tübingen, Gartenstraße 47, 72074 Tübingen, Germany
- Tanja Henking
- Institute of Applied Social Sciences, University of Applied Sciences Würzburg-Schweinfurt, Münzstraße 12, 97070 Würzburg, Germany
- Angelina Kuhnmünch
- Institute of Ethics and History of Medicine, University of Tübingen, Gartenstraße 47, 72074 Tübingen, Germany
- Christine Preiser
- Institute of Occupational and Social Medicine and Health Services Research, University Hospital Tübingen, Wilhelmstraße 27, 72074 Tübingen, Germany
- Roland Koch
- Institute for General Practice and Interprofessional Care, University Medicine Tübingen, Osianderstraße 5, 72076 Tübingen, Germany
- Robert Ranisch
- Faculty of Health Sciences Brandenburg, University of Potsdam, Karl-Liebknecht-Str. 24-25, House 16, 14476 Potsdam, Golm, Germany
15
Wallace W, Chan C, Chidambaram S, Hanna L, Iqbal FM, Acharya A, Normahani P, Ashrafian H, Markar SR, Sounderajah V, Darzi A. The diagnostic and triage accuracy of digital and online symptom checker tools: a systematic review. NPJ Digit Med 2022; 5:118. [PMID: 35977992] [PMCID: PMC9385087] [DOI: 10.1038/s41746-022-00667-w]
Abstract
Digital and online symptom checkers are an increasingly adopted class of health technologies that enable patients to input their symptoms and biodata to produce a set of likely diagnoses and associated triage advice. However, concerns regarding the accuracy and safety of these symptom checkers have been raised. This systematic review evaluates the accuracy of symptom checkers in providing diagnoses and appropriate triage advice. MEDLINE and Web of Science were searched for studies that used either real or simulated patients to evaluate online or digital symptom checkers. The primary outcomes were the diagnostic and triage accuracy of the symptom checkers. The QUADAS-2 tool was used to assess study quality. Of the 177 studies retrieved, 10 studies met the inclusion criteria. Researchers evaluated the accuracy of symptom checkers using a variety of medical conditions, including ophthalmological conditions, inflammatory arthritides and HIV. A total of 50% of the studies recruited real patients, while the remainder used simulated cases. The diagnostic accuracy of the primary diagnosis was low across included studies (range: 19–37.9%) and varied between individual symptom checkers, despite consistent symptom data input. Triage accuracy (range: 48.8–90.1%) was typically higher than diagnostic accuracy. Overall, the diagnostic and triage accuracy of symptom checkers are variable and of low accuracy. Given the increasing push towards adopting this class of technologies across numerous health systems, this study demonstrates that reliance upon symptom checkers could pose significant patient safety hazards. Large-scale primary studies, based upon real-world data, are warranted to demonstrate the adequate performance of these technologies in a manner that is non-inferior to current best practices. Moreover, an urgent assessment of how these systems are regulated and implemented is required.
Affiliation(s)
- William Wallace
- Department of Surgery & Cancer, Imperial College London, St. Mary's Hospital, London, W2 1NY, UK
- Calvin Chan
- Department of Surgery & Cancer, Imperial College London, St. Mary's Hospital, London, W2 1NY, UK
- Swathikan Chidambaram
- Department of Surgery & Cancer, Imperial College London, St. Mary's Hospital, London, W2 1NY, UK
- Lydia Hanna
- Department of Surgery & Cancer, Imperial College London, St. Mary's Hospital, London, W2 1NY, UK
- Fahad Mujtaba Iqbal
- Department of Surgery & Cancer, Imperial College London, St. Mary's Hospital, London, W2 1NY, UK
- Institute of Global Health Innovation, Imperial College London, South Kensington Campus, London, SW7 2AZ, UK
- Amish Acharya
- Department of Surgery & Cancer, Imperial College London, St. Mary's Hospital, London, W2 1NY, UK
- Institute of Global Health Innovation, Imperial College London, South Kensington Campus, London, SW7 2AZ, UK
- Pasha Normahani
- Department of Surgery & Cancer, Imperial College London, St. Mary's Hospital, London, W2 1NY, UK
- Hutan Ashrafian
- Institute of Global Health Innovation, Imperial College London, South Kensington Campus, London, SW7 2AZ, UK
- Sheraz R Markar
- Department of Surgery & Cancer, Imperial College London, St. Mary's Hospital, London, W2 1NY, UK
- Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
- Nuffield Department of Surgery, Churchill Hospital, University of Oxford, OX3 7LE, Oxford, UK
- Viknesh Sounderajah
- Institute of Global Health Innovation, Imperial College London, South Kensington Campus, London, SW7 2AZ, UK
- Ara Darzi
- Department of Surgery & Cancer, Imperial College London, St. Mary's Hospital, London, W2 1NY, UK
- Institute of Global Health Innovation, Imperial College London, South Kensington Campus, London, SW7 2AZ, UK
16
Liu VDM, Kaila M, Koskela T. User initiated symptom assessment with an electronic symptom checker: study protocol for mixed-methods validation. JMIR Res Protoc 2022. [PMID: 37467041] [PMCID: PMC10398552] [DOI: 10.2196/41423]
Abstract
BACKGROUND The national Omaolo digital social welfare and health care service of Finland provides a symptom checker, Omaolo, which is a medical device (based on Duodecim Clinical Decision Support EBMEDS software) with a CE marking (risk class IIa), manufactured by the government-owned DigiFinland Oy. Users of this service can perform their triage by using the questions in the symptom checker. By completing the symptom checker, the user receives a recommendation for action and a service assessment with appropriate guidance regarding their health problems on the basis of a selected specific symptom in the symptom checker. This allows users to be provided with appropriate health care services, regardless of time and place. OBJECTIVE This study describes the protocol for the mixed methods validation process of the symptom checker available in Omaolo digital services. METHODS This is a mixed methods study using quantitative and qualitative methods, which will be part of the clinical validation process that takes place in primary health care centers in Finland. Each organization provides a space where the study and the nurse triage can be done in order to include an unscreened target population of users. The primary health care units provide walk-in model services, where no prior phone call or contact is required. For the validation of the Omaolo symptom checker, case vignettes will be incorporated to supplement the triage accuracy of rare and acute cases that cannot be tested extensively in real-life settings. Vignettes are produced from a variety of clinical sources, and they test the symptom checker in different triage levels by using 1 standardized patient case example. 
RESULTS Regional research permits were requested from each organization participating in the research, and an ethics committee statement was requested and granted from the Pirkanmaa hospital district's ethics committee, in accordance with the University of Tampere's regulations. Of 964 clinical user-filled symptom checker assessments, 877 cases were fully completed with a triage result, and therefore, they met the requirements for clinical validation studies. The goal for sufficient data has been reached for most of the chief symptoms. Data collection was completed in September 2019, and the first feasibility and patient experience results were published by the end of 2020. Case vignettes have been identified and are to be completed before further testing the symptom checker. The analysis and reporting are estimated to be finalized in 2024. CONCLUSIONS The primary goals of this multimethod electronic symptom checker study are to assess safety and to provide crucial information regarding the accuracy and usability of the Omaolo electronic symptom checker. To our knowledge, this will be the first study to include real-life clinical cases along with case vignettes. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID) DERR1-10.2196/41423.
17
Nguyen H, Meczner A, Burslam-Dawe K, Hayhoe B. Triage Errors in Primary and Pre-Primary Care. J Med Internet Res 2022; 24:e37209. [PMID: 35749166] [PMCID: PMC9270711] [DOI: 10.2196/37209]
Abstract
Triage errors are a major concern in health care due to resulting harmful delays in treatments or inappropriate allocation of resources. With the increasing popularity of digital symptom checkers in pre–primary care settings, and amid claims that artificial intelligence outperforms doctors, the accuracy of triage by digital symptom checkers is ever more scrutinized. This paper examines the context and challenges of triage in primary care, pre–primary care, and emergency care, as well as reviews existing evidence on the prevalence of triage errors in all three settings. Implications for development, research, and practice are highlighted, and recommendations are made on how digital symptom checkers should be best positioned.
Affiliation(s)
- Hai Nguyen
- Your.MD Ltd, London, United Kingdom
- Health Services and Population Research, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom
- Benedict Hayhoe
- eConsult Ltd, London, United Kingdom
- Department of Primary Care, School of Public Health, Imperial College London, London, United Kingdom
18
Thorpe D, Fouyaxis J, Lipschitz JM, Nielson A, Li W, Murphy SA, Bidargaddi N. Cost and Effort Considerations for the Development of Intervention Studies Using Mobile Health Platforms: Pragmatic Case Study. JMIR Form Res 2022; 6:e29988. [PMID: 35357313] [PMCID: PMC9015742] [DOI: 10.2196/29988]
Abstract
BACKGROUND The research marketplace has seen a flood of open-source or commercial mobile health (mHealth) platforms that can collect and use user data in real time. However, there is a lack of practical literature on how these platforms are developed, integrated into study designs, and adopted, including important information around cost and effort considerations. OBJECTIVE We intend to build critical literacy in the clinician-researcher readership into the cost, effort, and processes involved in developing and operationalizing an mHealth platform, focusing on Intui, an mHealth platform that we developed. METHODS We describe the development of the Intui mHealth platform and general principles of its operationalization across sites. RESULTS We provide a worked example in the form of a case study. Intui was operationalized in the design of a behavioral activation intervention in collaboration with a mental health service provider. We describe the design specifications of the study site, the developed software, and the cost and effort required to build the final product. CONCLUSIONS Study designs, researcher needs, and technical considerations can impact effort and costs associated with the use of mHealth platforms. Greater transparency from platform developers about the impact of these factors on practical considerations relevant to end users such as clinician-researchers is crucial to increasing critical literacy around mHealth, thereby aiding in the widespread use of these potentially beneficial technologies and building clinician confidence in these tools.
Affiliation(s)
- Dan Thorpe
- Flinders Digital Health Research Lab, College of Medicine and Public Health, Flinders University, Clovelly Park, Australia
- John Fouyaxis
- Flinders Digital Health Research Lab, College of Medicine and Public Health, Flinders University, Clovelly Park, Australia
- Amy Nielson
- Flinders Digital Health Research Lab, College of Medicine and Public Health, Flinders University, Clovelly Park, Australia
- Wenhao Li
- Flinders Digital Health Research Lab, College of Medicine and Public Health, Flinders University, Clovelly Park, Australia
- Susan A Murphy
- Radcliffe Institute, Harvard University, Boston, MA, United States
- Niranjan Bidargaddi
- Flinders Digital Health Research Lab, College of Medicine and Public Health, Flinders University, Clovelly Park, Australia
19
Cotte F, Mueller T, Gilbert S, Blümke B, Multmeier J, Hirsch MC, Wicks P, Wolanski J, Tutschkow D, Schade Brittinger C, Timmermann L, Jerrentrup A. Safety of Triage Self-assessment Using a Symptom Assessment App for Walk-in Patients in the Emergency Care Setting: Observational Prospective Cross-sectional Study. JMIR Mhealth Uhealth 2022; 10:e32340. [PMID: 35343909] [PMCID: PMC9002590] [DOI: 10.2196/32340]
Abstract
Background Increasing use of emergency departments (EDs) by patients with low urgency, combined with limited availability of medical staff, results in extended waiting times and delayed care. Technological approaches could possibly increase efficiency by providing urgency advice and symptom assessments. Objective The purpose of this study is to evaluate the safety of urgency advice provided by a symptom assessment app, Ada, in an ED. Methods The study was conducted at the interdisciplinary ED of Marburg University Hospital, with data collection performed between August 2019 and March 2020. This study had a single-center cross-sectional prospective observational design and included 378 patients. The app’s urgency recommendation was compared with an established triage concept (Manchester Triage System [MTS]), including patients from the lower 3 MTS categories only. For all patients who were undertriaged, an expert physician panel assessed the case to detect potential avoidable hazardous situations (AHSs). Results Of 378 participants, 344 (91%) were triaged the same or more conservatively and 34 (8.9%) were undertriaged by the app. Of the 378 patients, 14 (3.7%) had received safe advice determined by the expert panel and 20 (5.3%) were considered to be potential AHS. Therefore, the assessment could be considered safe in 94.7% (358/378) of the patients when compared with the MTS assessment. From the 3 lowest MTS categories, 43.4% (164/378) of patients were not considered as emergency cases by the app, but could have been safely treated by a general practitioner or would not have required a physician consultation at all. Conclusions The app provided urgency advice after patient self-triage that has a high rate of safety, a rate of undertriage, and a rate of triage with potential to be an AHS, equivalent to telephone triage by health care professionals while still being more conservative than direct ED triage. 
A large proportion of patients in the ED were not considered as emergency cases, which could possibly relieve ED burden if used at home. Further research should be conducted in the at-home setting to evaluate this hypothesis. Trial Registration German Clinical Trial Registration DRKS00024909; https://www.drks.de/drks_web/navigate.do?navigationId=trial.HTML&TRIAL_ID=DRKS00024909
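The study's core safety comparison reduces to tallying, per patient, whether the app's urgency level was the same as or more conservative than the MTS reference, or undertriaged. A minimal sketch (not the study's code; the function name, numeric coding, and example pairs are hypothetical, with lower numbers meaning more urgent):

```python
# Illustrative triage-safety tally: compare an app's urgency level with a
# reference triage level. Lower numbers = more urgent (hypothetical coding).

def tally_triage_safety(pairs):
    """pairs: list of (app_level, reference_level) integers.
    Returns (same_or_more_conservative, undertriaged) counts."""
    same_or_conservative = sum(1 for app, ref in pairs if app <= ref)
    undertriaged = sum(1 for app, ref in pairs if app > ref)
    return same_or_conservative, undertriaged

# Hypothetical app vs. reference levels for four walk-in patients.
pairs = [(2, 3), (3, 3), (4, 3), (3, 4)]
safe, under = tally_triage_safety(pairs)
assert safe + under == len(pairs)
```

Undertriaged cases would then go to the expert panel for AHS review, as in the study.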
Affiliation(s)
- Fabienne Cotte
- Charité Universitätsmedizin Berlin, Berlin, Germany; Department of Emergency Medicine, University Clinic Marburg, Philipps-University, Marburg, Germany; Ada Health GmbH, Berlin, Germany
- Tobias Mueller
- Center for Unknown and Rare Diseases, UKGM GmbH, University Clinic Marburg, Philipps-University, Marburg, Germany
- Stephen Gilbert
- Ada Health GmbH, Berlin, Germany; Else Kröner Fresenius Center for Digital Health, Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
- Martin Christian Hirsch
- Ada Health GmbH, Berlin, Germany; Institute of Artificial Intelligence, Philipps-University Marburg, Marburg, Germany
- Darja Tutschkow
- Coordinating Center for Clinical Trials, Philipps University Marburg, Marburg, Germany
- Carmen Schade Brittinger
- Coordinating Center for Clinical Trials, Philipps University Marburg, Marburg, Germany
- Lars Timmermann
- Department of Neurology, University Hospital of Marburg, Marburg, Germany
- Andreas Jerrentrup
- Department of Emergency Medicine, University Clinic Marburg, Philipps-University, Marburg, Germany
|
20
|
Hennemann S, Kuhn S, Witthöft M, Jungmann SM. Diagnostic Performance of an App-Based Symptom Checker in Mental Disorders: Comparative Study in Psychotherapy Outpatients. JMIR Ment Health 2022; 9:e32832. [PMID: 35099395 PMCID: PMC8844983 DOI: 10.2196/32832] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 08/24/2021] [Accepted: 11/09/2021] [Indexed: 12/31/2022] Open
Abstract
BACKGROUND Digital technologies have become a common starting point for health-related information-seeking. Web- or app-based symptom checkers aim to provide rapid and accurate condition suggestions and triage advice but have not yet been investigated for mental disorders in routine health care settings. OBJECTIVE This study aims to test the diagnostic performance of a widely available symptom checker in the context of formal diagnosis of mental disorders when compared with therapists' diagnoses based on structured clinical interviews. METHODS Adult patients from an outpatient psychotherapy clinic used the app-based symptom checker Ada-check your health (ADA; Ada Health GmbH) at intake. Accuracy was assessed as the agreement of the first and 1 of the first 5 condition suggestions of ADA with at least one of the interview-based therapist diagnoses. In addition, sensitivity, specificity, and interrater reliabilities (Gwet first-order agreement coefficient [AC1]) were calculated for the 3 most prevalent disorder categories. Self-reported usability (assessed using the System Usability Scale) and acceptance of ADA (assessed using an adapted feedback questionnaire) were evaluated. RESULTS A total of 49 patients (30/49, 61% women; mean age 33.41, SD 12.79 years) were included in this study. Across all patients, the interview-based diagnoses matched ADA's first condition suggestion in 51% (25/49; 95% CI 37.5-64.4) of cases and 1 of the first 5 condition suggestions in 69% (34/49; 95% CI 55.4-80.6) of cases. Within the main disorder categories, the accuracy of ADA's first condition suggestion was 0.82 for somatoform and associated disorders, 0.65 for affective disorders, and 0.53 for anxiety disorders. Interrater reliabilities ranged from low (AC1=0.15 for anxiety disorders) to good (AC1=0.76 for somatoform and associated disorders). The usability of ADA was rated as high in the System Usability Scale (mean 81.51, SD 11.82, score range 0-100). 
Approximately 71% (35/49) of participants would have preferred a face-to-face assessment over an app-based one. CONCLUSIONS Overall, our findings suggest that a widely available symptom checker used in the formal diagnosis of mental disorders could provide clinicians with a list of condition suggestions with moderate-to-good accuracy. However, diagnostic performance was heterogeneous across disorder categories, and interrater reliability was low for some of them. Although symptom checkers have some potential to complement the diagnostic process as a screening tool, their diagnostic performance should be tested in larger samples and in comparison with further diagnostic instruments.
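Gwet's first-order agreement coefficient (AC1), used above for interrater reliability, corrects observed agreement for chance agreement derived from category prevalence. A minimal two-rater, binary-rating sketch (illustrative only; the function name and example data are ours, not the study's):

```python
def gwet_ac1(rater_a, rater_b):
    """Gwet's AC1 for two raters and binary (0/1) ratings.
    AC1 = (pa - pe) / (1 - pe), where pa is observed agreement and the
    chance agreement is pe = 2 * pi * (1 - pi), with pi the mean
    prevalence of the positive category across both raters."""
    n = len(rater_a)
    pa = sum(a == b for a, b in zip(rater_a, rater_b)) / n  # observed agreement
    pi = (sum(rater_a) + sum(rater_b)) / (2 * n)            # mean prevalence
    pe = 2 * pi * (1 - pi)                                  # chance agreement
    return (pa - pe) / (1 - pe)

# Hypothetical "disorder present" judgments from a checker and a therapist.
print(gwet_ac1([1, 1, 0, 0, 1], [1, 1, 0, 1, 1]))
```

Unlike Cohen's kappa, AC1 stays stable when prevalence is very high or very low, which is one reason it is often preferred for diagnostic agreement studies.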
Affiliation(s)
- Severin Hennemann
- Department of Clinical Psychology, Psychotherapy and Experimental Psychopathology, University of Mainz, Mainz, Germany
- Sebastian Kuhn
- Department of Digital Medicine, Medical Faculty OWL, Bielefeld University, Bielefeld, Germany
- Michael Witthöft
- Department of Clinical Psychology, Psychotherapy and Experimental Psychopathology, University of Mainz, Mainz, Germany
- Stefanie M Jungmann
- Department of Clinical Psychology, Psychotherapy and Experimental Psychopathology, University of Mainz, Mainz, Germany
|
21
|
Hwang JC, Yannuzzi NA, Cavuoto KM, Ansari Z, Patel NA, Goodman CF, Lang S, Sridhar J. Utilization of Online Resources by Patients in an Ophthalmic Emergency Department. J Acad Ophthalmol 2021. [DOI: 10.1055/s-0040-1722310] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/19/2022] Open
Abstract
Objective To describe the utilization of online resources by patients prior to presentation to an ophthalmic emergency department (ED) and to assess the accuracy of online resources for ophthalmic diagnoses.
Methods This is a prospective survey of patients presenting to an ophthalmic ED for initial evaluation of ocular symptoms. Prior to evaluation, patients completed surveys assessing ocular symptoms, Internet usage, and presumed self-diagnoses. Demographics and characteristics of Internet usage were determined. Accuracy of self-diagnoses was compared between Internet users and nonusers. Diagnoses were classified as high or low acuity based on agreement between senior authors.
Results A total of 144 patients completed surveys. Mean (standard deviation) age was 53.2 (18.0) years. One-third of patients used the Internet for health-related searches prior to presentation. Internet users were younger compared with nonusers (48.2 years [16.5] vs. 55.5 years [18.3], p = 0.02). There were no differences in sex, ethnicity, or race. Overall, there was a threefold difference in the proportion of patients correctly predicting their diagnoses, with Internet users correctly predicting their diagnoses more often than nonusers (41% vs. 13%, p < 0.001). When excluding cases of known trauma, the difference increased to fivefold (Internet users 40% vs. nonusers 8%, p < 0.001). Upon classification by acuity level, Internet users demonstrated greater accuracy than nonusers for both high-acuity (42% vs. 17%, p = 0.03) and low-acuity (41% vs. 10%, p = 0.001) diagnoses. Accuracy was greatest in cases of external lid conditions such as chalazia and hordeola (100% [4/4] of Internet users vs. 40% [2/5] of nonusers), conjunctivitis (43% [3/7] vs. 25% [2/8]), and retinal traction or detachments (57% [4/7] vs. 0% [0/4]). The most frequently visited Web sites were Google (82%) and WebMD (40%). Patient accuracy did not change according to the number of Web sites visited, but patients who visited the Mayo Clinic Web site had greater accuracy compared with those who visited other Web sites (89% vs. 30%, p = 0.003).
Conclusion Patients with ocular symptoms may seek medical information on the Internet before evaluation by a physician in an ophthalmic ED. Online resources may improve the accuracy of patient self-diagnosis for low- and high-acuity diagnoses.
Affiliation(s)
- Jodi C. Hwang
- University of Miami Miller School of Medicine, Miami, Florida
- Nicolas A. Yannuzzi
- Department of Ophthalmology, Bascom Palmer Eye Institute, University of Miami, Miami, Florida
- Kara M. Cavuoto
- Department of Ophthalmology, Bascom Palmer Eye Institute, University of Miami, Miami, Florida
- Zubair Ansari
- Department of Ophthalmology, Bascom Palmer Eye Institute, University of Miami, Miami, Florida
- Nimesh A. Patel
- Department of Ophthalmology, Bascom Palmer Eye Institute, University of Miami, Miami, Florida
- Steven Lang
- University of Miami Miller School of Medicine, Miami, Florida
- Jayanth Sridhar
- Department of Ophthalmology, Bascom Palmer Eye Institute, University of Miami, Miami, Florida
|
22
|
Ceney A, Tolond S, Glowinski A, Marks B, Swift S, Palser T. Accuracy of online symptom checkers and the potential impact on service utilisation. PLoS One 2021; 16:e0254088. [PMID: 34265845 PMCID: PMC8282353 DOI: 10.1371/journal.pone.0254088] [Citation(s) in RCA: 44] [Impact Index Per Article: 14.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/13/2020] [Accepted: 06/13/2021] [Indexed: 02/03/2023] Open
Abstract
OBJECTIVES The aims of our study are firstly to investigate the diagnostic and triage performance of symptom checkers, secondly to assess their potential impact on healthcare utilisation and thirdly to investigate variation in performance between systems. SETTING Publicly available symptom checkers for patient use. PARTICIPANTS Publicly available symptom checkers were identified. A standardised set of 50 clinical vignettes was developed and systematically run through each system by a non-clinical researcher. PRIMARY AND SECONDARY OUTCOME MEASURES System accuracy was assessed by measuring the percentage of times the correct diagnosis was (a) listed first, (b) within the top five diagnoses listed and (c) listed at all. The safety of the disposition advice was assessed by comparing it with national guidelines for each vignette. RESULTS Twelve tools were identified and included. Mean diagnostic accuracy of the systems was poor, with the correct diagnosis appearing in the top five diagnoses in 51.0% of cases (range 22.2% to 84.0% across systems). Safety of disposition advice decreased with condition urgency (71.8% for emergency cases vs 87.3% for non-urgent cases). On average, systems suggested additional resource utilisation above that recommended by national guidelines in 51.0% of cases (range 18.0% to 61.2%). Both diagnostic accuracy and appropriateness of resource recommendations varied substantially between systems. CONCLUSIONS There is wide variation in performance between available symptom checkers and overall performance is significantly below what would be accepted in any other medical field, though some do achieve a good level of accuracy and safety of disposition. External validation and regulation are urgently required to ensure these public-facing tools are safe.
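The listed-first / top-five accuracy measures above are instances of a top-k hit rate over vignettes. An illustrative sketch with hypothetical vignette data (not the study's tooling; condition names are invented):

```python
def top_k_accuracy(results, k):
    """results: list of (gold_condition, ranked_suggestions) pairs.
    Returns the fraction of vignettes whose gold-standard condition
    appears among the top k suggestions."""
    hits = sum(1 for gold, ranked in results if gold in ranked[:k])
    return hits / len(results)

# Hypothetical vignette outcomes: gold diagnosis vs. a checker's ranked list.
vignettes = [
    ("migraine", ["tension headache", "migraine", "sinusitis"]),
    ("appendicitis", ["gastroenteritis", "constipation", "irritable bowel"]),
]
assert top_k_accuracy(vignettes, 1) == 0.0   # never listed first
assert top_k_accuracy(vignettes, 5) == 0.5   # in top five for one of two
```

Running the same vignette set through each system and comparing k=1, k=5, and listed-at-all rates reproduces the paper's three accuracy measures.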
Affiliation(s)
- Adam Ceney
- Methods Analytics Ltd, Sheffield, United Kingdom
- Ben Marks
- Methods Analytics Ltd, Sheffield, United Kingdom
- Simon Swift
- Methods Analytics Ltd, Sheffield, United Kingdom
- University of Exeter Business School (INDEX), Exeter, United Kingdom
- Tom Palser
- Methods Analytics Ltd, Sheffield, United Kingdom
- Department of Surgery, University Hospitals of Leicester NHS Trust, Leicester, United Kingdom
- SAPPHIRE, Department of Health Sciences, University of Leicester, Leicester, United Kingdom
|
23
|
Gilbert S, Mehl A, Baluch A, Cawley C, Challiner J, Fraser H, Millen E, Montazeri M, Multmeier J, Pick F, Richter C, Türk E, Upadhyay S, Virani V, Vona N, Wicks P, Novorol C. How accurate are digital symptom assessment apps for suggesting conditions and urgency advice? A clinical vignettes comparison to GPs. BMJ Open 2020; 10:e040269. [PMID: 33328258 PMCID: PMC7745523 DOI: 10.1136/bmjopen-2020-040269] [Citation(s) in RCA: 68] [Impact Index Per Article: 17.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 01/04/2023] Open
Abstract
OBJECTIVES To compare breadth of condition coverage, accuracy of suggested conditions and appropriateness of urgency advice of eight popular symptom assessment apps. DESIGN Vignettes study. SETTING 200 primary care vignettes. INTERVENTION/COMPARATOR For eight apps and seven general practitioners (GPs): breadth of coverage and condition-suggestion and urgency advice accuracy measured against the vignettes' gold standard. PRIMARY OUTCOME MEASURES (1) Proportion of conditions 'covered' by an app, that is, not excluded because the user was too young/old or pregnant, or not modelled; (2) proportion of vignettes with the correct primary diagnosis among the top 3 conditions suggested; (3) proportion of 'safe' urgency advice (ie, at gold standard level, more conservative, or no more than one level less conservative). RESULTS Condition-suggestion coverage was highly variable, with some apps not offering a suggestion for many users: in alphabetical order, Ada: 99.0%; Babylon: 51.5%; Buoy: 88.5%; K Health: 74.5%; Mediktor: 80.5%; Symptomate: 61.5%; WebMD: 93.0%; Your.MD: 64.5%. Top-3 suggestion accuracy was GPs (average): 82.1%±5.2%; Ada: 70.5%; Babylon: 32.0%; Buoy: 43.0%; K Health: 36.0%; Mediktor: 36.0%; Symptomate: 27.5%; WebMD: 35.5%; Your.MD: 23.5%. Some apps excluded certain user demographics or conditions and their performance was generally greater with the exclusion of corresponding vignettes. For safe urgency advice, tested GPs had an average of 97.0%±2.5%. For the vignettes with advice provided, only three apps had safety performance within 1 SD of the GPs: Ada: 97.0%; Babylon: 95.1%; Symptomate: 97.8%. One app had a safety performance within 2 SDs of GPs: Your.MD: 92.6%. Three apps had a safety performance outside 2 SDs of GPs: Buoy: 80.0% (p<0.001); K Health: 81.3% (p<0.001); Mediktor: 87.3% (p=1.3×10⁻³). CONCLUSIONS The utility of digital symptom assessment apps relies on coverage, accuracy and safety.
While no digital tool outperformed GPs, some came close, and the nature of iterative improvements to software offers scalable improvements to care.
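The paper's definition of 'safe' urgency advice (at the gold-standard level, more conservative, or no more than one level less conservative) reduces to a one-line ordinal rule. A hypothetical sketch, assuming urgency is coded on an ordinal scale where a higher number means more urgent (the scale, function name, and examples are ours):

```python
# Hypothetical four-point urgency scale, index 0 (least urgent) to 3 (most).
URGENCY = ["self-care", "routine GP", "urgent GP", "emergency"]

def advice_is_safe(app_level, gold_level):
    """Safe per the paper's definition: at gold-standard level, more
    conservative (more urgent), or at most one level less conservative."""
    return app_level >= gold_level - 1

assert advice_is_safe(3, 2)       # more conservative than gold: safe
assert advice_is_safe(2, 3)       # one level less conservative: still safe
assert not advice_is_safe(1, 3)   # two levels less conservative: unsafe
```

Applying this rule per vignette and averaging gives the per-app safety percentages compared against the GPs' mean ± SD.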
Affiliation(s)
- Hamish Fraser
- Brown Center for Biomedical Informatics, Brown University, Rhode Island, USA
|
24
|
Abstract
OBJECTIVES We investigated the usefulness of machine learning artificial intelligence (AI) in classifying the severity of ophthalmic emergencies for timely hospital visits. STUDY DESIGN This retrospective study analysed patients who first visited the Armed Forces Daegu Hospital between May and December 2019. General patient information, events and symptoms were input variables; events, symptoms, diagnoses and treatments were output variables. The output variables were classified into four classes (red, orange, yellow and green, indicating immediate to no emergency). About 200 cases forming a class-balanced validation data set were randomly selected before all training procedures. An ensemble AI model using combinations of fully connected neural networks with the synthetic minority oversampling technique (SMOTE) algorithm was adopted. PARTICIPANTS A total of 1681 patients were included. MAJOR OUTCOMES Model performance was evaluated using accuracy, precision, recall and F1 scores. RESULTS The accuracy of the model was 99.05%. Per class (red, orange, yellow and green), precision was 100%, 98.10%, 92.73% and 100%; recall was 100%, 100%, 98.08% and 95.33%; and F1 scores were 100%, 99.04%, 95.33% and 96.00%. CONCLUSIONS We provide support for an AI method to classify ophthalmic emergency severity based on symptoms.
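The per-class precision, recall and F1 scores reported above follow directly from the true positives, false positives and false negatives for each class. A minimal sketch with hypothetical labels (not the study's code or data):

```python
def per_class_prf(y_true, y_pred, label):
    """Precision, recall and F1 for one class, from raw label lists."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical severity labels for four validation cases.
y_true = ["red", "red", "green", "yellow"]
y_pred = ["red", "green", "green", "yellow"]
print(per_class_prf(y_true, y_pred, "red"))
```

Repeating this for each of the four classes (red, orange, yellow, green) yields the per-class table the study reports; class-balanced validation data keeps these averages from being dominated by the majority class.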
Affiliation(s)
- Hyunmin Ahn
- Ophthalmology, Armed Forces Daegu Hospital, Daegu, Korea (the Republic of)
|
25
|
Chishti S, Jaggi KR, Saini A, Agarwal G, Ranjan A. Artificial Intelligence-Based Differential Diagnosis: Development and Validation of a Probabilistic Model to Address Lack of Large-Scale Clinical Datasets. J Med Internet Res 2020; 22:e17550. [PMID: 32343256 PMCID: PMC7218591 DOI: 10.2196/17550] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/19/2019] [Revised: 01/30/2020] [Accepted: 02/01/2020] [Indexed: 12/19/2022] Open
Abstract
Background Machine-learning or deep-learning algorithms for clinical diagnosis are inherently dependent on the availability of large-scale clinical datasets. Lack of such datasets and inherent problems such as overfitting often necessitate the development of innovative solutions. Probabilistic modeling closely mimics the rationale behind clinical diagnosis and represents a unique solution. Objective The aim of this study was to develop and validate a probabilistic model for differential diagnosis in different medical domains. Methods Numerical values of symptom-disease associations were utilized to mathematically represent medical domain knowledge. These values served as the core engine for the probabilistic model. For the given set of symptoms, the model was utilized to produce a ranked list of differential diagnoses, which was compared to the differential diagnosis constructed by a physician in a consult. Practicing medical specialists were integral in the development and validation of this model. Clinical vignettes (patient case studies) were utilized to compare the accuracy of doctors and the model against the assumed gold standard. The accuracy analysis was carried out over the following metrics: top 3 accuracy, precision, and recall. Results The model demonstrated a statistically significant improvement (P=.002) in diagnostic accuracy (85%) as compared to the doctors’ performance (67%). This advantage was retained across all three categories of clinical vignettes: 100% vs 82% (P<.001) for highly specific disease presentation, 83% vs 65% for moderately specific disease presentation (P=.005), and 72% vs 49% (P<.001) for nonspecific disease presentation. The model performed slightly better than the doctors’ average in precision (62% vs 60%, P=.43) but there was no improvement with respect to recall (53% vs 56%, P=.27). However, neither difference was statistically significant. 
Conclusions The present study demonstrates a drastic improvement over previously reported results that can be attributed to the development of a stable probabilistic framework utilizing symptom-disease associations to mathematically represent medical domain knowledge. The current iteration relies on static, manually curated values for calculating the degree of association. Shifting to real-world data–derived values represents the next step in model development.
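The ranking approach described above can be illustrated with a toy scorer over a symptom-disease association table, combining curated association strengths under a naive independence assumption. The disease names, association values, and smoothing constant below are hypothetical, not the authors' curated values:

```python
import math

# Hypothetical curated symptom-disease association strengths,
# read as P(symptom | disease); NOT the authors' values.
ASSOC = {
    "influenza": {"fever": 0.9, "cough": 0.8, "rash": 0.05},
    "measles":   {"fever": 0.85, "cough": 0.5, "rash": 0.9},
}

def ranked_differential(symptoms, priors=None):
    """Score each disease by the summed log association strengths of the
    observed symptoms (naive independence assumption) plus a log prior,
    and return a ranked list of (disease, score) pairs."""
    scores = {}
    for disease, table in ASSOC.items():
        score = math.log((priors or {}).get(disease, 1.0))
        for s in symptoms:
            score += math.log(table.get(s, 1e-3))  # smooth unseen symptoms
        scores[disease] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(ranked_differential(["fever", "rash"]))
```

A physician's differential for the same symptom set can then be compared against this ranked list, which is how the vignette-based top-3 accuracy comparison in the study is framed.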
Affiliation(s)
- Anuj Saini
- 1mg Technologies Pvt Ltd, Gurgaon, India
|
26
|
Stans J. A brief overview of animal symptom checkers. Open Vet J 2020; 10:1-3. [PMID: 32426249 PMCID: PMC7193881 DOI: 10.4314/ovj.v10i1.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/29/2019] [Accepted: 12/18/2019] [Indexed: 12/02/2022] Open
Abstract
Symptom checkers are tools that provide health information, including possible conditions, after entering one or more symptoms. Some symptom checkers also provide advice on how urgently medical attention should be sought. In addition to human symptom checkers, several tools are also available to check the symptoms of animals and provide veterinary triage advice. Unlike human symptom checkers, however, this widespread availability has not led to systematic investigation: little to no peer-reviewed research has been published regarding animal symptom checkers. This paper describes some examples of animal symptom checkers and formulates proposals for future research by translating knowledge obtained from research on human symptom checkers.
Affiliation(s)
- Jelle Stans
- Institute for Globally Distributed Open Research and Education (IGDORE), Beringen, Belgium
|