1
Harada Y, Sakamoto T, Sugimoto S, Shimizu T. Longitudinal Changes in Diagnostic Accuracy of a Differential Diagnosis List Developed by an AI-Based Symptom Checker: Retrospective Observational Study. JMIR Form Res 2024; 8:e53985. [PMID: 38758588] [PMCID: PMC11143391] [DOI: 10.2196/53985]
Abstract
BACKGROUND Artificial intelligence (AI) symptom checker models should be trained using real-world patient data to improve their diagnostic accuracy. Given that AI-based symptom checkers are currently used in clinical practice and can continue to learn from such data, their performance should improve over time. However, longitudinal evaluations of the diagnostic accuracy of these symptom checkers are limited. OBJECTIVE This study aimed to assess longitudinal changes in the accuracy of differential diagnosis lists created by an AI-based symptom checker used in the real world. METHODS This was a single-center, retrospective, observational study. Patients who visited an outpatient clinic without an appointment between May 1, 2019, and April 30, 2022, and who were admitted to a community hospital in Japan within 30 days of their index visit were considered eligible. We included only patients who underwent an AI-based symptom check at the index visit and whose diagnosis was confirmed during follow-up. Final diagnoses were categorized as common or uncommon, and all cases were categorized as typical or atypical. The primary outcome measure was the accuracy of the differential diagnosis list created by the AI-based symptom checker, defined as inclusion of the final diagnosis in the list of 10 differential diagnoses created by the symptom checker. To assess the change in the symptom checker's diagnostic accuracy over 3 years, we used a chi-square test to compare the primary outcome over 3 periods: May 1, 2019, to April 30, 2020 (first year); May 1, 2020, to April 30, 2021 (second year); and May 1, 2021, to April 30, 2022 (third year). RESULTS A total of 381 patients were included. Common diseases comprised 257 (67.5%) cases, and typical presentations were observed in 298 (78.2%) cases.
Overall, the differential diagnosis list created by the AI-based symptom checker included the final diagnosis in 172 (45.1%) cases, a proportion that did not differ across the 3 years (first year: 97/219, 44.3%; second year: 32/72, 44.4%; and third year: 43/90, 47.7%; P=.85). Accuracy was low in cases with uncommon diseases (30/124, 24.2%) and atypical presentations (12/83, 14.5%). In the multivariate logistic regression model, common disease (P<.001; odds ratio 4.13, 95% CI 2.50-6.98) and typical presentation (P<.001; odds ratio 6.92, 95% CI 3.62-14.2) were significantly associated with the accuracy of the differential diagnosis list. CONCLUSIONS A 3-year longitudinal survey of the diagnostic accuracy of differential diagnosis lists developed by an AI-based symptom checker implemented in real-world clinical practice showed no improvement over time. Uncommon diseases and atypical presentations were independently associated with lower diagnostic accuracy. In the future, symptom checkers should be trained to recognize uncommon conditions.
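The year-over-year comparison above is a standard chi-square test of homogeneity on a 2x3 table of diagnostic hits and misses. A minimal pure-Python sketch (the function name and the closed-form p-value for 2 degrees of freedom are ours, not the study's) reproduces the reported P=.85 from the published counts:

```python
import math

def chi_square_2x3(successes, totals):
    """Chi-square test of homogeneity for a 2 x 3 table of
    successes/failures across 3 groups. Returns (statistic, p).
    The closed-form survival function exp(-x/2) is valid only
    for df = 2, i.e., exactly 3 groups."""
    assert len(totals) == 3, "closed-form p-value assumes df = 2"
    failures = [t - s for s, t in zip(successes, totals)]
    n = sum(totals)
    p_hat = sum(successes) / n  # pooled success proportion
    stat = 0.0
    for s, f, t in zip(successes, failures, totals):
        exp_s = t * p_hat          # expected successes in group
        exp_f = t * (1 - p_hat)    # expected failures in group
        stat += (s - exp_s) ** 2 / exp_s + (f - exp_f) ** 2 / exp_f
    p_value = math.exp(-stat / 2)  # chi-square survival, df = 2
    return stat, p_value

# Counts reported in the abstract: first, second, and third year
stat, p = chi_square_2x3(successes=[97, 32, 43], totals=[219, 72, 90])
print(round(stat, 2), round(p, 2))  # 0.33 0.85
```

The near-zero statistic makes the abstract's conclusion concrete: the three yearly accuracy proportions are statistically indistinguishable.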
Affiliation(s)
- Yukinori Harada
- Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, Shimotsuga, Japan
- Department of General Medicine, Nagano Chuo Hospital, Nagano, Japan
- Tetsu Sakamoto
- Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, Shimotsuga, Japan
- Shu Sugimoto
- Department of Medicine (Neurology and Rheumatology), Shinshu University School of Medicine, Matsumoto, Japan
- Taro Shimizu
- Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, Shimotsuga, Japan
2
Blanchard MD, Herzog SM, Kämmer JE, Zöller N, Kostopoulou O, Kurvers RHJM. Collective Intelligence Increases Diagnostic Accuracy in a General Practice Setting. Med Decis Making 2024; 44:451-462. [PMID: 38606597] [PMCID: PMC11102639] [DOI: 10.1177/0272989X241241001]
Abstract
BACKGROUND General practitioners (GPs) work in an ill-defined environment where diagnostic errors are prevalent. Previous research indicates that aggregating independent diagnoses can improve diagnostic accuracy in a range of settings. We examined whether aggregating independent diagnoses can also improve diagnostic accuracy in GP decision making. In addition, we investigated the potential benefit of such an approach in combination with a decision support system (DSS). METHODS We simulated virtual groups using data sets from 2 previously published studies. In study 1, 260 GPs independently diagnosed 9 patient cases in a vignette-based study. In study 2, 30 GPs independently diagnosed 12 patient actors in a patient-facing study. In both data sets, GPs provided diagnoses in a control condition and/or DSS condition(s). Each GP's diagnosis, confidence rating, and years of experience were entered into a computer simulation. Virtual groups of varying sizes (range: 3-9) were created, and different collective intelligence rules (plurality, confidence, and seniority) were applied to determine each group's final diagnosis. Diagnostic accuracy was used as the performance measure. RESULTS Aggregating independent diagnoses by weighing them equally (i.e., the plurality rule) substantially outperformed average individual accuracy, and this effect increased with increasing group size. Selecting diagnoses based on confidence led to only marginal improvements, while selecting based on seniority reduced accuracy. Combining the plurality rule with a DSS further boosted performance. DISCUSSION Combining independent diagnoses may substantially improve a GP's diagnostic accuracy and subsequent patient outcomes. However, this approach did not improve accuracy in all patient cases. Therefore, future work should focus on uncovering the conditions under which collective intelligence is most beneficial in general practice.
HIGHLIGHTS We examined whether aggregating independent diagnoses of GPs can improve diagnostic accuracy. Using data sets of 2 previously published studies, we composed virtual groups of GPs and combined their independent diagnoses using 3 collective intelligence rules (plurality, confidence, and seniority). Aggregating independent diagnoses by weighing them equally substantially outperformed average individual GP accuracy, and this effect increased with increasing group size. Combining independent diagnoses may substantially improve GPs' diagnostic accuracy and subsequent patient outcomes.
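The three collective intelligence rules compared in the study lend themselves to a compact sketch. This is an illustration under our own tie-breaking assumption (first occurrence wins), not the authors' simulation code:

```python
from collections import Counter

def plurality(diagnoses):
    """Plurality rule: the diagnosis named by the most group
    members wins (ties broken by first occurrence)."""
    return Counter(diagnoses).most_common(1)[0][0]

def by_confidence(diagnoses, confidences):
    """Confidence rule: take the diagnosis of the most confident member."""
    return max(zip(diagnoses, confidences), key=lambda dc: dc[1])[0]

def by_seniority(diagnoses, years_experience):
    """Seniority rule: take the diagnosis of the most senior member."""
    return max(zip(diagnoses, years_experience), key=lambda dy: dy[1])[0]

group = ["pulmonary embolism", "pneumonia", "pulmonary embolism"]
print(plurality(group))                    # pulmonary embolism
print(by_confidence(group, [60, 90, 70]))  # pneumonia
print(by_seniority(group, [4, 2, 12]))     # pulmonary embolism
```

The example shows why the rules can disagree on the same virtual group: the plurality and seniority rules pick the majority diagnosis, while the confidence rule follows the single most confident (here, wrong-footed) member.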
Affiliation(s)
- Juliane E. Kämmer
- Department of Social and Communication Psychology, Institute for Psychology, University of Goettingen, Germany
- Department of Emergency Medicine, Inselspital, Bern University Hospital, University of Bern, Switzerland
- Nikolas Zöller
- Max Planck Institute for Human Development, Berlin, Germany
- Olga Kostopoulou
- Institute for Global Health Innovation, Imperial College London, UK
3
Hajibonabi F, Sharma P, Davarpanah AH, Balthazar P, Moreno CC, Pectasides M, Nandwana SB. Performing Quality Control on Magnetic Resonance Imaging Liver Fat/Iron Quantification Studies: A Critical Requirement. J Comput Assist Tomogr 2023; 47:689-697. [PMID: 37707397] [DOI: 10.1097/RCT.0000000000001471]
Abstract
OBJECTIVE Nonalcoholic fatty liver disease and iron overload can lead to cirrhosis, making early detection important. Magnetic resonance (MR) imaging utilizing chemical shift-encoded sequences and multi-echo-time single-voxel spectroscopy (SVS) is frequently used for assessment. The purpose of this study was to assess various quality factors of technical acceptability and any deficiencies in technologist performance in these fat/iron MR quantification studies. METHODS In this institutional review board-waived retrospective quality improvement review, 87 fat/iron MR studies performed over a 6-month period were evaluated. Technical acceptability criteria for chemical shift-encoded sequences (q-Dixon and IDEAL-IQ) included data handling errors (missing maps), liver field coverage, fat/water swap, motion, and other artifacts. Similarly, data handling (missing table/spectroscopy), curve fit, fat- and water-peak separation, and water-peak sharpness were evaluated for SVS technical acceptability. RESULTS Data handling errors were found in 11% (10/87) of studies, with missing maps or an entire missing sequence (SVS or q-Dixon). Twenty-seven percent (23/86) of the q-Dixon/IDEAL-IQ sequences were technically unacceptable (incomplete liver field [39%], other artifacts [35%], significant/severe motion [18%], global fat/water swap [4%], and multiple reasons [4%]). Twenty-eight percent (21/75) of SVS sequences were unacceptable (water-peak broadness [67%], poor curve fit [19%], overlapping fat and water peaks [5%], and multiple reasons [9%]). CONCLUSIONS A high rate of preventable errors in fat/iron MR quantification studies indicates the need for routine quality control and evaluation of the technologist performance and technical deficiencies that may exist within a radiology practice. Potential solutions such as instituting a checklist for technologists during each acquisition procedure and routine auditing may be required.
Affiliation(s)
- Farid Hajibonabi
- From the Department of Radiology and Imaging Sciences, Emory University School of Medicine, Atlanta, GA
4
Kafke SD, Kuhlmey A, Schuster J, Blüher S, Czimmeck C, Zoellick JC, Grosse P. Can clinical decision support systems be an asset in medical education? An experimental approach. BMC Med Educ 2023; 23:570. [PMID: 37568144] [PMCID: PMC10416486] [DOI: 10.1186/s12909-023-04568-8]
Abstract
BACKGROUND Diagnostic accuracy is one of the major cornerstones of appropriate and successful medical decision-making. Clinical decision support systems (CDSSs) have recently been used to facilitate physicians' diagnostic considerations. However, to date, little is known about the potential assets of CDSSs for medical students in an educational setting. The purpose of our study was to explore the usefulness of CDSSs for medical students by assessing their diagnostic performance and the influence of such software on students' trust in their own diagnostic abilities. METHODS Based on paper cases, students had to diagnose two different patients, one using a CDSS and one using conventional methods such as textbooks. Both patients had a common disease; in one setting the clinical presentation was typical (tonsillitis), whereas in the other setting (pulmonary embolism) the patient presented atypically. We used a 2x2x2 between- and within-subjects cluster-randomised controlled trial to assess diagnostic accuracy in medical students, also varying the order of the resources used (CDSS first or second). RESULTS Medical students in their 4th and 5th year performed equally well using conventional methods or the CDSS across the two cases (t(164) = 1.30; p = 0.197). Diagnostic accuracy and trust in the correct diagnosis were higher in the typical presentation condition than in the atypical presentation condition (t(85) = 19.97; p < .0001 and t(150) = 7.67; p < .0001). These results refute our main hypothesis that students diagnose more accurately when using conventional methods compared to the CDSS. CONCLUSIONS Medical students in their 4th and 5th year performed equally well in diagnosing two cases of common diseases with typical or atypical clinical presentations using conventional methods or a CDSS. Students were proficient in diagnosing a common disease with a typical presentation but underestimated their own factual knowledge in this scenario.
Also, students were aware of their own diagnostic limitations when presented with a challenging case with an atypical presentation for which the use of a CDSS seemingly provided no additional insights.
Affiliation(s)
- Sean D Kafke
- Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Adelheid Kuhlmey
- Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Johanna Schuster
- Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Stefan Blüher
- Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Constanze Czimmeck
- Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Jan C Zoellick
- Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Pascal Grosse
- Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
5
Harada Y, Tomiyama S, Sakamoto T, Sugimoto S, Kawamura R, Yokose M, Hayashi A, Shimizu T. Effects of Combinational Use of Additional Differential Diagnostic Generators on the Diagnostic Accuracy of the Differential Diagnosis List Developed by an Artificial Intelligence-Driven Automated History-Taking System: Pilot Cross-Sectional Study. JMIR Form Res 2023; 7:e49034. [PMID: 37531164] [PMCID: PMC10433017] [DOI: 10.2196/49034]
Abstract
BACKGROUND Low diagnostic accuracy is a major concern in automated medical history-taking systems with differential diagnosis (DDx) generators. Extending the concept of collective intelligence, in which an integrated diagnosis list from multiple people is more accurate than a diagnosis list from a single person, to DDx generators may be a possible solution. OBJECTIVE The purpose of this study is to assess whether the combined use of several DDx generators improves the diagnostic accuracy of DDx lists. METHODS We used medical history data and the top 10 DDx lists (index DDx lists) generated by an artificial intelligence (AI)-driven automated medical history-taking system from 103 patients with confirmed diagnoses. Two research physicians independently created additional top 10 DDx lists (second and third DDx lists) per case by inputting key information into the other 2 DDx generators based on the medical history generated by the automated history-taking system, without reading the index lists. We used the McNemar test to assess the improvement in diagnostic accuracy from the index DDx lists to three types of combined DDx lists: (1) simply combining the DDx lists from the index, second, and third lists; (2) creating a new top 10 DDx list using a 1/n weighting rule; and (3) creating new lists containing only the diagnoses shared among the index, second, and third lists. We treated the data generated by the 2 research physicians from the same patient as independent cases; therefore, the number of cases included in analyses using the 2 additional lists was 206 (103 cases × 2 physicians' input). RESULTS The diagnostic accuracy of the index lists was 46% (47/103).
Diagnostic accuracy was improved by simply combining the other 2 DDx lists (133/206, 65%, P<.001), whereas the other 2 combined DDx lists did not improve the diagnostic accuracy (106/206, 52%, P=.05 for the collective list with the 1/n weighting rule and 29/206, 14%, P<.001 for the list restricted to diagnoses shared among the 3 DDx lists). CONCLUSIONS Simply adding each of the top 10 DDx lists from additional DDx generators increased the diagnostic accuracy of the DDx list by approximately 20%, suggesting that the combinational use of DDx generators early in the diagnostic process is beneficial.
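The three combination strategies can be sketched as follows. This is our interpretation of the abstract's description, not the study's code; in particular, the assumption that the 1/n weighting rule gives the diagnosis at rank n a score of 1/n is ours:

```python
def combine_union(*ddx_lists):
    """Strategy 1: simply pool all diagnoses from every list (deduplicated)."""
    seen, combined = set(), []
    for ddx in ddx_lists:
        for dx in ddx:
            if dx not in seen:
                seen.add(dx)
                combined.append(dx)
    return combined

def combine_weighted(*ddx_lists, top=10):
    """Strategy 2 (assumed 1/n rule): the diagnosis at rank n in a list
    scores 1/n; scores are summed across lists and the top entries kept."""
    scores = {}
    for ddx in ddx_lists:
        for rank, dx in enumerate(ddx, start=1):
            scores[dx] = scores.get(dx, 0.0) + 1.0 / rank
    return sorted(scores, key=scores.get, reverse=True)[:top]

def combine_shared(*ddx_lists):
    """Strategy 3: keep only diagnoses shared by every list."""
    shared = set(ddx_lists[0]).intersection(*ddx_lists[1:])
    return [dx for dx in ddx_lists[0] if dx in shared]

index_list = ["appendicitis", "diverticulitis", "gastroenteritis"]
second     = ["gastroenteritis", "appendicitis", "colitis"]
print(combine_union(index_list, second))
print(combine_weighted(index_list, second, top=3))
print(combine_shared(index_list, second))
```

The sketch makes the study's result intuitive: the union can only add hits, the weighted merge can push the correct diagnosis out of the top 10, and the intersection discards any hit that the other generators missed.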
Affiliation(s)
- Yukinori Harada
- Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, Mibu, Shimotsugagun, Japan
- Department of Internal Medicine, Nagano Chuo Hospital, Nagano, Japan
- Shusaku Tomiyama
- Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, Mibu, Shimotsugagun, Japan
- Tetsu Sakamoto
- Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, Mibu, Shimotsugagun, Japan
- Shu Sugimoto
- Department of Internal Medicine, Nagano Chuo Hospital, Nagano, Japan
- Ren Kawamura
- Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, Mibu, Shimotsugagun, Japan
- Masashi Yokose
- Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, Mibu, Shimotsugagun, Japan
- Arisa Hayashi
- Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, Mibu, Shimotsugagun, Japan
- Taro Shimizu
- Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, Mibu, Shimotsugagun, Japan
6
Staal J, Zegers R, Caljouw-Vos J, Mamede S, Zwaan L. Impact of diagnostic checklists on the interpretation of normal and abnormal electrocardiograms. Diagnosis (Berl) 2023; 10:121-129. [PMID: 36490202] [DOI: 10.1515/dx-2022-0092]
Abstract
OBJECTIVES Checklists that aim to support clinicians' diagnostic reasoning processes are often recommended to prevent diagnostic errors. Evidence on checklist effectiveness is mixed and seems to depend on checklist type, case difficulty, and participants' expertise. Existing studies primarily use abnormal cases, leaving it unclear how the diagnosis of normal cases is affected by checklist use. We investigated how content-specific and debiasing checklists impacted performance for normal and abnormal cases in electrocardiogram (ECG) diagnosis. METHODS In this randomized experiment, 42 first-year general practice residents interpreted normal, simple abnormal, and complex abnormal ECGs without a checklist. One week later, they were randomly assigned to diagnose the ECGs again with either a debiasing or a content-specific checklist. We measured residents' diagnostic accuracy, confidence, patient management, and time taken to diagnose. Additionally, confidence-accuracy calibration was assessed. RESULTS Accuracy, confidence, and patient management were not significantly affected by checklist use. Time to diagnose decreased with a checklist (M=147 s (77)) compared to without a checklist (M=189 s (80), Z=-3.10, p=0.002). Additionally, residents' calibration improved when using a checklist (phase 1: R2=0.14, phase 2: R2=0.40). CONCLUSIONS In both normal and abnormal cases, checklist use improved confidence-accuracy calibration and reduced time to diagnose, though accuracy and confidence were not significantly affected. Checklists appear promising for reducing overconfidence without negatively impacting performance on normal or simple ECGs, and reducing overconfidence has the potential to improve diagnostic performance in the long term. Future research should evaluate this effect in more experienced GPs.
Affiliation(s)
- Justine Staal
- Erasmus Medical Center Rotterdam, Institute of Medical Education Research Rotterdam, Rotterdam, The Netherlands
- Robert Zegers
- Department of General Practice, Erasmus Medical Center Rotterdam, Rotterdam, The Netherlands
- Sílvia Mamede
- Erasmus Medical Center Rotterdam, Institute of Medical Education Research Rotterdam, Rotterdam, The Netherlands
- Department of Psychology, Education and Child Studies, Erasmus School of Social and Behavioral Sciences, Rotterdam, The Netherlands
- Laura Zwaan
- Erasmus Medical Center Rotterdam, Institute of Medical Education Research Rotterdam, Rotterdam, The Netherlands
7
Mamede S, Schmidt HG. Deliberate reflection and clinical reasoning: Founding ideas and empirical findings. Med Educ 2023; 57:76-85. [PMID: 35771936] [PMCID: PMC10083910] [DOI: 10.1111/medu.14863]
Abstract
CONTEXT The idea that reflection improves reasoning and learning, long present in other fields, emerged in the 1990s in the medical education literature. Since then, the number of publications on reflection as a means to improve diagnostic learning and clinical problem-solving has increased steeply. Recently, concerns about diagnostic errors have raised further interest in reflection, and several approaches based on reflection have been proposed to reduce clinicians' errors during diagnostic reasoning. What reflection entails varies substantially, and most approaches still require empirical examination. PURPOSE The present essay aims to help clarify the role of deliberate reflection in diagnostic reasoning. Deliberate reflection is an approach whose effects on diagnostic reasoning and learning have been empirically studied over the past 15 years. The philosophical roots of the approach are briefly examined, and the theory and practice of deliberate reflection, particularly its effectiveness, are reviewed. Lessons learned and unresolved issues are discussed. DISCUSSION The deliberate reflection approach originated from a conceptualization of the nature of reflective practice in medicine informed by Dewey's and Schön's work. The approach guides physicians through systematically reviewing the grounds of their initial diagnosis and considering alternatives. Experimental evidence has supported the effectiveness of deliberate reflection in increasing physicians' diagnostic performance, particularly in nonstraightforward diagnostic tasks. Deliberate reflection has also proved helpful for improving students' diagnostic learning and for facilitating learning of new information. The mechanisms behind the effects of deliberate reflection remain unclear. Tentative explanations focus on the activation and reorganisation of prior knowledge induced by deliberate reflection. Its usefulness therefore depends on the difficulty of the problem relative to the clinician's knowledge.
Further research should examine variations in instructions on how to reflect upon a case, the value of further guidance while learning from deliberate reflection, and its benefits in real practice.
Affiliation(s)
- Sílvia Mamede
- Institute of Medical Education Research Rotterdam, Erasmus MC; Department of Psychology, Education and Child Studies, Erasmus University Rotterdam, Rotterdam, The Netherlands
- Henk G. Schmidt
- Institute of Medical Education Research Rotterdam, Erasmus MC; Department of Psychology, Education and Child Studies, Erasmus University Rotterdam, Rotterdam, The Netherlands
8
Staal J, Hooftman J, Gunput STG, Mamede S, Frens MA, Van den Broek WW, Alsma J, Zwaan L. Effect on diagnostic accuracy of cognitive reasoning tools for the workplace setting: systematic review and meta-analysis. BMJ Qual Saf 2022; 31:899-910. [DOI: 10.1136/bmjqs-2022-014865]
Abstract
BACKGROUND Preventable diagnostic errors are a large burden on healthcare. Cognitive reasoning tools, that is, tools that aim to improve clinical reasoning, are commonly suggested interventions. However, quantitative estimates of tool effectiveness have been aggregated over both workplace-oriented and education-oriented tools, leaving the impact of workplace-oriented cognitive reasoning tools alone unclear. This systematic review and meta-analysis aims to estimate the effect of cognitive reasoning tools on diagnostic performance among medical professionals and students, and to identify factors associated with larger improvements. METHODS Controlled experimental studies that assessed whether cognitive reasoning tools improved the diagnostic accuracy of individual medical students or professionals in a workplace setting were included. Embase.com, Medline ALL via Ovid, Web of Science Core Collection, Cochrane Central Register of Controlled Trials and Google Scholar were searched from inception to 15 October 2021, supplemented with handsearching. Meta-analysis was performed using a random-effects model. RESULTS The literature search resulted in 4546 articles, of which 29 studies with data from 2732 participants were included for meta-analysis. The pooled estimate showed considerable heterogeneity (I2=70%). This was reduced to I2=38% by removing three studies that offered training with the tool before the intervention effect was measured. After removing these studies, the pooled estimate indicated that cognitive reasoning tools led to a small improvement in diagnostic accuracy (Hedges' g=0.20, 95% CI 0.10 to 0.29, p<0.001). There were no significant subgroup differences. CONCLUSION Cognitive reasoning tools resulted in small but clinically important improvements in diagnostic accuracy in medical students and professionals, although no factors could be distinguished that resulted in larger improvements. Cognitive reasoning tools could be routinely implemented to improve diagnosis in practice, but going forward, more large-scale studies and evaluations of these tools in practice are needed to determine how these tools can be effectively implemented. PROSPERO registration number: CRD42020186994.
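The random-effects model referred to above is most commonly the DerSimonian-Laird estimator. The following is an illustrative sketch of that computation (with made-up effect sizes and variances, not the review's data), returning a pooled effect, its 95% CI, and the I2 heterogeneity statistic:

```python
import math

def dersimonian_laird(effects, variances):
    """DerSimonian-Laird random-effects meta-analysis.
    Returns (pooled effect, 95% CI, I^2 heterogeneity in %)."""
    # Fixed-effect (inverse-variance) weights and Cochran's Q
    w = [1.0 / v for v in variances]
    fixed = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))
    df = len(effects) - 1
    # Between-study variance tau^2 (method-of-moments estimator)
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    # Random-effects weights incorporate tau^2
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * e for wi, e in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    ci = (pooled - 1.96 * se, pooled + 1.96 * se)
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, ci, i2

# Hypothetical per-study Hedges' g values and their variances
g, ci, i2 = dersimonian_laird([0.1, 0.25, 0.2, 0.35], [0.02, 0.05, 0.03, 0.04])
print(round(g, 2), ci, i2)
```

When between-study heterogeneity (tau^2) is zero, the estimator reduces to a fixed-effect inverse-variance average; as heterogeneity grows, the weights flatten and the CI widens, which is why the review's I2 figures matter for interpreting the pooled Hedges' g.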
9
Wright WF, Yenokyan G, Auwaerter PG. Geographic Influence Upon Noninfectious Diseases Accounting for Fever of Unknown Origin (FUO): A Systematic Review and Meta-analysis. Open Forum Infect Dis 2022; 9:ofac396. [PMID: 36004312] [PMCID: PMC9394765] [DOI: 10.1093/ofid/ofac396]
Abstract
Background
Diagnostic outcomes for fever of unknown origin (FUO) continue to include notable numbers of undiagnosed cases. A recent systematic review and meta-analysis of studies reported geographic variation in FUO-related infectious diseases. Whether geography also influences the types of noninfectious FUO diagnoses deserves examination.
Methods
Medline (PubMed), Embase, Scopus, and Web of Science databases were searched systematically using medical subject headings published from January 1, 1997, to March 31, 2021. Prospective clinical studies investigating participants meeting adult FUO defining criteria were selected if they assessed final diagnoses. Meta-analyses were based on the random-effects model according to World Health Organization (WHO) geographical regions.
Results
Nineteen studies with significant heterogeneity were analyzed, totaling 2,667 participants. Noninfectious inflammatory disorders had a pooled estimate of 20.0% (95%CI: 17.0-23.0%). Undiagnosed illness had a pooled estimate of 20.0% (95%CI: 14.0-26.0%). The pooled estimate for cancer was 15.0% (95%CI: 12.0-18.0%). Miscellaneous conditions had a pooled estimate of 6.0% (95%CI: 4.0-8.0%). Noninfectious inflammatory disorders and miscellaneous conditions were most prevalent in the Western Pacific region, with pooled estimates of 27.0% (95%CI: 20.0-34.0%) and 9.0% (95%CI: 7.0-11.0%), respectively. The highest pooled estimate for cancer was in the Eastern Mediterranean region at 25.0% (95%CI: 18.0-32.0%). Adult-onset Still's disease (114 [58.5%]), systemic lupus (52 [26.7%]), and giant-cell arteritis (40 [68.9%]) predominated among the noninfectious inflammatory group. Lymphoma (164 [70.1%]) was the most common diagnosis in the cancer group.
Conclusions
In this systematic review and meta-analysis, noninfectious disease diagnostic outcomes varied among WHO-defined geographies. Evaluation of FUO should consider local variations in disease prevalence.
Affiliation(s)
- William F Wright
- Correspondence: William F. Wright, DO, MPH, Division of Infectious Diseases, Department of Medicine, Johns Hopkins University School of Medicine, 733 North Broadway, Baltimore, MD 21205
- Gayane Yenokyan
- Johns Hopkins Biostatistics Center, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA