1
Rutledge GW. Diagnostic accuracy of GPT-4 on common clinical scenarios and challenging cases. Learn Health Syst 2024; 8:e10438. [PMID: 39036534 PMCID: PMC11257049 DOI: 10.1002/lrh2.10438] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/26/2024] [Revised: 05/16/2024] [Accepted: 05/19/2024] [Indexed: 07/23/2024]
Abstract
Introduction Large language models (LLMs) have high diagnostic accuracy when they evaluate previously published clinical cases. Methods We compared the accuracy of GPT-4's differential diagnoses for previously unpublished challenging case scenarios with its diagnostic accuracy for previously published cases. Results For a set of previously unpublished challenging clinical cases, GPT-4 included the correct diagnosis among its top 6 diagnoses in 61.1% of cases, versus the previously reported 49.1% for physicians. For a set of 45 clinical vignettes of more common clinical scenarios, GPT-4 included the correct diagnosis in its top 3 diagnoses 100% of the time, versus the previously reported 84.3% for physicians. Conclusions GPT-4 performs at a level at least as good as, if not better than, that of experienced physicians on highly challenging cases in internal medicine. The extraordinary performance of GPT-4 in diagnosing common clinical scenarios could be explained in part by the fact that these cases were previously published and may have been included in the training dataset for this LLM.
2
Schmidt HG, Norman GR, Mamede S, Magzoub M. The influence of context on diagnostic reasoning: A narrative synthesis of experimental findings. J Eval Clin Pract 2024. [PMID: 38818694 DOI: 10.1111/jep.14023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2023] [Revised: 05/03/2024] [Accepted: 05/13/2024] [Indexed: 06/01/2024]
Abstract
AIMS AND OBJECTIVES Contextual information that is implicitly available to physicians during clinical encounters has been shown to influence diagnostic reasoning. To better understand the psychological mechanisms underlying the influence of context on diagnostic accuracy, we conducted a review of experimental research on this topic. METHOD We searched Web of Science, PubMed, and Scopus for relevant articles and identified additional records by reading reference lists and approaching experts. We limited the review to true experiments involving physicians in which the outcome variable was diagnostic accuracy. RESULTS The 43 studies reviewed examined two categories of contextual variables: (a) case-intrinsic contextual information and (b) case-extrinsic contextual information. Case-intrinsic information includes implicit, misleading diagnostic suggestions in the patient's history, or the patient's emotional volatility. Case-extrinsic or situational information includes a similar (but different) case seen previously, perceived case difficulty, or external digital diagnostic support. Time pressure and interruptions are other extrinsic influences that may affect diagnostic accuracy, but studies of them have produced conflicting findings. CONCLUSION We propose two tentative hypotheses explaining the role of context in diagnostic accuracy. According to the negative-affect hypothesis, diagnostic errors emerge when the physician's attention shifts from the relevant clinical findings to the (irrelevant) source of negative affect (for instance, patient aggression) raised in a clinical encounter. The early-diagnosis-primacy hypothesis attributes errors to the extraordinary influence of the initial hypothesis that comes to the physician's mind on the subsequent collection and interpretation of case information. Future research should test these mechanisms explicitly.
Possible alternative mechanisms such as premature closure or increased production of (irrelevant) rival diagnoses in response to context deserve further scrutiny. Implications for medical education and practice are discussed.
Affiliation(s)
- Henk G Schmidt
- Institute of Medical Education Research, Erasmus University Rotterdam, Rotterdam, The Netherlands
- Geoffrey R Norman
- Department of Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, Canada
- Silvia Mamede
- Institute of Medical Education Research, Erasmus University Rotterdam, Rotterdam, The Netherlands
- Mohi Magzoub
- Department of Medical Education, United Arab Emirates University, Al Ain, United Arab Emirates
3
Harada Y, Sakamoto T, Sugimoto S, Shimizu T. Longitudinal Changes in Diagnostic Accuracy of a Differential Diagnosis List Developed by an AI-Based Symptom Checker: Retrospective Observational Study. JMIR Form Res 2024; 8:e53985. [PMID: 38758588 PMCID: PMC11143391 DOI: 10.2196/53985] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2023] [Revised: 03/23/2024] [Accepted: 04/24/2024] [Indexed: 05/18/2024]
Abstract
BACKGROUND Artificial intelligence (AI) symptom checker models should be trained using real-world patient data to improve their diagnostic accuracy. Given that AI-based symptom checkers are currently used in clinical practice, their performance should improve over time. However, longitudinal evaluations of the diagnostic accuracy of these symptom checkers are limited. OBJECTIVE This study aimed to assess the longitudinal changes in the accuracy of differential diagnosis lists created by an AI-based symptom checker used in the real world. METHODS This was a single-center, retrospective, observational study. Patients who visited an outpatient clinic without an appointment between May 1, 2019, and April 30, 2022, and who were admitted to a community hospital in Japan within 30 days of their index visit were considered eligible. We included only patients who used the AI-based symptom checker at the index visit and whose final diagnosis was confirmed during follow-up. Final diagnoses were categorized as common or uncommon, and all cases were categorized as typical or atypical. The primary outcome measure was the accuracy of the differential diagnosis list created by the AI-based symptom checker, defined as the presence of the final diagnosis in the list of 10 differential diagnoses created by the symptom checker. To assess the change in the symptom checker's diagnostic accuracy over 3 years, we used a chi-square test to compare the primary outcome over 3 periods: from May 1, 2019, to April 30, 2020 (first year); from May 1, 2020, to April 30, 2021 (second year); and from May 1, 2021, to April 30, 2022 (third year). RESULTS A total of 381 patients were included. Common diseases comprised 257 (67.5%) cases, and typical presentations were observed in 298 (78.2%) cases.
Overall, the accuracy of the differential diagnosis list created by the AI-based symptom checker was 45.1% (172/381), which did not differ across the 3 years (first year: 97/219, 44.3%; second year: 32/72, 44.4%; and third year: 43/90, 47.7%; P=.85). The accuracy of the differential diagnosis list created by the symptom checker was low among patients with uncommon diseases (30/124, 24.2%) and atypical presentations (12/83, 14.5%). In the multivariate logistic regression model, common disease (P<.001; odds ratio 4.13, 95% CI 2.50-6.98) and typical presentation (P<.001; odds ratio 6.92, 95% CI 3.62-14.2) were significantly associated with the accuracy of the differential diagnosis list created by the symptom checker. CONCLUSIONS A 3-year longitudinal survey of the diagnostic accuracy of differential diagnosis lists developed by an AI-based symptom checker, which has been implemented in real-world clinical practice settings, showed no improvement over time. Uncommon diseases and atypical presentations were independently associated with lower diagnostic accuracy. In the future, symptom checkers should be trained to recognize uncommon conditions.
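The year-to-year comparison can be recomputed from the counts in the abstract. The following is a minimal Python sketch, assuming the authors applied Pearson's chi-square test without continuity correction to a 2 × 3 table (correct vs. incorrect lists by study year), with the incorrect counts derived as yearly total minus correct. The helper name `chi_square_independence` is hypothetical.

```python
import math

def chi_square_independence(table):
    """Pearson chi-square test of independence for an r x c table of counts.

    Returns (statistic, degrees_of_freedom). No continuity correction is applied.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof

# Counts reported in the abstract: correct lists and patients per study year.
correct = [97, 32, 43]
totals = [219, 72, 90]
incorrect = [n - c for n, c in zip(totals, correct)]

stat, dof = chi_square_independence([correct, incorrect])
# With exactly 2 degrees of freedom, the chi-square survival function is exp(-x / 2).
p_value = math.exp(-stat / 2)
print(f"chi2 = {stat:.3f}, df = {dof}, P = {p_value:.2f}")
```

Under these assumptions the sketch yields a P value of about .85, in line with the value reported above. The closed-form exp(-x/2) tail holds only for df = 2; tables of other shapes need a general chi-square survival function from a statistics library.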
Affiliation(s)
- Yukinori Harada
- Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, Shimotsuga, Japan
- Department of General Medicine, Nagano Chuo Hospital, Nagano, Japan
- Tetsu Sakamoto
- Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, Shimotsuga, Japan
- Shu Sugimoto
- Department of Medicine (Neurology and Rheumatology), Shinshu University School of Medicine, Matsumoto, Japan
- Taro Shimizu
- Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, Shimotsuga, Japan
4
Bridges JM. Computerized diagnostic decision support systems - a comparative performance study of Isabel Pro vs. ChatGPT4. Diagnosis (Berl) 2024; 0:dx-2024-0033. [PMID: 38709491 DOI: 10.1515/dx-2024-0033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2024] [Accepted: 04/22/2024] [Indexed: 05/07/2024]
Abstract
OBJECTIVES Validate the diagnostic accuracy of the Artificial Intelligence Large Language Model ChatGPT4 by comparing diagnosis lists produced by ChatGPT4 to those produced by Isabel Pro. METHODS This study used 201 cases, comparing ChatGPT4 to Isabel Pro. System inputs were identical. Mean Reciprocal Rank (MRR) compares the correct diagnosis's rank between systems. Isabel Pro ranks diagnoses by the frequency with which the symptoms appear in the reference dataset. The mechanism ChatGPT4 uses to rank the diagnoses is unknown. A Wilcoxon signed-rank test failed to reject the null hypothesis. RESULTS Both systems produced comprehensive differential diagnosis lists. Isabel Pro's list appears immediately upon submission, while ChatGPT4 takes several minutes. Isabel Pro produced 175 (87.1%) correct diagnoses and ChatGPT4 165 (82.1%). The MRR for ChatGPT4 was 0.428 (rank 2.31), and that for Isabel Pro was 0.389 (rank 2.57), an average rank of three for each. ChatGPT4 outperformed on Recall at Rank 1, 5, and 10, with Isabel Pro outperforming at 20, 30, and 40. The Wilcoxon signed-rank test confirmed that the sample size was inadequate to conclude that the systems are equivalent. ChatGPT4 fabricated citations and DOIs, producing 145 correct references (87.9%) but only 52 correct DOIs (31.5%). CONCLUSIONS This study validates the promise of Clinical Diagnostic Decision Support Systems, including the Large Language Model form of artificial intelligence (AI). Until the issue of hallucinated references, and perhaps diagnoses, is resolved in favor of absolute accuracy, clinicians will make cautious use of Large Language Model systems in diagnosis, if at all.
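The two ranking metrics used above, Mean Reciprocal Rank and Recall at Rank k, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the study's code: ranks are 1-based, a case whose correct diagnosis is absent from the list contributes a reciprocal rank of 0 (a common convention; the abstract does not say how such cases were handled), and the example ranks are hypothetical.

```python
from typing import Optional, Sequence

def reciprocal_rank(rank: Optional[int]) -> float:
    """1/rank for a 1-based rank; 0.0 when the correct diagnosis is not listed."""
    return 0.0 if rank is None else 1.0 / rank

def mean_reciprocal_rank(ranks: Sequence[Optional[int]]) -> float:
    """Average of the per-case reciprocal ranks."""
    return sum(reciprocal_rank(r) for r in ranks) / len(ranks)

def recall_at_k(ranks: Sequence[Optional[int]], k: int) -> float:
    """Fraction of cases whose correct diagnosis appears within the top k."""
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return hits / len(ranks)

# Hypothetical ranks of the correct diagnosis for five cases (None = not listed).
ranks = [1, 3, None, 2, 10]
print(mean_reciprocal_rank(ranks))  # (1 + 1/3 + 0 + 1/2 + 1/10) / 5
print(recall_at_k(ranks, 1), recall_at_k(ranks, 5), recall_at_k(ranks, 10))
```

Because MRR weights early ranks heavily, one system can lead on MRR and low-k recall while another leads at larger k, which is the pattern the abstract reports.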
Affiliation(s)
- Joe M Bridges
- D. Bradley McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, USA
5
Michelson KA, Rees CA, Florin TA, Bachur RG. Emergency Department Volume and Delayed Diagnosis of Serious Pediatric Conditions. JAMA Pediatr 2024; 178:362-368. [PMID: 38345811 PMCID: PMC10862268 DOI: 10.1001/jamapediatrics.2023.6672] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2023] [Accepted: 12/14/2023] [Indexed: 02/15/2024]
Abstract
Importance Diagnostic delays are common in the emergency department (ED) and may predispose to worse outcomes. Objective To evaluate the association of annual pediatric volume in the ED with delayed diagnosis. Design, Setting, and Participants This retrospective cohort study included all children younger than 18 years treated at 954 EDs in 8 states with a first-time diagnosis of any of 23 acute, serious conditions: bacterial meningitis, compartment syndrome, complicated pneumonia, craniospinal abscess, deep neck infection, ectopic pregnancy, encephalitis, intussusception, Kawasaki disease, mastoiditis, myocarditis, necrotizing fasciitis, nontraumatic intracranial hemorrhage, orbital cellulitis, osteomyelitis, ovarian torsion, pulmonary embolism, pyloric stenosis, septic arthritis, sinus venous thrombosis, slipped capital femoral epiphysis, stroke, or testicular torsion. Patients were identified using the Healthcare Cost and Utilization Project State ED and Inpatient Databases. Data were collected from January 2015 to December 2019, and data were analyzed from July to December 2023. Exposure Annual volume of children at the first ED visited. Main Outcomes and Measures Possible delayed diagnosis, defined as an ED discharge within the 7 days before diagnosis. A secondary outcome was condition-specific complications. Rates of possible delayed diagnosis and complications were determined. The association of volume with delayed diagnosis across conditions was evaluated using conditional logistic regression matching on condition, age, and medical complexity. Condition-specific volume-delay associations were tested using hierarchical logistic models with log volume as the exposure, adjusting for age, sex, payer, medical complexity, and hospital urbanicity. The association of delayed diagnosis with complications by condition was then examined using logistic regressions.
Results Of 58 998 included children, 37 211 (63.1%) were male, and the mean (SD) age was 7.1 (5.8) years. A total of 6709 (11.4%) had a complex chronic condition. Delayed diagnosis occurred in 9296 (15.8%; 95% CI, 15.5-16.1). Each 2-fold increase in annual pediatric volume was associated with a 26.7% (95% CI, 22.5-30.7) decrease in possible delayed diagnosis. For 21 of 23 conditions (all except ectopic pregnancy and sinus venous thrombosis), there were decreased rates of possible delayed diagnosis with increasing ED volume. Condition-specific complications were 11.2% (95% CI, 3.1-20.0) more likely among patients with a possible delayed diagnosis compared with those without. Conclusions and Relevance EDs with fewer pediatric encounters had more possible delayed diagnoses across 23 serious conditions. Tools to support timely diagnosis in low-volume EDs are needed.
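Because volume entered the models on a log scale, the abstract reports the effect per doubling of volume. The Python sketch below shows what such a per-doubling figure implies for other fold changes, under two assumptions that the abstract does not spell out: the 26.7% is a reduction in the odds of possible delayed diagnosis, and the association is exactly log-linear across the observed volume range. The function name `implied_ratio` is illustrative, and the confidence interval is not propagated.

```python
import math

def implied_ratio(fold_change: float, ratio_per_doubling: float = 1 - 0.267) -> float:
    """Multiplicative change implied for an arbitrary fold change in ED volume.

    Assumes a log-linear effect: each doubling of volume multiplies the (assumed)
    odds of possible delayed diagnosis by ratio_per_doubling, so a k-fold change
    applies that factor log2(k) times.
    """
    return ratio_per_doubling ** math.log2(fold_change)

for fold in (2, 4, 10):
    r = implied_ratio(fold)
    print(f"{fold}-fold volume: ratio {r:.3f} ({(1 - r) * 100:.1f}% decrease)")
```

For example, a 4-fold difference in volume corresponds to two doublings, so the ratio is 0.733 squared rather than twice the 26.7% reduction. Extrapolating beyond the range of volumes actually observed in the data would not be supported.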
Affiliation(s)
- Kenneth A. Michelson
- Division of Emergency Medicine, Department of Pediatrics, Ann & Robert Lurie Children’s Hospital, Northwestern University Feinberg School of Medicine, Chicago, Illinois
- Chris A. Rees
- Division of Pediatric Emergency Medicine, Emory University School of Medicine, Atlanta, Georgia
- Todd A. Florin
- Division of Emergency Medicine, Department of Pediatrics, Ann & Robert Lurie Children’s Hospital, Northwestern University Feinberg School of Medicine, Chicago, Illinois
- Richard G. Bachur
- Division of Emergency Medicine, Boston Children’s Hospital, Boston, Massachusetts
6
Zampatti S, Peconi C, Megalizzi D, Calvino G, Trastulli G, Cascella R, Strafella C, Caltagirone C, Giardina E. Innovations in Medicine: Exploring ChatGPT's Impact on Rare Disorder Management. Genes (Basel) 2024; 15:421. [PMID: 38674356 PMCID: PMC11050022 DOI: 10.3390/genes15040421] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/29/2024] [Revised: 03/25/2024] [Accepted: 03/26/2024] [Indexed: 04/28/2024]
Abstract
Artificial intelligence (AI) is rapidly transforming the field of medicine, heralding a new era of innovation and efficiency. Among AI programs designed for general use, ChatGPT holds a prominent position, using an innovative language model developed by OpenAI. Built on deep learning techniques, ChatGPT stands out as an exceptionally viable tool, renowned for generating human-like responses to queries. Various medical specialties, including rheumatology, oncology, psychiatry, internal medicine, and ophthalmology, have been explored for ChatGPT integration, with pilot studies and trials revealing each field's potential benefits and challenges. However, genetics and genetic counseling, as well as rare disorders, remain an area suitable for exploration, given their complex datasets and the need for personalized patient care. In this review, we synthesize the wide range of potential applications for ChatGPT in the medical field, highlighting its benefits and limitations. We pay special attention to rare and genetic disorders, aiming to shed light on the future roles of AI-driven chatbots in healthcare. Our goal is to pave the way for a healthcare system that is more knowledgeable, efficient, and centered around patient needs.
Affiliation(s)
- Stefania Zampatti
- Genomic Medicine Laboratory UILDM, IRCCS Santa Lucia Foundation, 00179 Rome, Italy
- Cristina Peconi
- Genomic Medicine Laboratory UILDM, IRCCS Santa Lucia Foundation, 00179 Rome, Italy
- Domenica Megalizzi
- Genomic Medicine Laboratory UILDM, IRCCS Santa Lucia Foundation, 00179 Rome, Italy
- Department of Science, Roma Tre University, 00146 Rome, Italy
- Giulia Calvino
- Genomic Medicine Laboratory UILDM, IRCCS Santa Lucia Foundation, 00179 Rome, Italy
- Department of Science, Roma Tre University, 00146 Rome, Italy
- Giulia Trastulli
- Genomic Medicine Laboratory UILDM, IRCCS Santa Lucia Foundation, 00179 Rome, Italy
- Department of System Medicine, Tor Vergata University, 00133 Rome, Italy
- Raffaella Cascella
- Genomic Medicine Laboratory UILDM, IRCCS Santa Lucia Foundation, 00179 Rome, Italy
- Department of Chemical-Toxicological and Pharmacological Evaluation of Drugs, Catholic University Our Lady of Good Counsel, 1000 Tirana, Albania
- Claudia Strafella
- Genomic Medicine Laboratory UILDM, IRCCS Santa Lucia Foundation, 00179 Rome, Italy
- Carlo Caltagirone
- Department of Clinical and Behavioral Neurology, IRCCS Fondazione Santa Lucia, 00179 Rome, Italy
- Emiliano Giardina
- Genomic Medicine Laboratory UILDM, IRCCS Santa Lucia Foundation, 00179 Rome, Italy
- Department of Biomedicine and Prevention, Tor Vergata University, 00133 Rome, Italy
7
Hu Z, Wang M, Zheng S, Xu X, Zhang Z, Ge Q, Li J, Yao Y. Clinical Decision Support Requirements for Ventricular Tachycardia Diagnosis Within the Frameworks of Knowledge and Practice: Survey Study. JMIR Hum Factors 2024; 11:e55802. [PMID: 38530337 PMCID: PMC11005434 DOI: 10.2196/55802] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/25/2023] [Revised: 02/15/2024] [Accepted: 03/02/2024] [Indexed: 03/27/2024]
Abstract
BACKGROUND Ventricular tachycardia (VT) diagnosis is challenging due to the similarity between VT and some forms of supraventricular tachycardia, the complexity of clinical manifestations, the heterogeneity of underlying diseases, and the potential for life-threatening hemodynamic instability. Clinical decision support systems (CDSSs) have emerged as promising tools to augment the diagnostic capabilities of cardiologists. However, a requirements analysis is acknowledged to be vital for the success of a CDSS, especially for complex clinical tasks such as VT diagnosis. OBJECTIVE The aims of this study were to analyze the requirements for a VT diagnosis CDSS within the frameworks of knowledge and practice and to determine the clinical decision support (CDS) needs. METHODS Our multidisciplinary team first conducted semistructured interviews with seven cardiologists about the clinical challenges of VT and their expected decision support. A questionnaire was designed by the multidisciplinary team based on the results of the interviews. The questionnaire was divided into four sections: demographic information, knowledge assessment, practice assessment, and CDS needs. The practice section consisted of two simulated cases for a total score of 10 marks. Online questionnaires were disseminated to registered cardiologists across China from December 2022 to February 2023. The scores for the practice section were summarized as continuous variables, using the mean, median, and range. The knowledge and CDS needs sections were assessed using a 4-point Likert scale without a neutral option. Kruskal-Wallis tests were performed to investigate the relationship between scores and practice years or specialty. RESULTS Of the 687 cardiologists who completed the questionnaire, 567 provided responses eligible for further analysis. The results of the knowledge assessment showed that 383 cardiologists (68%) lacked knowledge in diagnostic evaluation.
The overall average score of the practice assessment was 6.11 (SD 0.55); the etiological diagnosis section had the highest overall scores (mean 6.74, SD 1.75), whereas the diagnostic evaluation section had the lowest scores (mean 5.78, SD 1.19). A majority of cardiologists (344/567, 60.7%) reported the need for a CDSS. There was a significant difference in practice competency scores between general cardiologists and arrhythmia specialists (P=.02). CONCLUSIONS There was a notable deficiency in the knowledge and practice of VT among Chinese cardiologists. Specific knowledge and practice support requirements were identified, which provide a foundation for further development and optimization of a CDSS. Moreover, it is important to consider clinicians' specialization levels and years of practice for effective and personalized support.
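The Kruskal-Wallis test mentioned in the methods compares score distributions across groups (here, bands of practice years or specialty) using ranks rather than raw scores. A minimal Python sketch of the H statistic with the standard tie correction follows; the function names and the example scores are hypothetical, and converting H to a P value (chi-square with k - 1 degrees of freedom for k groups) is left to a statistics library.

```python
from typing import Dict, List, Sequence

def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def kruskal_wallis_h(groups: Sequence[Sequence[float]]) -> float:
    """Kruskal-Wallis H statistic, corrected for ties (not all values may be tied)."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * h - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n) over tied groups of size t.
    counts: Dict[float, int] = {}
    for x in pooled:
        counts[x] = counts.get(x, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    return h / (1 - ties / (n ** 3 - n))

# Hypothetical practice scores (out of 10) for three bands of practice years.
scores_by_band = [[5.5, 6.0, 6.5, 6.0], [6.0, 6.5, 7.0], [6.5, 7.5, 8.0]]
print(f"H = {kruskal_wallis_h(scores_by_band):.3f}")
```

A rank-based test suits bounded questionnaire scores like these, since it does not assume normally distributed scores within each group.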
Affiliation(s)
- Zhao Hu
- Arrhythmia Center, Fuwai Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College/National Center for Cardiovascular Diseases, Beijing, China
- Min Wang
- Institute of Medical Information, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Si Zheng
- Institute of Medical Information, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Xiaowei Xu
- Institute of Medical Information, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Zhuxin Zhang
- Arrhythmia Center, Fuwai Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College/National Center for Cardiovascular Diseases, Beijing, China
- Qiaoyue Ge
- West China School of Public Health, West China Fourth Hospital, Sichuan University, Chengdu, China
- Jiao Li
- Institute of Medical Information, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Yan Yao
- Arrhythmia Center, Fuwai Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College/National Center for Cardiovascular Diseases, Beijing, China
8
Sibbald M, Zwaan L, Yilmaz Y, Lal S. Incorporating artificial intelligence in medical diagnosis: A case for an invisible and (un)disruptive approach. J Eval Clin Pract 2024; 30:3-8. [PMID: 35761764 DOI: 10.1111/jep.13730] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/10/2022] [Accepted: 06/13/2022] [Indexed: 12/30/2022]
Abstract
As big data becomes more publicly accessible, artificial intelligence (AI) is increasingly available and applicable to problems around clinical decision-making. Yet the adoption of AI technology in healthcare lags well behind that in other industries. The gap between what technology could do and what it is actually being used for is rapidly widening. While many solutions have been proposed to address this gap, clinician resistance to the adoption of AI remains high. To aid change, we propose facilitating clinician decisions through technology by seamlessly weaving what we call 'invisible AI' into existing clinician workflows, rather than sequencing new steps into clinical processes. We draw on evidence from the change management and human factors literature to conceptualize a new approach to AI implementation in health organizations. We discuss challenges and provide recommendations for organizations to employ this strategy.
Affiliation(s)
- Matt Sibbald
- Department of Medicine, McMaster Education Research Innovation and Theory (MERIT) Program, Faculty of Health Sciences, McMaster University, Hamilton, ON, Canada
- Laura Zwaan
- Erasmus Medical Center, Institute of Medical Education Research Rotterdam (iMERR), Rotterdam, The Netherlands
- Yusuf Yilmaz
- McMaster Education Research Innovation and Theory (MERIT) Program, Faculty of Health Sciences, McMaster University, Hamilton, ON, Canada
- Continuing Professional Development Office, Faculty of Health Sciences, McMaster University, Hamilton, ON, Canada
- Department of Medical Education, Faculty of Medicine, Ege University, Izmir, Turkey
- Sarrah Lal
- Department of Medicine, Division of Innovation and Education, McMaster University, Hamilton, ON, Canada
9
Pacheco K, Ji J, Barbosa K, Lemay K, Fortier JH, Garber GE. Medico-legal risk of infectious disease physicians in Canada: A retrospective review. J Assoc Med Microbiol Infect Dis Can 2024; 8:319-327. [PMID: 38250623 PMCID: PMC10797760 DOI: 10.3138/jammi-2023-0022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Revised: 09/15/2023] [Accepted: 09/18/2023] [Indexed: 01/23/2024]
Abstract
Objective Little is known about the medico-legal risk for infectious disease specialists in Canada. The objective of this study was to identify the causes of these medico-legal risks with the goal of improving patient safety and outcomes. Methods A 10-year retrospective analysis of Canadian Medical Protective Association (CMPA) closed medico-legal cases from 2012 to 2021 was performed. Peer expert criticism was used to identify factors that contributed to the medico-legal cases at the provider, team, or system level, and these factors were contrasted with patients' complaints. Results During the study period there were 571 infectious disease physician members of the CMPA. There were 96 patient medico-legal cases: 45 College complaints, 40 civil legal matters, and 11 hospital complaints. Ten cases were associated with severe patient harm or death. Patients were most likely to complain about perceived deficient assessments (54%), diagnostic errors (53%), inadequate monitoring or follow-up (20%), and unprofessional manner (20%). In contrast, peer experts were most critical of diagnostic assessment (20%), deficient assessment (10%), failure to perform a test or intervention (8%), and failure to refer (6%). Conclusion While infectious disease physicians tend to have lower medico-legal risk than other health care providers, such risks still exist. This descriptive study provides insights into the types of cases, presenting conditions, and patient allegations associated with their practice.
Affiliation(s)
- Karen Pacheco
- Department of Safe Medical Care, Canadian Medical Protective Association, Ottawa, Ontario, Canada
- Jun Ji
- Department of Safe Medical Care, Canadian Medical Protective Association, Ottawa, Ontario, Canada
- Kate Barbosa
- Department of Safe Medical Care, Canadian Medical Protective Association, Ottawa, Ontario, Canada
- Karen Lemay
- Department of Safe Medical Care, Canadian Medical Protective Association, Ottawa, Ontario, Canada
- Jacqueline H Fortier
- Department of Safe Medical Care, Canadian Medical Protective Association, Ottawa, Ontario, Canada
- Gary E Garber
- Department of Safe Medical Care, Canadian Medical Protective Association, Ottawa, Ontario, Canada
- Faculty of Medicine, Department of Medicine and the School of Public Health and Epidemiology, University of Ottawa, Ottawa, Ontario, Canada
- Ottawa Hospital Research Institute, Clinical Epidemiology Program, Ottawa, Ontario, Canada
10
Ito N, Kadomatsu S, Fujisawa M, Fukaguchi K, Ishizawa R, Kanda N, Kasugai D, Nakajima M, Goto T, Tsugawa Y. The Accuracy and Potential Racial and Ethnic Biases of GPT-4 in the Diagnosis and Triage of Health Conditions: Evaluation Study. JMIR Med Educ 2023; 9:e47532. [PMID: 37917120 PMCID: PMC10654908 DOI: 10.2196/47532] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/23/2023] [Revised: 07/07/2023] [Accepted: 09/05/2023] [Indexed: 11/03/2023]
Abstract
BACKGROUND Whether GPT-4, a conversational artificial intelligence model, can accurately diagnose and triage health conditions and whether it presents racial and ethnic biases in its decisions remain unclear. OBJECTIVE We aimed to assess the accuracy of GPT-4 in the diagnosis and triage of health conditions and whether its performance varies by patient race and ethnicity. METHODS We compared the performance of GPT-4 and physicians, using 45 typical clinical vignettes, each with a correct diagnosis and triage level, in February and March 2023. For each of the 45 clinical vignettes, GPT-4 and 3 board-certified physicians provided the most likely primary diagnosis and triage level (emergency, nonemergency, or self-care). Independent reviewers evaluated the diagnoses as "correct" or "incorrect." Physician diagnosis was defined as the consensus of the 3 physicians. We evaluated whether the performance of GPT-4 varies by patient race and ethnicity by adding patient race and ethnicity information to the clinical vignettes. RESULTS The accuracy of diagnosis was comparable between GPT-4 and physicians (the percentage of correct diagnoses was 97.8% (44/45; 95% CI 88.2%-99.9%) for GPT-4 and 91.1% (41/45; 95% CI 78.8%-97.5%) for physicians; P=.38). GPT-4 provided appropriate reasoning for 97.8% (44/45) of the vignettes. The appropriateness of triage was comparable between GPT-4 and physicians (GPT-4: 30/45, 66.7%; 95% CI 51.0%-80.0%; physicians: 30/45, 66.7%; 95% CI 51.0%-80.0%; P=.99). The performance of GPT-4 in diagnosing health conditions did not vary among different races and ethnicities (Black, White, Asian, and Hispanic), with an accuracy of 100% (95% CI 78.2%-100%). P values, compared to the GPT-4 output without race and ethnicity information, were all .99. The accuracy of triage was not significantly different when patients' race and ethnicity information was added.
The accuracy of triage was 62.2% (95% CI 46.5%-76.2%; P=.50) for Black patients; 66.7% (95% CI 51.0%-80.0%; P=.99) for White patients; 66.7% (95% CI 51.0%-80.0%; P=.99) for Asian patients, and 62.2% (95% CI 46.5%-76.2%; P=.69) for Hispanic patients. P values were calculated by comparing the outputs with and without conditioning on race and ethnicity. CONCLUSIONS GPT-4's ability to diagnose and triage typical clinical vignettes was comparable to that of board-certified physicians. The performance of GPT-4 did not vary by patient race and ethnicity. These findings should be informative for health systems looking to introduce conversational artificial intelligence to improve the efficiency of patient diagnosis and triage.
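The abstract does not name its interval method, but the 95% CI reported for 44/45 appears to match an exact binomial (Clopper-Pearson) interval. A stdlib-only Python sketch, assuming that method, inverts the two binomial tail probabilities by bisection; the helper names are hypothetical.

```python
from math import comb

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

def _bisect_increasing(g, iterations: int = 100) -> float:
    """Root of a function g that increases on [0, 1] with g(0) < 0 < g(1)."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> tuple:
    """Exact two-sided confidence interval for x successes in n trials."""
    half = alpha / 2
    # Lower bound: p at which P(X >= x) = alpha/2; this tail rises as p rises.
    lower = 0.0 if x == 0 else _bisect_increasing(lambda p: (1 - binom_cdf(x - 1, n, p)) - half)
    # Upper bound: p at which P(X <= x) = alpha/2; this tail falls as p rises.
    upper = 1.0 if x == n else _bisect_increasing(lambda p: half - binom_cdf(x, n, p))
    return lower, upper

lo, hi = clopper_pearson(44, 45)
print(f"44/45 = {44 / 45:.1%}, 95% CI {lo:.1%}-{hi:.1%}")
```

Exact intervals are a common choice for small samples like 45 vignettes, where normal-approximation intervals can extend past 100% for proportions near 1.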
Affiliation(s)
- Naoki Ito
- TXP Medical Co Ltd, Tokyo, Japan
- Faculty of Medicine, The University of Tokyo, Tokyo, Japan
- Sakina Kadomatsu
- TXP Medical Co Ltd, Tokyo, Japan
- Faculty of Medicine, International University of Health and Welfare, Chiba, Japan
- Mineto Fujisawa
- TXP Medical Co Ltd, Tokyo, Japan
- Faculty of Medicine, The University of Tokyo, Tokyo, Japan
- Kiyomitsu Fukaguchi
- TXP Medical Co Ltd, Tokyo, Japan
- Department of Emergency Medicine, Shonan Kamakura General Hospital, Kanagawa, Japan
- Ryo Ishizawa
- TXP Medical Co Ltd, Tokyo, Japan
- Department of Emergency and Critical Care Medicine, Tokyo Medical Center National Hospital Organization, Tokyo, Japan
- Naoki Kanda
- TXP Medical Co Ltd, Tokyo, Japan
- Division of General Internal Medicine, Jichi Medical University Hospital, Tochigi, Japan
- Daisuke Kasugai
- TXP Medical Co Ltd, Tokyo, Japan
- Department of Emergency and Critical Care Medicine, Nagoya University Graduate School of Medicine, Aichi, Japan
- Mikio Nakajima
- TXP Medical Co Ltd, Tokyo, Japan
- Emergency Life-Saving Technique Academy of Tokyo Foundation for Ambulance Service Development, Tokyo, Japan
- Yusuke Tsugawa
- Division of General Internal Medicine and Health Services Research, David Geffen School of Medicine, The University of California, Los Angeles, Los Angeles, CA, United States
- Department of Health Policy and Management, UCLA Fielding School of Public Health, Los Angeles, CA, United States
11
Ing EB, Balas M, Nassrallah G, DeAngelis D, Nijhawan N. The Isabel Differential Diagnosis Generator for Orbital Diagnosis. Ophthalmic Plast Reconstr Surg 2023; 39:461-464. [PMID: 36928323 DOI: 10.1097/iop.0000000000002364] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/18/2023]
Abstract
PURPOSE The Isabel differential diagnosis generator is one of the most widely known electronic diagnostic decision support tools. The authors prospectively evaluated the utility of Isabel for the differential diagnosis of orbital disease. METHODS The terms "proptosis," "lid retraction," "orbit inflammation," "orbit tumour," "orbit tumor, infiltrative," and "orbital tumor, well-circumscribed" were separately entered into Isabel, and the results were tabulated. Then the clinical details (patient age, gender, signs, symptoms, and imaging findings) of 25 orbital cases from a textbook of orbital surgery were entered into Isabel. The top 10 differential diagnoses generated by Isabel were compared with the correct diagnosis. RESULTS Isabel identified hyperthyroidism and Graves ophthalmopathy as the leading causes of lid retraction, but it failed to identify many common causes of proptosis and orbital tumors. Of the textbook cases, Isabel listed the correct diagnosis among its top 10 differential diagnoses for 4/25 (16%), and the median rank of the correct diagnosis was 6/10. Thirty-two percent of the output diagnoses were unlikely to cause orbital disease. CONCLUSION Isabel is currently of limited value for mainstream orbital differential diagnosis. Incorporating anatomic localization and imaging findings may help increase the accuracy of orbital diagnosis.
Affiliation(s)
- Edsel B Ing
- Department of Ophthalmology and Vision Science, University of Toronto Temerty Faculty of Medicine, Toronto, Canada
- Department of Ophthalmology and Vision Science, University of Alberta, Edmonton, Canada
- Michael Balas
- Department of Ophthalmology and Vision Science, University of Alberta, Edmonton, Canada
- Georges Nassrallah
- Department of Ophthalmology and Vision Science, University of Toronto Temerty Faculty of Medicine, Toronto, Canada
- Dan DeAngelis
- Department of Ophthalmology and Vision Science, University of Toronto Temerty Faculty of Medicine, Toronto, Canada
- Navdeep Nijhawan
- Department of Ophthalmology and Vision Science, University of Toronto Temerty Faculty of Medicine, Toronto, Canada
12
Harada Y, Tomiyama S, Sakamoto T, Sugimoto S, Kawamura R, Yokose M, Hayashi A, Shimizu T. Effects of Combinational Use of Additional Differential Diagnostic Generators on the Diagnostic Accuracy of the Differential Diagnosis List Developed by an Artificial Intelligence-Driven Automated History-Taking System: Pilot Cross-Sectional Study. JMIR Form Res 2023; 7:e49034. [PMID: 37531164 PMCID: PMC10433017 DOI: 10.2196/49034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Revised: 06/23/2023] [Accepted: 07/19/2023] [Indexed: 08/03/2023]
Abstract
BACKGROUND Low diagnostic accuracy is a major concern in automated medical history-taking systems with differential diagnosis (DDx) generators. A possible solution is to extend to DDx generators the concept of collective intelligence, whereby an integrated diagnosis list from multiple people is more accurate than a list from a single person. OBJECTIVE The purpose of this study is to assess whether the combined use of several DDx generators improves the diagnostic accuracy of DDx lists. METHODS We used medical history data and the top 10 DDx lists (index DDx lists) generated by an artificial intelligence (AI)-driven automated medical history-taking system from 103 patients with confirmed diagnoses. Two research physicians independently created additional top 10 DDx lists (second and third DDx lists) for each case by entering key information from the medical history generated by the automated system into 2 other DDx generators, without reading the index lists. We used the McNemar test to assess the improvement in diagnostic accuracy from the index DDx lists to three types of combined DDx lists: (1) simply combining the index, second, and third lists; (2) creating a new top 10 DDx list using a 1/n weighting rule; and (3) creating new lists containing only the diagnoses shared among the index, second, and third lists. We treated the data generated by the 2 research physicians from the same patient as independent cases. Therefore, the number of cases included in the analyses using the 2 additional lists was 206 (103 cases × 2 physicians' input). RESULTS The diagnostic accuracy of the index lists was 46% (47/103). 
Simply combining the index list with the other 2 DDx lists improved diagnostic accuracy (133/206, 65%, P<.001), whereas the other 2 combination methods did not (106/206, 52%, P=.05 for the collective list with the 1/n weighting rule; 29/206, 14%, P<.001 for the list of diagnoses shared among the 3 DDx lists). CONCLUSIONS Simply adding the top 10 DDx lists from additional DDx generators increased the diagnostic accuracy of the DDx list by approximately 20 percentage points, suggesting that the combined use of DDx generators early in the diagnostic process is beneficial.
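The three combination strategies named in the Methods can be made concrete with a small sketch. It is an illustration under stated assumptions, not the authors' procedure: the abstract does not define the 1/n weighting rule, so here each diagnosis is assumed to earn 1/rank points from every list that contains it, with the summed points re-ranked into a new top-k list; "shared" diagnoses are assumed to mean those present in all three lists. The function names and the diagnosis strings are hypothetical.

```python
from collections import defaultdict


def combine_union(lists):
    """Method (1): merge all lists, dropping duplicates and keeping first-seen order."""
    merged, seen = [], set()
    for ddx in lists:
        for dx in ddx:
            if dx not in seen:
                seen.add(dx)
                merged.append(dx)
    return merged


def combine_weighted(lists, k=10):
    """Method (2), ASSUMED form of the 1/n rule: a diagnosis at rank n in a list
    earns 1/n points; points are summed across lists and the top k are kept."""
    scores = defaultdict(float)
    for ddx in lists:
        for rank, dx in enumerate(ddx, start=1):
            scores[dx] += 1.0 / rank
    # Highest score first; ties broken alphabetically so the output is deterministic.
    return sorted(scores, key=lambda dx: (-scores[dx], dx))[:k]


def combine_shared(lists):
    """Method (3), ASSUMED form: keep only diagnoses present in every list,
    in the order they appear in the first (index) list."""
    common = set(lists[0]).intersection(*lists[1:])
    return [dx for dx in lists[0] if dx in common]


# Hypothetical DDx lists for one case (index list first), shortened for brevity.
index_list = ["influenza", "pneumonia", "sinusitis"]
second_list = ["pneumonia", "bronchitis", "influenza"]
third_list = ["pneumonia", "covid-19"]
lists = [index_list, second_list, third_list]

print(combine_union(lists))        # -> ['influenza', 'pneumonia', 'sinusitis', 'bronchitis', 'covid-19']
print(combine_weighted(lists, 3))  # -> ['pneumonia', 'influenza', 'bronchitis']
print(combine_shared(lists))       # -> ['pneumonia']
```

Scoring a case as correct when the final diagnosis appears in the combined list shows why the methods behave differently: the union always contains the index list, so it can only add hits; the intersection is always a subset of the index list, so it can only lose them; and the re-ranked top-k list can move in either direction.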
Affiliation(s)
- Yukinori Harada
- Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, Mibu, Shimotsugagun, Japan
- Department of Internal Medicine, Nagano Chuo Hospital, Nagano, Japan
- Shusaku Tomiyama
- Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, Mibu, Shimotsugagun, Japan
- Tetsu Sakamoto
- Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, Mibu, Shimotsugagun, Japan
- Shu Sugimoto
- Department of Internal Medicine, Nagano Chuo Hospital, Nagano, Japan
- Ren Kawamura
- Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, Mibu, Shimotsugagun, Japan
- Masashi Yokose
- Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, Mibu, Shimotsugagun, Japan
- Arisa Hayashi
- Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, Mibu, Shimotsugagun, Japan
- Taro Shimizu
- Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, Mibu, Shimotsugagun, Japan
13
Yanagita Y, Shikino K, Ishizuka K, Uchida S, Li Y, Yokokawa D, Tsukamoto T, Noda K, Uehara T, Ikusaka M. Improving decision accuracy using a clinical decision support system for medical students during history-taking: a randomized clinical trial. BMC Med Educ 2023; 23:383. [PMID: 37231512 PMCID: PMC10214648 DOI: 10.1186/s12909-023-04370-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2022] [Accepted: 05/17/2023] [Indexed: 05/27/2023]
Abstract
BACKGROUND A clinical decision support system (CDSS) can support medical students and physicians in providing evidence-based care. In this study, we compare diagnostic accuracy, based on the history of present illness, among groups of medical students using a CDSS, Google, or neither (control). Further, the diagnostic accuracy of medical students using a CDSS is compared with that of residents using neither a CDSS nor Google. METHODS This study is a randomized educational trial. The participants comprised 64 medical students and 13 residents who rotated in the Department of General Medicine at Chiba University Hospital from May to December 2020. The medical students were randomly divided into the CDSS group (n = 22), Google group (n = 22), and control group (n = 20). Participants were asked to provide the three most likely diagnoses for 20 cases, each presented mainly as a history of present illness (10 common and 10 emergent diseases). Each correct diagnosis was awarded 1 point (maximum 20 points). The mean scores of the three medical student groups were compared using one-way analysis of variance. Furthermore, the mean scores of the CDSS, Google, and residents' (without CDSS or Google) groups were compared. RESULTS The mean scores of the CDSS (12.0 ± 1.3) and Google (11.9 ± 1.1) groups were significantly higher than that of the control group (9.5 ± 1.7; p = 0.02 and p = 0.03, respectively). The residents' mean score (14.7 ± 1.4) was higher than those of the CDSS and Google groups (p = 0.01). For common disease cases, the mean scores were 7.4 ± 0.7, 7.1 ± 0.7, and 8.2 ± 0.7 for the CDSS, Google, and residents' groups, respectively, with no significant differences (p = 0.1). CONCLUSIONS Medical students who used the CDSS or Google listed differential diagnoses more accurately than those who used neither. 
Furthermore, in common disease cases, they made differential diagnoses at a level comparable to that of residents. TRIAL REGISTRATION This study was retrospectively registered with the University Hospital Medical Information Network Clinical Trials Registry on 24/12/2020 (unique trial number: UMIN000042831).
Affiliation(s)
- Yasutaka Yanagita
- Department of General Medicine, Chiba University Hospital, 1-8-1, Inohana, Chuo-Ku, Chiba-City, Chiba Pref, Japan
- Kiyoshi Shikino
- Department of General Medicine, Chiba University Hospital, 1-8-1, Inohana, Chuo-Ku, Chiba-City, Chiba Pref, Japan
- Kosuke Ishizuka
- Department of General Medicine, Chiba University Hospital, 1-8-1, Inohana, Chuo-Ku, Chiba-City, Chiba Pref, Japan
- Shun Uchida
- Department of General Medicine, Chiba University Hospital, 1-8-1, Inohana, Chuo-Ku, Chiba-City, Chiba Pref, Japan
- Yu Li
- Department of General Medicine, Chiba University Hospital, 1-8-1, Inohana, Chuo-Ku, Chiba-City, Chiba Pref, Japan
- Daiki Yokokawa
- Department of General Medicine, Chiba University Hospital, 1-8-1, Inohana, Chuo-Ku, Chiba-City, Chiba Pref, Japan
- Tomoko Tsukamoto
- Department of General Medicine, Chiba University Hospital, 1-8-1, Inohana, Chuo-Ku, Chiba-City, Chiba Pref, Japan
- Kazutaka Noda
- Department of General Medicine, Chiba University Hospital, 1-8-1, Inohana, Chuo-Ku, Chiba-City, Chiba Pref, Japan
- Takanori Uehara
- Department of General Medicine, Chiba University Hospital, 1-8-1, Inohana, Chuo-Ku, Chiba-City, Chiba Pref, Japan
- Masatomi Ikusaka
- Department of General Medicine, Chiba University Hospital, 1-8-1, Inohana, Chuo-Ku, Chiba-City, Chiba Pref, Japan
14
Diagnostic Delays in Sepsis: Lessons Learned From a Retrospective Study of Canadian Medico-Legal Claims. Crit Care Explor 2023; 5:e0841. [PMID: 36751515 PMCID: PMC9894347 DOI: 10.1097/cce.0000000000000841] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 02/05/2023]
Abstract
Although rapid treatment improves outcomes for patients presenting with sepsis, early detection can be difficult, especially in otherwise healthy adults. OBJECTIVES Using medico-legal data, we aimed to identify areas of focus to assist with early recognition of sepsis. DESIGN, SETTING, AND PARTICIPANTS Retrospective descriptive design. We analyzed closed medico-legal cases involving physicians from a national database repository at the Canadian Medical Protective Association. The study included cases closed between 2011 and 2020 that had documented peer expert criticism of a diagnostic issue related to sepsis or relevant infections. MAIN OUTCOMES AND MEASURES We used univariate statistics to describe patients and physicians and applied published frameworks to classify contributing factors (provider, team, system) and diagnostic pitfalls based on peer expert criticisms. RESULTS Of the 162 patients involved, the median age was 53 years (interquartile range [IQR], 34-66 yr) and mortality was 49%. Of 218 implicated physicians, 169 (78%) were from family medicine, emergency medicine, or surgical specialties. Eighty patients (49%) made multiple outpatient visits before sepsis was recognized or they were hospitalized (median, two visits; IQR, 2-4). Almost 40% of patients were admitted to the ICU. Deficient assessments, such as failing to consider sepsis or not reassessing the patient before discharge, contributed to most cases (81%). CONCLUSIONS AND RELEVANCE Sepsis continues to be a challenging diagnosis for clinicians. Multiple outpatient visits may be an early warning sign that calls for vigilance in patient assessment.
15
Schmidt HG, Mamede S. Improving diagnostic decision support through deliberate reflection: a proposal. Diagnosis (Berl) 2023; 10:38-42. [PMID: 36000188 DOI: 10.1515/dx-2022-0062] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/10/2022] [Accepted: 07/25/2022] [Indexed: 11/15/2022]
Abstract
Digital decision support (DDS) is expected to play an important role in improving physicians' diagnostic performance and reducing the burden of diagnostic error. Studies of currently available DDS systems indicate that they lead to modest gains in diagnostic accuracy, and these systems are expected to become more effective and user-friendly in the future. In this position paper, we propose that one way toward this future is to rethink DDS systems based on deliberate reflection, a strategy by which physicians systematically review the clinical findings observed in a patient in light of an initial diagnosis. Deliberate reflection has been shown to improve diagnostic accuracy in several contexts. In this paper, we first describe the deliberate reflection strategy, including the crucial element that would make it useful in interaction with a DDS system. We then examine the nature of conventional DDS systems and their shortcomings. Finally, we propose what DDS based on deliberate reflection might look like and consider why it would overcome the downsides of conventional DDS.
Affiliation(s)
- Henk G Schmidt
- Department of Psychology, Education and Child Studies, Erasmus University Rotterdam, Rotterdam, The Netherlands
- Institute of Medical Education Research Rotterdam, Erasmus Medical Center, Rotterdam, The Netherlands
- Sílvia Mamede
- Department of Psychology, Education and Child Studies, Erasmus University Rotterdam, Rotterdam, The Netherlands
- Institute of Medical Education Research Rotterdam, Erasmus Medical Center, Rotterdam, The Netherlands
16
Kourtidis P, Nurek M, Delaney B, Kostopoulou O. Influences of early diagnostic suggestions on clinical reasoning. Cogn Res Princ Implic 2022; 7:103. [PMID: 36520258 PMCID: PMC9755454 DOI: 10.1186/s41235-022-00453-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2021] [Accepted: 12/02/2022] [Indexed: 12/23/2022]
Abstract
Previous research has highlighted the importance of physicians' early hypotheses for their subsequent diagnostic decisions. It has also been shown that diagnostic accuracy improves when physicians are presented with a list of diagnostic suggestions to consider at the start of the clinical encounter. The psychological mechanisms underlying this improvement in accuracy have so far only been hypothesised. It is possible that the provision of diagnostic suggestions disrupts physicians' intuitive thinking and reduces their certainty in their initial diagnostic hypotheses. This may encourage them to seek more information before reaching a diagnostic conclusion, evaluate this information more objectively, and be more open to changing their initial hypotheses. Three online experiments explored the effects of early diagnostic suggestions, provided by a hypothetical decision aid, on different aspects of the diagnostic reasoning process. Family physicians assessed up to two patient scenarios with and without suggestions. We measured effects on certainty about the initial diagnosis, information search and evaluation, and frequency of diagnostic changes. We did not find a clear and consistent effect of suggestions and detected mainly non-significant trends, some in the expected direction. We also detected a potential biasing effect: when the most likely diagnosis was included in the list of suggestions (vs. not included), physicians who gave that diagnosis initially tended to request less information, evaluate it as more supportive of their diagnosis, become more certain about it, and change it less frequently when encountering new but ambiguous information; in other words, they seemed to validate rather than question their initial hypothesis. We conclude that further research using different methodologies and more realistic experimental situations is required to uncover both the beneficial and biasing effects of early diagnostic suggestions.
17
Scott IA. Using information technology to reduce diagnostic error: still a bridge too far? Intern Med J 2022; 52:908-911. [PMID: 35718736 DOI: 10.1111/imj.15804] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2022] [Accepted: 04/28/2022] [Indexed: 11/28/2022]
Affiliation(s)
- Ian A Scott
- Internal Medicine and Clinical Epidemiology, Princess Alexandra Hospital, Brisbane, Queensland, Australia
- School of Clinical Medicine, University of Queensland, Brisbane, Queensland, Australia
18
Sibbald M, Abdulla B, Keuhl A, Norman G, Monteiro S, Sherbino J. Electronic diagnostic support in emergency physician triage: a qualitative study (Preprint). JMIR Hum Factors 2022; 9:e39234. [PMID: 36178728 PMCID: PMC9568817 DOI: 10.2196/39234] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/04/2022] [Revised: 08/05/2022] [Accepted: 08/29/2022] [Indexed: 12/05/2022]
Abstract
Background Not thinking of a diagnosis is a leading cause of diagnostic error in the emergency department, resulting in delayed treatment, morbidity, and excess mortality. Electronic differential diagnostic support (EDS) results in small but significant reductions in diagnostic error. However, clinician uptake of EDS is limited. Objective We sought to understand physician perceptions of, and barriers to, the uptake of EDS within the emergency department triage process. Methods We conducted a qualitative study in which a research associate rapidly prototyped an EDS embedded in the emergency department triage process. An embedded researcher provided physicians involved in triage assessment in a busy emergency department with EDS output based on the triage complaint, simulating an automated system that would draw from the electronic medical record. Physicians were interviewed immediately after their experience. Verbatim transcripts were analyzed by a team using open and axial coding, informed by directed content analysis. Results In all, 4 themes emerged from 14 interviews: (1) the quality of the EDS was inferred from the scope and prioritization of the diagnoses in the EDS differential; (2) trust in the EDS was linked to varied beliefs about the diagnostic process and the potential for bias; (3) clinicians foresaw more benefit from EDS use for colleagues and trainees than for themselves; and (4) clinicians felt strongly that EDS output should not be included in the patient record. Conclusions Adopting an EDS in an emergency department triage process will require a system that provides diagnostic suggestions appropriate to the scope and context of emergency department triage, is transparent in its design, accommodates clinician beliefs about the diagnostic process, and addresses clinician concerns about including EDS output in the patient record.
Affiliation(s)
- Matthew Sibbald
- McMaster Education Research, Innovation & Theory (MERIT) Program, Department of Medicine, McMaster University, Hamilton, ON, Canada
- Bashayer Abdulla
- McMaster Education Research, Innovation & Theory (MERIT) Program, Department of Medicine, McMaster University, Hamilton, ON, Canada
- Amy Keuhl
- McMaster Education Research, Innovation & Theory (MERIT) Program, Department of Medicine, McMaster University, Hamilton, ON, Canada
- Geoffrey Norman
- McMaster Education Research, Innovation & Theory (MERIT) Program, Department of Health Research Methods, Evidence & Impact, McMaster University, Hamilton, ON, Canada
- Sandra Monteiro
- McMaster Education Research, Innovation & Theory (MERIT) Program, Department of Medicine, McMaster University, Hamilton, ON, Canada
- Jonathan Sherbino
- McMaster Education Research, Innovation & Theory (MERIT) Program, Department of Medicine, McMaster University, Hamilton, ON, Canada
19
Martínez-García M, Hernández-Lemus E. Data Integration Challenges for Machine Learning in Precision Medicine. Front Med (Lausanne) 2022; 8:784455. [PMID: 35145977 PMCID: PMC8821900 DOI: 10.3389/fmed.2021.784455] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 09/27/2021] [Accepted: 12/28/2021] [Indexed: 12/19/2022]
Abstract
A main goal of precision medicine is to incorporate and integrate the vast corpora held in different databases on the molecular and environmental origins of disease into analytic frameworks, allowing the development of individualized, context-dependent diagnostic and therapeutic approaches. In this regard, artificial intelligence and machine learning approaches can be used to build analytical models of complex disease aimed at predicting personalized health conditions and outcomes. Such models must handle the wide heterogeneity of individuals in both their genetic predisposition and their social and environmental determinants. Computational approaches to medicine need to efficiently manage, visualize, and integrate large datasets that combine structured and unstructured formats. This needs to be done under different levels of confidentiality constraints, ideally within a unified analytical architecture. Efficient data integration and management is key to the successful application of computational intelligence approaches to medicine. A number of challenges arise in designing successful approaches to medical data analytics under the currently demanding performance conditions of personalized medicine, which are also subject to time, computational power, and bioethical constraints. Here, we review some of these constraints and discuss possible avenues to overcome current challenges.
Affiliation(s)
- Mireya Martínez-García
- Clinical Research Division, National Institute of Cardiology ‘Ignacio Chávez’, Mexico City, Mexico
- Enrique Hernández-Lemus
- Computational Genomics Division, National Institute of Genomic Medicine (INMEGEN), Mexico City, Mexico
- Center for Complexity Sciences, Universidad Nacional Autónoma de México, Mexico City, Mexico
20
Abstract
Research in cognitive psychology shows that expert clinicians make a medical diagnosis through a two-step process of hypothesis generation and hypothesis testing. Experts generate a list of possible diagnoses quickly and intuitively, drawing on previous experience. Experts remember specific examples of various disease categories as exemplars, which enables rapid access to diagnostic possibilities and gives them an intuitive sense of the base rates of various diagnoses. After generating diagnostic hypotheses, clinicians test them and subjectively estimate the probability of each diagnostic possibility using a heuristic called anchoring and adjusting. Although both novices and experts use this two-step process, experts distinguish themselves as better diagnosticians through their ability to mobilize experiential knowledge in a content-specific manner. Experience is clearly the best teacher, but some educational strategies have been shown to modestly improve diagnostic accuracy. Greater knowledge of the cognitive psychology of the diagnostic process and its inherent pitfalls may inform clinical teachers and help learners and clinicians improve the accuracy of their diagnostic reasoning. This article reviews the literature on the cognitive psychology of diagnostic reasoning in the context of cardiovascular disease.
Affiliation(s)
- John E Brush
- Sentara Health Research Center, Norfolk, VA, USA
- Eastern Virginia Medical School, Norfolk, VA, USA
- Jonathan Sherbino
- McMaster Education Research, Innovation and Theory (MERIT) Program, McMaster University, Hamilton, ON, Canada
- Department of Medicine, McMaster University, Hamilton, ON, Canada
- Geoffrey R Norman
- McMaster Education Research, Innovation and Theory (MERIT) Program, McMaster University, Hamilton, ON, Canada
21
Ranji SR, Thomas EJ. Research to improve diagnosis: time to study the real world. BMJ Qual Saf 2022; 31:255-258. [PMID: 34987085 DOI: 10.1136/bmjqs-2021-014071] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 12/13/2021] [Indexed: 11/04/2022]
Affiliation(s)
- Sumant R Ranji
- Medicine, University of California, San Francisco, California, USA
- Eric J Thomas
- Internal Medicine, University of Texas John P and Katherine G McGovern Medical School, Houston, Texas, USA
22
Kawamura R, Harada Y, Sugimoto S, Nagase Y, Katsukura S, Shimizu T. Incidence of diagnostic errors in unplanned hospitalized patients using an automated medical history-taking system with differential diagnosis generator: retrospective observational study (Preprint). JMIR Med Inform 2021; 10:e35225. [PMID: 35084347 PMCID: PMC8832260 DOI: 10.2196/35225] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/26/2021] [Revised: 12/11/2021] [Accepted: 01/02/2022] [Indexed: 11/23/2022]
Abstract
Background Automated medical history–taking systems that generate differential diagnosis lists have been suggested to contribute to improved diagnostic accuracy. However, the effect of these systems on diagnostic errors in clinical practice remains unknown. Objective This study aimed to assess the incidence of diagnostic errors in an outpatient department where an artificial intelligence (AI)–driven automated medical history–taking system that generates differential diagnosis lists was implemented in clinical practice. Methods We conducted a retrospective observational study using data from a community hospital in Japan. We included patients aged 20 years and older who used an AI-driven automated medical history–taking system that generates differential diagnosis lists in the outpatient department of internal medicine, whose index visit occurred between July 1, 2019, and June 30, 2020, and who had an unplanned hospitalization within 14 days. The primary endpoint was the incidence of diagnostic errors, which were detected using the Revised Safer Dx Instrument by at least two independent reviewers. To evaluate the effect of differential diagnosis lists from the AI system on the incidence of diagnostic errors, we compared the incidence of these errors between a group in which the final diagnosis appeared in the AI system's differential diagnosis list and a group in which it did not, using the Fisher exact test. For cases with confirmed diagnostic errors, three reviewers further discussed the contributing factors, using the Safer Dx Process Breakdown Supplement as a reference. Results A total of 146 patients were analyzed. A final diagnosis was confirmed for 138 patients and appeared in the AI system's differential diagnosis list for 69 of them. 
Diagnostic errors occurred in 16 of 146 patients (11.0%, 95% CI 6.4%-17.2%). Although the difference was not statistically significant, the incidence of diagnostic errors was lower in cases where the final diagnosis was included in the AI system's differential diagnosis list than in cases where it was not (7.2% vs 15.9%, P=.18). Conclusions The incidence of diagnostic errors among patients in the outpatient department of internal medicine who used an automated medical history–taking system that generates differential diagnosis lists seemed to be lower than previously reported incidences. This result suggests that implementing an automated medical history–taking system that generates differential diagnosis lists could benefit diagnostic safety in the outpatient department of internal medicine.
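The reported percentages can be tied back to counts. Assuming the comparison covers the 138 patients with a confirmed final diagnosis, split 69 (final diagnosis in the AI list) versus 69 (not in the list), 7.2% and 15.9% correspond to 5/69 and 11/69, and 5 + 11 matches the 16 errors reported overall. This split is an inference from the abstract, not a figure it states. A minimal sketch of a two-sided Fisher exact test, written from first principles with exact fractions (the function name is illustrative), gives P ≈ .18 for that table, consistent with the abstract.

```python
from fractions import Fraction
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Rows are groups and columns are (error, no error). With all margins fixed,
    the top-left cell follows a hypergeometric distribution; the P value sums
    the probabilities of every attainable table no more likely than the observed one.
    """
    row1 = a + b       # size of group 1
    col1 = a + c       # total number of errors
    n = a + b + c + d  # total patients
    denom = comb(n, row1)

    def prob(x):
        # Exact probability that the top-left cell equals x, given the margins.
        return Fraction(comb(col1, x) * comb(n - col1, row1 - x), denom)

    lo = max(0, row1 - (n - col1))
    hi = min(row1, col1)
    p_obs = prob(a)
    return float(sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs))


# Inferred counts: final diagnosis in the AI list (5 errors, 64 without)
# versus not in the list (11 errors, 58 without).
p = fisher_exact_two_sided(5, 64, 11, 58)
print(f"P = {p:.2f}")  # -> P = 0.18
```

Exact fractions avoid the floating-point tolerance that a float implementation needs when deciding which tables are "no more likely" than the observed one.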
Affiliation(s)
- Ren Kawamura
- Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, Mibu, Japan
- Yukinori Harada
- Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, Mibu, Japan
- Department of Internal Medicine, Nagano Chuo Hospital, Nagano, Japan
- Shu Sugimoto
- Department of Internal Medicine, Nagano Chuo Hospital, Nagano, Japan
- Yuichiro Nagase
- Department of Internal Medicine, Nagano Chuo Hospital, Nagano, Japan
- Shinichi Katsukura
- Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, Mibu, Japan
- Taro Shimizu
- Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, Mibu, Japan
23
Graber ML. Reaching 95%: decision support tools are the surest way to improve diagnosis now. BMJ Qual Saf 2021; 31:415-418. [PMID: 34642227 DOI: 10.1136/bmjqs-2021-014033] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Accepted: 09/25/2021] [Indexed: 11/04/2022]
Affiliation(s)
- Mark L Graber
- Healthcare Quality and Outcomes, RTI International, St James, NY, USA