1
Duong D, Solomon BD. Analysis of large-language model versus human performance for genetics questions. Eur J Hum Genet 2024; 32:466-468. [PMID: 37246194] [PMCID: PMC10999420] [DOI: 10.1038/s41431-023-01396-8]
Abstract
Large-language models like ChatGPT have recently received a great deal of attention. One area of interest is how these models could be used in biomedical contexts, including in human genetics. To assess one facet of this, we compared the performance of ChatGPT with that of human respondents (13,642 human responses) in answering 85 multiple-choice questions about aspects of human genetics. Overall, ChatGPT did not perform significantly differently (p = 0.8327) from human respondents; ChatGPT was 68.2% accurate, compared to 66.6% accuracy for human respondents. Both ChatGPT and humans performed better on memorization-type questions than on critical thinking questions (p < 0.0001). When asked the same question multiple times, ChatGPT frequently provided different answers (16% of initial responses), including for both initially correct and incorrect answers, and gave plausible explanations for both correct and incorrect answers. ChatGPT's performance was impressive, but the model currently demonstrates significant shortcomings for clinical or other high-stakes use. Addressing these limitations will be important to guide adoption in real-life situations.
Affiliation(s)
- Dat Duong
- Medical Genomics Unit, Medical Genetics Branch, National Human Genome Research Institute, Bethesda, MD, USA
- Benjamin D Solomon
- Medical Genomics Unit, Medical Genetics Branch, National Human Genome Research Institute, Bethesda, MD, USA
2
Sood A, Mansoor N, Memmi C, Lynch M, Lynch J. Generative pretrained transformer-4, an artificial intelligence text predictive model, has a high capability for passing novel written radiology exam questions. Int J Comput Assist Radiol Surg 2024. [PMID: 38381363] [DOI: 10.1007/s11548-024-03071-9]
Abstract
PURPOSE AI image interpretation, through convolutional neural networks, shows increasing capability within radiology. These models have achieved impressive performance on specific tasks in controlled settings, but possess inherent limitations, such as the inability to consider clinical context. We assess the ability of large language models (LLMs), in the context of radiology specialty exams, to evaluate relevant clinical information. METHODS A database of questions was created with official sample, author-written, and textbook questions based on the Royal College of Radiologists (United Kingdom) FRCR 2A and American Board of Radiology (ABR) Certifying examinations. The questions were input into Generative Pretrained Transformer (GPT) versions 3 and 4, with prompting to answer the questions. RESULTS One thousand seventy-two questions were evaluated by GPT-3 and GPT-4: 495 (46.2%) for the FRCR 2A and 577 (53.8%) for the ABR exam. There were 890 single-best-answer (SBA) questions and 182 true/false questions. GPT-4 was correct on 629/890 (70.7%) SBA and 151/182 (83.0%) true/false questions, with no degradation on author-written questions. GPT-4 performed significantly better than GPT-3, which selected the correct answer on 282/890 (31.7%) SBA and 111/182 (61.0%) true/false questions. GPT-4's performance was similar across both examinations for all categories of question. CONCLUSION The newest generation of LLMs, GPT-4, demonstrates high capability in answering radiology exam questions. It shows marked improvement over GPT-3, suggesting further gains in accuracy are possible. Further research is needed to explore the clinical applicability of these AI models in real-world settings.
Affiliation(s)
- Avnish Sood
- King's College London, Strand, London, WC2R 2LS, UK
- Nina Mansoor
- Department of Neuroradiology, King's College Hospital, Denmark Hill, London, SE5 9RS, UK
- Caroline Memmi
- Imperial College London, Exhibition Road, London, SW7 2AZ, UK
- Magnus Lynch
- King's College London Centre for Stem Cells and Regenerative Medicine, Guy's Hospital, Great Maze Pond, London, UK
- St John's Institute of Dermatology, King's College London, London, UK
- Jeremy Lynch
- Department of Neuroradiology, King's College Hospital, Denmark Hill, London, SE5 9RS, UK
3
Williams SC, Starup-Hansen J, Funnell JP, Hanrahan JG, Valetopoulou A, Singh N, Sinha S, Muirhead WR, Marcus HJ. Can ChatGPT outperform a neurosurgical trainee? A prospective comparative study. Br J Neurosurg 2024:1-10. [PMID: 38305239] [DOI: 10.1080/02688697.2024.2308222]
Abstract
PURPOSE This study aimed to compare the performance of ChatGPT, a large language model (LLM), with that of human neurosurgical applicants in a neurosurgical national selection interview, to assess the potential of artificial intelligence (AI) and LLMs in healthcare and provide insights into their integration into the field. METHODS In a prospective comparative study, a set of neurosurgical national selection-style interview questions was posed to eight human participants and ChatGPT in an online interview. All participants were doctors currently practicing in the UK who had applied for a neurosurgical National Training Number. Interviews were recorded, anonymised, and scored by three neurosurgical consultants with experience as interviewers for national selection. Answers provided by ChatGPT were used as a template for a virtual interview, and the interview transcripts were scored by neurosurgical consultants using criteria utilised in real national selection interviews. Overall interview score and subdomain scores were compared between human participants and ChatGPT. RESULTS For overall score, ChatGPT fell behind six of the eight human participants and did not achieve a mean score higher than that of any individual who secured a training position. Several factors, including factual inaccuracies and deviations from expected structure and style, may have contributed to ChatGPT's underperformance. CONCLUSIONS LLMs such as ChatGPT have huge potential for integration into healthcare. However, this study emphasises the need for further development to address their limitations and challenges. While LLMs have not yet surpassed human performance, collaboration between humans and AI systems holds promise for the future of healthcare.
Affiliation(s)
- Simon C Williams
- Department of Neurosurgery, St George's University Hospital, London, UK
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, London, UK
- Joachim Starup-Hansen
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, London, UK
- Department of Neurosurgery, National Hospital for Neurology and Neurosurgery, London, UK
- Jonathan P Funnell
- Department of Neurosurgery, St George's University Hospital, London, UK
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, London, UK
- John Gerrard Hanrahan
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, London, UK
- Department of Neurosurgery, National Hospital for Neurology and Neurosurgery, London, UK
- Navneet Singh
- Department of Neurosurgery, St George's University Hospital, London, UK
- Saurabh Sinha
- Department of Neurosurgery, Sheffield Teaching Hospitals, Sheffield, UK
- William R Muirhead
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, London, UK
- Department of Neurosurgery, National Hospital for Neurology and Neurosurgery, London, UK
- Hani J Marcus
- Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, London, UK
- Department of Neurosurgery, National Hospital for Neurology and Neurosurgery, London, UK
4
Pauling C, Kanber B, Arthurs OJ, Shelmerdine SC. Commercially available artificial intelligence tools for fracture detection: the evidence. BJR Open 2024; 6:tzad005. [PMID: 38352182] [PMCID: PMC10860511] [DOI: 10.1093/bjro/tzad005]
Abstract
Missed fractures are a costly healthcare issue: they negatively impact patients' lives, leading to potential long-term disability and time off work, and they are responsible for high medicolegal disbursements that could otherwise be used to improve other healthcare services. Overlooked fractures in children are particularly concerning, as opportunities for safeguarding may be missed. Assistance from artificial intelligence (AI) in interpreting medical images may offer a possible solution for improving patient care, and several commercial AI tools are now available for radiology workflow implementation. However, information regarding their development, the evidence for their performance and validation, and the intended target population is not always clear, yet it is vital when evaluating a potential AI solution for implementation. In this article, we review the range of available products utilizing AI for fracture detection (in both adults and children) and summarize the evidence, or lack thereof, behind their performance. This will allow others to make better-informed decisions about which product to procure for their specific clinical requirements.
Affiliation(s)
- Cato Pauling
- UCL Great Ormond Street Institute of Child Health, University College London, London WC1E 6BT, United Kingdom
- Baris Kanber
- Queen Square Multiple Sclerosis Centre, Department of Neuroinflammation, University College London (UCL) Queen Square Institute of Neurology, Faculty of Brain Sciences, University College London, London WC1N 3BG, United Kingdom
- Department of Medical Physics and Biomedical Engineering, Centre for Medical Image Computing, University College London, London WC1E 6BT, United Kingdom
- Owen J Arthurs
- UCL Great Ormond Street Institute of Child Health, University College London, London WC1E 6BT, United Kingdom
- Department of Clinical Radiology, Great Ormond Street Hospital for Children NHS Foundation Trust, London WC1N 3JH, United Kingdom
- NIHR Great Ormond Street Hospital Biomedical Research Centre, Bloomsbury, London WC1N 1EH, United Kingdom
- Susan C Shelmerdine
- UCL Great Ormond Street Institute of Child Health, University College London, London WC1E 6BT, United Kingdom
- Department of Clinical Radiology, Great Ormond Street Hospital for Children NHS Foundation Trust, London WC1N 3JH, United Kingdom
- NIHR Great Ormond Street Hospital Biomedical Research Centre, Bloomsbury, London WC1N 1EH, United Kingdom
5
Elmahdy M, Sebro R. Beyond the AJR: Comparison of Artificial Intelligence Candidate and Radiologists on Mock Examinations for the Fellow of Royal College of Radiology Part B. AJR Am J Roentgenol 2023; 221:555. [PMID: 36856302] [DOI: 10.2214/ajr.23.29155]
Affiliation(s)
- Mahmoud Elmahdy
- Department of Radiology, Mayo Clinic, 4500 San Pablo Rd, Jacksonville, FL 32224
- Ronnie Sebro
- Department of Radiology, Mayo Clinic, 4500 San Pablo Rd, Jacksonville, FL 32224
- Center for Augmented Intelligence, Mayo Clinic, Jacksonville, FL
- Department of Orthopedic Surgery, Mayo Clinic, Jacksonville, FL
- Department of Biostatistics, Center for Quantitative Health Sciences, Jacksonville, FL
6
Pearce J, Chiavaroli N. Rethinking assessment in response to generative artificial intelligence. Med Educ 2023; 57:889-891. [PMID: 37042389] [DOI: 10.1111/medu.15092]
Affiliation(s)
- Jacob Pearce
- Tertiary Education, Australian Council for Educational Research, Camberwell, Victoria, Australia
- Neville Chiavaroli
- Tertiary Education, Australian Council for Educational Research, Camberwell, Victoria, Australia
7
Teebagy S, Colwell L, Wood E, Yaghy A, Faustina M. Improved Performance of ChatGPT-4 on the OKAP Examination: A Comparative Study with ChatGPT-3.5. J Acad Ophthalmol 2023; 15:e184-e187. [PMID: 37701862] [PMCID: PMC10495224] [DOI: 10.1055/s-0043-1774399]
Abstract
Introduction: This study aims to evaluate the performance of ChatGPT-4, an advanced artificial intelligence (AI) language model, on the Ophthalmology Knowledge Assessment Program (OKAP) examination compared to its predecessor, ChatGPT-3.5. Methods: Both models were tested on 180 OKAP practice questions covering various ophthalmology subject categories. Results: ChatGPT-4 significantly outperformed ChatGPT-3.5 (81% vs. 57%; p < 0.001), indicating improvements in medical knowledge assessment. Discussion: The superior performance of ChatGPT-4 suggests potential applicability in ophthalmologic education and clinical decision support systems. Future research should focus on refining AI models, ensuring a balanced representation of fundamental and specialized knowledge, and determining the optimal method of integrating AI into medical education and practice.
Affiliation(s)
- Sean Teebagy
- Department of Ophthalmology and Visual Sciences, UMass Chan Medical School, Worcester, Massachusetts
- Lauren Colwell
- Department of Ophthalmology and Visual Sciences, UMass Chan Medical School, Worcester, Massachusetts
- Emma Wood
- Department of Ophthalmology and Visual Sciences, UMass Chan Medical School, Worcester, Massachusetts
- Antonio Yaghy
- Department of Ophthalmology and Visual Sciences, UMass Chan Medical School, Worcester, Massachusetts
- Misha Faustina
- Department of Ophthalmology and Visual Sciences, UMass Chan Medical School, Worcester, Massachusetts
8
Ranjan A, Parpaleix A, Cardoso J, Adeleke S. AI vs FRCR: What it means for the future. Eur J Radiol 2023; 165:110918. [PMID: 37311341] [DOI: 10.1016/j.ejrad.2023.110918]
Abstract
A recent work by Shelmerdine et al. was published in the Christmas edition of the BMJ. The authors were inspired by Geoffrey Hinton's statement that artificial intelligence (AI) would supersede radiologists, and investigated whether the AI software Milvue Suite, which had been trained on a few hundred thousand chest and musculoskeletal x-rays, could pass the rapid reporting section of the FRCR, an exam which must be passed in order to practice as a consultant radiologist in the UK. This brief comment summarises the company's opinions and perspective on practical AI development and on translating such software into a commercially viable and clinically useful tool, and we hope it provides a fair and balanced view of the role of AI in radiology.
Affiliation(s)
- Aditi Ranjan
- Royal Berkshire Hospital NHS Foundation Trust, Reading, United Kingdom
- Jorge Cardoso
- School of Biomedical Engineering and Imaging Sciences, King's College London, London, United Kingdom
- Sola Adeleke
- School of Biomedical Engineering and Imaging Sciences, King's College London, London, United Kingdom
9
Bhayana R, Krishna S, Bleakney RR. Performance of ChatGPT on a Radiology Board-style Examination: Insights into Current Strengths and Limitations. Radiology 2023:230582. [PMID: 37191485] [DOI: 10.1148/radiol.230582]
Abstract
Background ChatGPT is a powerful artificial intelligence large language model with great potential as a tool in medical practice and education, but its performance in radiology remains unclear. Purpose To assess the performance of ChatGPT on radiology board-style examination questions without images and to explore its strengths and limitations. Materials and Methods In this exploratory prospective study performed from February 25 to March 3, 2023, 150 multiple-choice questions designed to match the style, content, and difficulty of the Canadian Royal College and American Board of Radiology examinations were grouped by question type (lower-order [recall, understanding] and higher-order [apply, analyze, synthesize] thinking) and topic (physics, clinical). The higher-order thinking questions were further subclassified by type (description of imaging findings, clinical management, application of concepts, calculation and classification, disease associations). ChatGPT performance was evaluated overall, by question type, and by topic. Confidence of language in responses was assessed. Univariable analysis was performed. Results ChatGPT answered 69% of questions correctly (104 of 150). The model performed better on questions requiring lower-order thinking (84%, 51 of 61) than on those requiring higher-order thinking (60%, 53 of 89) (P = .002). When compared with lower-order questions, the model performed worse on questions involving description of imaging findings (61%, 28 of 46; P = .04), calculation and classification (25%, two of eight; P = .01), and application of concepts (30%, three of 10; P = .01). ChatGPT performed as well on higher-order clinical management questions (89%, 16 of 18) as on lower-order questions (P = .88). It performed worse on physics questions (40%, six of 15) than on clinical questions (73%, 98 of 135) (P = .02). ChatGPT used confident language consistently, even when incorrect (100%, 46 of 46). Conclusion Despite no radiology-specific pretraining, ChatGPT nearly passed a radiology board-style examination without images; it performed well on lower-order thinking questions and clinical management questions but struggled with higher-order thinking questions involving description of imaging findings, calculation and classification, and application of concepts. © RSNA, 2023. See also the editorial by Lourenco et al. in this issue.
Affiliation(s)
- Rajesh Bhayana
- From the University Medical Imaging Toronto, Joint Department of Medical Imaging, University Health Network, Mount Sinai Hospital and Women's College Hospital, University of Toronto, Toronto General Hospital, 200 Elizabeth St, Peter Munk Building, 1st Fl, Toronto, ON, Canada M5G 2C4
- Satheesh Krishna
- From the University Medical Imaging Toronto, Joint Department of Medical Imaging, University Health Network, Mount Sinai Hospital and Women's College Hospital, University of Toronto, Toronto General Hospital, 200 Elizabeth St, Peter Munk Building, 1st Fl, Toronto, ON, Canada M5G 2C4
- Robert R Bleakney
- From the University Medical Imaging Toronto, Joint Department of Medical Imaging, University Health Network, Mount Sinai Hospital and Women's College Hospital, University of Toronto, Toronto General Hospital, 200 Elizabeth St, Peter Munk Building, 1st Fl, Toronto, ON, Canada M5G 2C4
10
Alberts IL, Mercolli L, Pyka T, Prenosil G, Shi K, Rominger A, Afshar-Oromieh A. Large language models (LLM) and ChatGPT: what will the impact on nuclear medicine be? Eur J Nucl Med Mol Imaging 2023; 50:1549-1552. [PMID: 36892666] [PMCID: PMC9995718] [DOI: 10.1007/s00259-023-06172-w]
Affiliation(s)
- Ian L Alberts
- Department of Nuclear Medicine, Inselspital, Bern University Hospital, University of Bern, Freiburgstr. 18, 3010, Bern, Switzerland
- Lorenzo Mercolli
- Department of Nuclear Medicine, Inselspital, Bern University Hospital, University of Bern, Freiburgstr. 18, 3010, Bern, Switzerland
- Thomas Pyka
- Department of Nuclear Medicine, Inselspital, Bern University Hospital, University of Bern, Freiburgstr. 18, 3010, Bern, Switzerland
- George Prenosil
- Department of Nuclear Medicine, Inselspital, Bern University Hospital, University of Bern, Freiburgstr. 18, 3010, Bern, Switzerland
- Kuangyu Shi
- Department of Nuclear Medicine, Inselspital, Bern University Hospital, University of Bern, Freiburgstr. 18, 3010, Bern, Switzerland
- Axel Rominger
- Department of Nuclear Medicine, Inselspital, Bern University Hospital, University of Bern, Freiburgstr. 18, 3010, Bern, Switzerland
- Ali Afshar-Oromieh
- Department of Nuclear Medicine, Inselspital, Bern University Hospital, University of Bern, Freiburgstr. 18, 3010, Bern, Switzerland
11
Field EL, Tam W, Moore N, McEntee M. Efficacy of Artificial Intelligence in the Categorisation of Paediatric Pneumonia on Chest Radiographs: A Systematic Review. Children (Basel) 2023; 10:576. [PMID: 36980134] [PMCID: PMC10047666] [DOI: 10.3390/children10030576]
Abstract
This study aimed to systematically review the literature to synthesise and summarise the evidence surrounding the efficacy of artificial intelligence (AI) in classifying paediatric pneumonia on chest radiographs (CXRs). Following the initial search, data from studies matching the pre-set criteria were extracted using a data extraction tool, and the included studies were assessed with critical appraisal tools and for risk of bias. Results were accumulated, and the outcome measures analysed included sensitivity, specificity, accuracy, and area under the curve (AUC). Five studies met the inclusion criteria. The highest sensitivity (96.3%) was achieved by an ensemble AI algorithm. DenseNet201 obtained the highest specificity (94%) and accuracy (95%). The highest AUC value (96.2%) was achieved by the VGG16 algorithm. Some of the AI models achieved close to 100% diagnostic accuracy. To assess the efficacy of AI in a clinical setting, the performance of these AI models should be compared with that of radiologists. The included and evaluated AI algorithms showed promising results. These algorithms could ease and speed up diagnosis once the studies are replicated and their performance is assessed in clinical settings, potentially saving millions of lives.
Affiliation(s)
- Erica Louise Field
- Discipline of Medical Imaging and Radiation Therapy, University College Cork, College Road, T12 K8AF Cork, Ireland
- Winnie Tam
- Department of Midwifery and Radiography, University of London, Northampton Square, London EC1V 0HB, UK
- Niamh Moore
- Discipline of Medical Imaging and Radiation Therapy, University College Cork, College Road, T12 K8AF Cork, Ireland
- Mark McEntee
- Discipline of Medical Imaging and Radiation Therapy, University College Cork, College Road, T12 K8AF Cork, Ireland
12
Duong D, Solomon BD. Analysis of large-language model versus human performance for genetics questions. medRxiv 2023:2023.01.27.23285115. [PMID: 36789422] [PMCID: PMC9928145] [DOI: 10.1101/2023.01.27.23285115]
Abstract
Large-language models like ChatGPT have recently received a great deal of attention. To assess ChatGPT in the field of genetics, we compared its performance to that of human respondents (involving 13,636 responses) in answering genetics questions that had been posted on social media platforms starting in 2021. Overall, ChatGPT did not perform significantly differently than human respondents, but it did significantly better on memorization-type questions than on critical thinking questions, frequently provided different answers when asked the same question multiple times, and provided plausible explanations for both correct and incorrect answers.
13
Sezgin E. Artificial intelligence in healthcare: Complementing, not replacing, doctors and healthcare providers. Digit Health 2023; 9:20552076231186520. [PMID: 37426593] [PMCID: PMC10328041] [DOI: 10.1177/20552076231186520]
Abstract
The utilization of artificial intelligence (AI) in clinical practice has increased and is evidently contributing to improved diagnostic accuracy, optimized treatment planning, and improved patient outcomes. The rapid evolution of AI, especially generative AI and large language models (LLMs), has reignited discussions about its potential impact on the healthcare industry, particularly regarding the role of healthcare providers. Questions such as "can AI replace doctors?" and "will doctors who use AI replace those who do not?" have been echoed. To shed light on this debate, this article emphasizes the augmentative role of AI in healthcare, underlining that AI is intended to complement, rather than replace, doctors and healthcare providers. The fundamental solution emerges from human-AI collaboration, which combines the cognitive strengths of healthcare providers with the analytical capabilities of AI. A human-in-the-loop (HITL) approach ensures that AI systems are guided, communicated with, and supervised by human expertise, thereby maintaining safety and quality in healthcare services. Finally, adoption can be furthered by organizational processes, informed by the HITL approach, that bring multidisciplinary teams into the loop. AI can create a paradigm shift in healthcare by complementing and enhancing the skills of healthcare providers, ultimately leading to improved service quality, patient outcomes, and a more efficient healthcare system.
Affiliation(s)
- Emre Sezgin
- Center for Biobehavioral Health, Abigail Wexner Research Institute at Nationwide Children's Hospital, Columbus, OH, USA
- Department of Pediatrics, The Ohio State University College of Medicine, Columbus, OH, USA
14
Parpaleix A, Parsy C, Cordari M, Mejdoubi M. Assessment of a combined musculoskeletal and chest deep learning-based detection solution in an emergency setting. Eur J Radiol Open 2023; 10:100482. [PMID: 36941993] [PMCID: PMC10023863] [DOI: 10.1016/j.ejro.2023.100482]
Abstract
Rationale and objectives Triage and diagnostic deep learning-based support solutions have started to take hold in everyday emergency radiology practice, with the hope of streamlining workflows. Although previous work had shown that artificial intelligence (AI) may improve radiologist and/or emergency physician reading performance, those studies were restricted to specific finding, body-part, and/or age subgroups, without evaluating a routine emergency workflow comprising adult and pediatric chest and musculoskeletal cases. We aimed to evaluate a commercial deep learning-based solution detecting multiple musculoskeletal and chest radiographic findings in an adult and pediatric emergency workflow, focusing on discrepancies between emergency and radiology physicians. Material and methods This retrospective, monocentric, observational study included 1772 patients who underwent an emergency radiograph between July and October 2020, excluding spine, skull, and plain abdomen procedures. Emergency and radiology reports, obtained without AI as part of the clinical workflow, were collected, and discordant cases were reviewed to obtain the radiology reference standard. Case-level AI outputs and emergency reports were compared to the reference standard. DeLong and Wald tests were used to compare ROC-AUC and sensitivity/specificity, respectively. Results The overall AI ROC-AUC was 0.954, with no difference across age or body-part subgroups. Real-life emergency physicians' sensitivity was 93.7%, not significantly different from that of the AI model (P = 0.105); however, emergency physicians misdiagnosed 172/1772 (9.7%) cases, and in this subset AI accuracy was 90.1%. Conclusion This study highlights that a multiple-findings AI solution for emergency radiographs is effective and complementary to emergency physicians, and could help reduce misdiagnoses in the absence of immediate radiological expertise.
Affiliation(s)
- Alexandre Parpaleix
- Department of Radiology, Valenciennes General Hospital, Valenciennes, France
- Correspondence to: Département de radiologie, Centre Hospitalier de Valenciennes, 114 Av. Desandrouin, 59300 Valenciennes, France
- Clémence Parsy
- Department of Radiology, Valenciennes General Hospital, Valenciennes, France
- Mehdi Mejdoubi
- Department of Radiology, Valenciennes General Hospital, Valenciennes, France
15
Affiliation(s)
- Athena Ko
- University of Ottawa, Department of Psychiatry, Ottawa, ON, Canada
16
Rasanathan J. Crumbs of comfort in this time of despair. BMJ 2022. [DOI: 10.1136/bmj.o3046]