1
Kumar RP, Sivan V, Bachir H, Sarwar SA, Ruzicka F, O'Malley GR, Lobo P, Morales IC, Cassimatis ND, Hundal JS, Patel NV. Can Artificial Intelligence Mitigate Missed Diagnoses by Generating Differential Diagnoses for Neurosurgeons? World Neurosurg 2024; 187:e1083-e1088. [PMID: 38759788 DOI: 10.1016/j.wneu.2024.05.052] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2023] [Revised: 05/08/2024] [Accepted: 05/09/2024] [Indexed: 05/19/2024]
Abstract
BACKGROUND/OBJECTIVE: Neurosurgery emphasizes the criticality of accurate differential diagnoses, with diagnostic delays posing significant health and economic challenges. As large language models (LLMs) emerge as transformative tools in healthcare, this study seeks to elucidate their role in assisting neurosurgeons with the differential diagnosis process, especially during preliminary consultations.
METHODS: This study employed 3 chat-based LLMs, ChatGPT (versions 3.5 and 4.0), Perplexity AI, and Bard AI, to evaluate their diagnostic accuracy. Each LLM was prompted using clinical vignettes, and their responses were recorded to generate differential diagnoses for 20 common and uncommon neurosurgical disorders. Disease-specific prompts were crafted using Dynamed, a clinical reference tool. The accuracy of the LLMs was determined based on their ability to identify the target disease within their top differential diagnoses correctly.
RESULTS: For the initial differential, ChatGPT 3.5 achieved an accuracy of 52.63%, while ChatGPT 4.0 performed slightly better at 53.68%. Perplexity AI and Bard AI demonstrated 40.00% and 29.47% accuracy, respectively. As the number of considered differentials increased from 2 to 5, ChatGPT 3.5 reached its peak accuracy of 77.89% for the top 5 differentials. Bard AI and Perplexity AI had varied performances, with Bard AI improving in the top 5 differentials at 62.11%. On a disease-specific note, the LLMs excelled in diagnosing conditions like epilepsy and cervical spine stenosis but faced challenges with more complex diseases such as Moyamoya disease and amyotrophic lateral sclerosis.
CONCLUSIONS: LLMs showcase the potential to enhance diagnostic accuracy and decrease the incidence of missed diagnoses in neurosurgery.
Affiliation(s)
- Rohit Prem Kumar
  - Department of Neurosurgery, Hackensack Meridian School of Medicine, Nutley, New Jersey, USA
- Vijay Sivan
  - Department of Neurosurgery, Hackensack Meridian School of Medicine, Nutley, New Jersey, USA
- Hanin Bachir
  - Department of Neurosurgery, Hackensack Meridian School of Medicine, Nutley, New Jersey, USA
- Syed A Sarwar
  - Department of Neurosurgery, Hackensack Meridian School of Medicine, Nutley, New Jersey, USA
- Francis Ruzicka
  - Department of Neurosurgery, Hackensack Meridian School of Medicine, Nutley, New Jersey, USA
- Geoffrey R O'Malley
  - Department of Neurosurgery, Hackensack Meridian School of Medicine, Nutley, New Jersey, USA
- Paulo Lobo
  - Department of Neurosurgery, Hackensack Meridian School of Medicine, Nutley, New Jersey, USA
- Ilona Cazorla Morales
  - Department of Neurosurgery, Hackensack Meridian School of Medicine, Nutley, New Jersey, USA
- Nicholas D Cassimatis
  - Department of Neurosurgery, Hackensack Meridian School of Medicine, Nutley, New Jersey, USA
- Jasdeep S Hundal
  - Department of Neurology, HMH-Jersey Shore University Medical Center, Neptune, New Jersey, USA
- Nitesh V Patel
  - Department of Neurosurgery, Hackensack Meridian School of Medicine, Nutley, New Jersey, USA; Department of Neurosurgery, HMH-Jersey Shore University Medical Center, Neptune, New Jersey, USA
2
Mahowald K, Ivanova AA, Blank IA, Kanwisher N, Tenenbaum JB, Fedorenko E. Dissociating language and thought in large language models. Trends Cogn Sci 2024; 28:517-540. [PMID: 38508911 DOI: 10.1016/j.tics.2024.01.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Revised: 01/31/2024] [Accepted: 01/31/2024] [Indexed: 03/22/2024]
Abstract
Large language models (LLMs) have come closest among all models to date to mastering human language, yet opinions about their linguistic and cognitive capabilities remain split. Here, we evaluate LLMs using a distinction between formal linguistic competence (knowledge of linguistic rules and patterns) and functional linguistic competence (understanding and using language in the world). We ground this distinction in human neuroscience, which has shown that formal and functional competence rely on different neural mechanisms. Although LLMs are surprisingly good at formal competence, their performance on functional competence tasks remains spotty and often requires specialized fine-tuning and/or coupling with external modules. We posit that models that use language in human-like ways would need to master both of these competence types, which, in turn, could require the emergence of separate mechanisms specialized for formal versus functional linguistic competence.
3
Bruera A, Tao Y, Anderson A, Çokal D, Haber J, Poesio M. Modeling Brain Representations of Words' Concreteness in Context Using GPT-2 and Human Ratings. Cogn Sci 2023; 47:e13388. [PMID: 38103208 DOI: 10.1111/cogs.13388] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2023] [Revised: 09/12/2023] [Accepted: 10/27/2023] [Indexed: 12/18/2023]
Abstract
The meaning of most words in language depends on their context. Understanding how the human brain extracts contextualized meaning, and identifying where in the brain this takes place, remain important scientific challenges. But technological and computational advances in neuroscience and artificial intelligence now provide unprecedented opportunities to study the human brain in action as language is read and understood. Recent contextualized language models seem to be able to capture homonymic meaning variation ("bat", in a baseball vs. a vampire context), as well as more nuanced differences of meaning, for example, polysemous words such as "book", which can be interpreted in distinct but related senses ("explain a book", information, vs. "open a book", object) whose differences are fine-grained. We study these subtle differences in lexical meaning along the concrete/abstract dimension, as they are triggered by verb-noun semantic composition. We analyze functional magnetic resonance imaging (fMRI) activations elicited by Italian verb phrases containing nouns whose interpretation is affected by the verb to different degrees. By using a contextualized language model and human concreteness ratings, we shed light on where in the brain such fine-grained meaning variation takes place and how it is coded. Our results show that phrase concreteness judgments and the contextualized model can predict BOLD activation associated with semantic composition within the language network. Importantly, representations derived from a complex, nonlinear composition process consistently outperform simpler composition approaches. This is compatible with a holistic view of semantic composition in the brain, where semantic representations are modified by the process of composition itself. When looking at individual brain areas, we find that encoding performance is statistically significant, although with differing patterns of results, suggesting differential involvement, in the posterior superior temporal sulcus, inferior frontal gyrus and anterior temporal lobe, and in motor areas previously associated with processing of concreteness/abstractness.
Affiliation(s)
- Andrea Bruera
  - School of Electronic Engineering and Computer Science, Cognitive Science Research Group, Queen Mary University of London
  - Lise Meitner Research Group Cognition and Plasticity, Max Planck Institute for Human Cognitive and Brain Sciences
- Yuan Tao
  - Department of Cognitive Science, Johns Hopkins University
- Derya Çokal
  - Department of German Language and Literature I-Linguistics, University of Cologne
- Janosch Haber
  - School of Electronic Engineering and Computer Science, Cognitive Science Research Group, Queen Mary University of London
  - Chattermill, London
- Massimo Poesio
  - School of Electronic Engineering and Computer Science, Cognitive Science Research Group, Queen Mary University of London
  - Department of Information and Computing Sciences, University of Utrecht