1
|
Igarashi T, Iijima K, Nitta K, Chen Y. Estimation of the Cognitive Functioning of the Elderly by AI Agents: A Comparative Analysis of the Effects of the Psychological Burden of Intervention. Healthcare (Basel) 2024; 12:1821. [PMID: 39337162 PMCID: PMC11431058 DOI: 10.3390/healthcare12181821] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/09/2024] [Revised: 08/08/2024] [Accepted: 08/13/2024] [Indexed: 09/30/2024] Open
Abstract
In recent years, an increasing number of studies have begun to use conversational data in spontaneous speech to estimate cognitive function in older people. The targets of spontaneous speech with older people used to be physicians and licensed psychologists, but it is now possible to have conversations with fully automatic AI agents. However, it has not yet been clarified what difference there is in conversational communication with older people when the examiner is a human or an AI agent. This study explored the psychological burden experienced by elderly participants during cognitive function assessments, comparing interactions with human and AI conversational partners. Thirty-four participants, averaging 78.71 years of age, were evaluated using the Mini-Mental State Examination (MMSE), the Visual Analogue Scale (VAS), and the State-Trait Anxiety Inventory (STAI). The objective was to assess the psychological impact of different conversational formats on the participants. The results indicated that the mental strain, as measured by VAS and STAI scores, was significantly higher during the MMSE sessions compared to other conversational interactions (p < 0.01). Notably, there was no significant difference in the mental burden between conversations with humans and AI agents, suggesting that AI-based systems could be as effective as human interaction in cognitive assessments.
Collapse
Affiliation(s)
- Toshiharu Igarashi
- Simulation of Complex Systems Laboratory, Department of Human and Engineered Environmental Studies, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo 277-8563, Japan
- AI-UX Design Research Institution, Advanced Institute of Industrial Technology, 10-40 Higashi-Oi 1-Chome, Shinagawa, Tokyo 140-0011, Japan
| | - Katsuya Iijima
- Institute of Gerontology (IOG), The University of Tokyo, Tokyo 113-8656, Japan
- Institute for Future Initiatives (IFI), The University of Tokyo, Tokyo 113-0033, Japan
| | - Kunio Nitta
- Tsukushikai Medical Corporation, Tokyo 186-0005, Japan
| | - Yu Chen
- Simulation of Complex Systems Laboratory, Department of Human and Engineered Environmental Studies, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo 277-8563, Japan
| |
Collapse
|
2
|
Berisha V, Liss JM. Responsible development of clinical speech AI: Bridging the gap between clinical research and technology. NPJ Digit Med 2024; 7:208. [PMID: 39122889 PMCID: PMC11316053 DOI: 10.1038/s41746-024-01199-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/23/2023] [Accepted: 07/19/2024] [Indexed: 08/12/2024] Open
Abstract
This perspective article explores the challenges and potential of using speech as a biomarker in clinical settings, particularly when constrained by the small clinical datasets typically available in such contexts. We contend that by integrating insights from speech science and clinical research, we can reduce sample complexity in clinical speech AI models with the potential to decrease timelines to translation. Most existing models are based on high-dimensional feature representations trained with limited sample sizes and often do not leverage insights from speech science and clinical research. This approach can lead to overfitting, where the models perform exceptionally well on training data but fail to generalize to new, unseen data. Additionally, without incorporating theoretical knowledge, these models may lack interpretability and robustness, making them challenging to troubleshoot or improve post-deployment. We propose a framework for organizing health conditions based on their impact on speech and promote the use of speech analytics in diverse clinical contexts beyond cross-sectional classification. For high-stakes clinical use cases, we advocate for a focus on explainable and individually-validated measures and stress the importance of rigorous validation frameworks and ethical considerations for responsible deployment. Bridging the gap between AI research and clinical speech research presents new opportunities for more efficient translation of speech-based AI tools and advancement of scientific discoveries in this interdisciplinary space, particularly if limited to small or retrospective datasets.
Collapse
Affiliation(s)
- Visar Berisha
- School of Electrical Computer and Energy Engineering and College of Health Solutions, Arizona State University, Tempe, AZ, USA.
| | - Julie M Liss
- College of Health Solutions, Arizona State University, Tempe, AZ, USA
| |
Collapse
|
3
|
García-Méndez S, de Arriba-Pérez F. Large Language Models and Healthcare Alliance: Potential and Challenges of Two Representative Use Cases. Ann Biomed Eng 2024; 52:1928-1931. [PMID: 38310159 DOI: 10.1007/s10439-024-03454-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/23/2023] [Accepted: 01/15/2024] [Indexed: 02/05/2024]
Abstract
Large language models (LLMS) emerge as the most promising Natural Language Processing approach for clinical practice acceleration (i.e., diagnosis, prevention and treatment procedures). Similarly, intelligent conversational systems that leverage LLMS have disruptively become the future of therapy in the era of ChatGPT. Accordingly, this research addresses the application of LLMS in healthcare, paying particular attention to two relevant use cases: cognitive decline and depression, more specifically, postpartum depression. In the end, the most promising opportunities they represent (e.g., clinical tasks augmentation, personalized healthcare, etc.) and related concerns (e.g., data privacy and quality, fairness, etc.) are discussed to contribute to the global debate on their integration in the sanitary system.
Collapse
|
4
|
Moulaei K, Yadegari A, Baharestani M, Farzanbakhsh S, Sabet B, Reza Afrash M. Generative artificial intelligence in healthcare: A scoping review on benefits, challenges and applications. Int J Med Inform 2024; 188:105474. [PMID: 38733640 DOI: 10.1016/j.ijmedinf.2024.105474] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/06/2024] [Revised: 05/03/2024] [Accepted: 05/04/2024] [Indexed: 05/13/2024]
Abstract
BACKGROUND Generative artificial intelligence (GAI) is revolutionizing healthcare with solutions for complex challenges, enhancing diagnosis, treatment, and care through new data and insights. However, its integration raises questions about applications, benefits, and challenges. Our study explores these aspects, offering an overview of GAI's applications and future prospects in healthcare. METHODS This scoping review searched Web of Science, PubMed, and Scopus . The selection of studies involved screening titles, reviewing abstracts, and examining full texts, adhering to the PRISMA-ScR guidelines throughout the process. RESULTS From 1406 articles across three databases, 109 met inclusion criteria after screening and deduplication. Nine GAI models were utilized in healthcare, with ChatGPT (n = 102, 74 %), Google Bard (Gemini) (n = 16, 11 %), and Microsoft Bing AI (n = 10, 7 %) being the most frequently employed. A total of 24 different applications of GAI in healthcare were identified, with the most common being "offering insights and information on health conditions through answering questions" (n = 41) and "diagnosis and prediction of diseases" (n = 17). In total, 606 benefits and challenges were identified, which were condensed to 48 benefits and 61 challenges after consolidation. The predominant benefits included "Providing rapid access to information and valuable insights" and "Improving prediction and diagnosis accuracy", while the primary challenges comprised "generating inaccurate or fictional content", "unknown source of information and fake references for texts", and "lower accuracy in answering questions". CONCLUSION This scoping review identified the applications, benefits, and challenges of GAI in healthcare. This synthesis offers a crucial overview of GAI's potential to revolutionize healthcare, emphasizing the imperative to address its limitations.
Collapse
Affiliation(s)
- Khadijeh Moulaei
- Department of Health Information Technology, School of Paramedical, Ilam University of Medical Sciences, Ilam, Iran
| | - Atiye Yadegari
- Department of Pediatric Dentistry, School of Dentistry, Hamadan University of Medical Sciences, Hamadan, Iran
| | - Mahdi Baharestani
- Network of Interdisciplinarity in Neonates and Infants (NINI), Universal Scientific Education and Research Network (USERN), Tehran, Iran
| | - Shayan Farzanbakhsh
- Network of Interdisciplinarity in Neonates and Infants (NINI), Universal Scientific Education and Research Network (USERN), Tehran, Iran
| | - Babak Sabet
- Department of Surgery, Faculty of Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran
| | - Mohammad Reza Afrash
- Department of Artificial Intelligence, Smart University of Medical Sciences, Tehran, Iran.
| |
Collapse
|
5
|
Haltaufderheide J, Ranisch R. The ethics of ChatGPT in medicine and healthcare: a systematic review on Large Language Models (LLMs). NPJ Digit Med 2024; 7:183. [PMID: 38977771 PMCID: PMC11231310 DOI: 10.1038/s41746-024-01157-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/21/2024] [Accepted: 05/29/2024] [Indexed: 07/10/2024] Open
Abstract
With the introduction of ChatGPT, Large Language Models (LLMs) have received enormous attention in healthcare. Despite potential benefits, researchers have underscored various ethical implications. While individual instances have garnered attention, a systematic and comprehensive overview of practical applications currently researched and ethical issues connected to them is lacking. Against this background, this work maps the ethical landscape surrounding the current deployment of LLMs in medicine and healthcare through a systematic review. Electronic databases and preprint servers were queried using a comprehensive search strategy which generated 796 records. Studies were screened and extracted following a modified rapid review approach. Methodological quality was assessed using a hybrid approach. For 53 records, a meta-aggregative synthesis was performed. Four general fields of applications emerged showcasing a dynamic exploration phase. Advantages of using LLMs are attributed to their capacity in data analysis, information provisioning, support in decision-making or mitigating information loss and enhancing information accessibility. However, our study also identifies recurrent ethical concerns connected to fairness, bias, non-maleficence, transparency, and privacy. A distinctive concern is the tendency to produce harmful or convincing but inaccurate content. Calls for ethical guidance and human oversight are recurrent. We suggest that the ethical guidance debate should be reframed to focus on defining what constitutes acceptable human oversight across the spectrum of applications. This involves considering the diversity of settings, varying potentials for harm, and different acceptable thresholds for performance and certainty in healthcare. Additionally, critical inquiry is needed to evaluate the necessity and justification of LLMs' current experimental use.
Collapse
Affiliation(s)
- Joschka Haltaufderheide
- Faculty of Health Sciences Brandenburg, University of Potsdam, Am Mühlenberg 9, Potsdam, 14476, Germany
| | - Robert Ranisch
- Faculty of Health Sciences Brandenburg, University of Potsdam, Am Mühlenberg 9, Potsdam, 14476, Germany.
| |
Collapse
|
6
|
Abid M, Asif M, Khemane Z, Jawaid A, Waqar Khan A, Naveed H, Naveed T, Farah AA, Siddiq MA. Advances in artificial intelligence for diagnosing Alzheimer's disease through speech. Ann Med Surg (Lond) 2024; 86:3822-3823. [PMID: 38989201 PMCID: PMC11230774 DOI: 10.1097/ms9.0000000000002200] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/09/2024] [Accepted: 05/08/2024] [Indexed: 07/12/2024] Open
Affiliation(s)
- Mishal Abid
- Department of Medicine, Dow University of Health Sciences
| | | | - Zoya Khemane
- Department of Medicine, Dow University of Health Sciences
| | - Afia Jawaid
- Department of Medicine, Dow University of Health Sciences
| | | | - Hufsa Naveed
- Department of Medicine, Ziauddin Medical College, Karachi, Pakistan
| | - Tooba Naveed
- Department of Medicine, Ziauddin Medical College, Karachi, Pakistan
| | - Asma Ahmed Farah
- Department of Medicine, East Africa University, Boosaaso, Somalia
| | | |
Collapse
|
7
|
Chen Y, Al-Nusaif M, Li S, Tan X, Yang H, Cai H, Le W. Progress on early diagnosing Alzheimer's disease. Front Med 2024; 18:446-464. [PMID: 38769282 PMCID: PMC11391414 DOI: 10.1007/s11684-023-1047-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/23/2023] [Accepted: 11/15/2023] [Indexed: 05/22/2024]
Abstract
Alzheimer's disease (AD) is a progressive neurodegenerative disorder that affects both cognition and non-cognition functions. The disease follows a continuum, starting with preclinical stages, progressing to mild cognitive and behavioral impairment, ultimately leading to dementia. Early detection of AD is crucial for better diagnosis and more effective treatment. However, the current AD diagnostic tests of biomarkers using cerebrospinal fluid and/or brain imaging are invasive or expensive, and mostly are still not able to detect early disease state. Consequently, there is an urgent need to develop new diagnostic techniques with higher sensitivity and specificity during the preclinical stages of AD. Various non-cognitive manifestations, including behavioral abnormalities, sleep disturbances, sensory dysfunctions, and physical changes, have been observed in the preclinical AD stage before occurrence of notable cognitive decline. Recent research advances have identified several biofluid biomarkers as early indicators of AD. This review focuses on these non-cognitive changes and newly discovered biomarkers in AD, specifically addressing the preclinical stages of the disease. Furthermore, it is of importance to explore the potential for developing a predictive system or network to forecast disease onset and progression at the early stage of AD.
Collapse
Affiliation(s)
- Yixin Chen
- Liaoning Provincial Key Laboratory for Research on the Pathogenic Mechanisms of Neurological Diseases, The First Affiliated Hospital of Dalian Medical University, Dalian, 116021, China
| | - Murad Al-Nusaif
- Liaoning Provincial Key Laboratory for Research on the Pathogenic Mechanisms of Neurological Diseases, The First Affiliated Hospital of Dalian Medical University, Dalian, 116021, China
| | - Song Li
- Liaoning Provincial Key Laboratory for Research on the Pathogenic Mechanisms of Neurological Diseases, The First Affiliated Hospital of Dalian Medical University, Dalian, 116021, China
| | - Xiang Tan
- Liaoning Provincial Key Laboratory for Research on the Pathogenic Mechanisms of Neurological Diseases, The First Affiliated Hospital of Dalian Medical University, Dalian, 116021, China
| | - Huijia Yang
- Liaoning Provincial Key Laboratory for Research on the Pathogenic Mechanisms of Neurological Diseases, The First Affiliated Hospital of Dalian Medical University, Dalian, 116021, China
| | - Huaibin Cai
- Transgenic Section, Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, 20892, USA
| | - Weidong Le
- Liaoning Provincial Key Laboratory for Research on the Pathogenic Mechanisms of Neurological Diseases, The First Affiliated Hospital of Dalian Medical University, Dalian, 116021, China.
- Institute of Neurology, Sichuan Provincial People's Hospital, University of Electronic Science and Technology of China, Chinese Academy of Sciences Sichuan Translational Medicine Research Hospital, Chengdu, 610072, China.
| |
Collapse
|
8
|
Michel J. How can precision health care contribute to healthy aging? Aging Med (Milton) 2024; 7:269-271. [PMID: 38975306 PMCID: PMC11222728 DOI: 10.1002/agm2.12333] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/07/2024] [Revised: 05/09/2024] [Accepted: 05/29/2024] [Indexed: 07/09/2024] Open
Abstract
The future of medicine will be closely linked to technological progress, to the great benefit of aging adults. Increasing knowledge in fields encompassing biology, physiology and functioning of the aging process, combined with the early detection of non-clinically apparent but significant changes will make it possible to promote healthy aging.
Collapse
Affiliation(s)
- Jean‐Pierre Michel
- University of GenevaGenevaSwitzerland
- French Academy of MedicineParisFrance
| |
Collapse
|
9
|
Irfan B, Kuoppamäki S, Skantze G. Recommendations for designing conversational companion robots with older adults through foundation models. Front Robot AI 2024; 11:1363713. [PMID: 38860032 PMCID: PMC11163135 DOI: 10.3389/frobt.2024.1363713] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/31/2023] [Accepted: 05/07/2024] [Indexed: 06/12/2024] Open
Abstract
Companion robots are aimed to mitigate loneliness and social isolation among older adults by providing social and emotional support in their everyday lives. However, older adults' expectations of conversational companionship might substantially differ from what current technologies can achieve, as well as from other age groups like young adults. Thus, it is crucial to involve older adults in the development of conversational companion robots to ensure that these devices align with their unique expectations and experiences. The recent advancement in foundation models, such as large language models, has taken a significant stride toward fulfilling those expectations, in contrast to the prior literature that relied on humans controlling robots (i.e., Wizard of Oz) or limited rule-based architectures that are not feasible to apply in the daily lives of older adults. Consequently, we conducted a participatory design (co-design) study with 28 older adults, demonstrating a companion robot using a large language model (LLM), and design scenarios that represent situations from everyday life. The thematic analysis of the discussions around these scenarios shows that older adults expect a conversational companion robot to engage in conversation actively in isolation and passively in social settings, remember previous conversations and personalize, protect privacy and provide control over learned data, give information and daily reminders, foster social skills and connections, and express empathy and emotions. Based on these findings, this article provides actionable recommendations for designing conversational companion robots for older adults with foundation models, such as LLMs and vision-language models, which can also be applied to conversational robots in other domains.
Collapse
Affiliation(s)
- Bahar Irfan
- Division of Speech, Music and Hearing, KTH Royal Institute of Technology, Stockholm, Sweden
| | - Sanna Kuoppamäki
- Division of Health Informatics and Logistics, KTH Royal Institute of Technology, Stockholm, Sweden
| | - Gabriel Skantze
- Division of Speech, Music and Hearing, KTH Royal Institute of Technology, Stockholm, Sweden
| |
Collapse
|
10
|
Treder MS, Lee S, Tsvetanov KA. Introduction to Large Language Models (LLMs) for dementia care and research. FRONTIERS IN DEMENTIA 2024; 3:1385303. [PMID: 39081594 PMCID: PMC11285660 DOI: 10.3389/frdem.2024.1385303] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 02/12/2024] [Accepted: 04/23/2024] [Indexed: 08/02/2024]
Abstract
Introduction Dementia is a progressive neurodegenerative disorder that affects cognitive abilities including memory, reasoning, and communication skills, leading to gradual decline in daily activities and social engagement. In light of the recent advent of Large Language Models (LLMs) such as ChatGPT, this paper aims to thoroughly analyse their potential applications and usefulness in dementia care and research. Method To this end, we offer an introduction into LLMs, outlining the key features, capabilities, limitations, potential risks, and practical considerations for deployment as easy-to-use software (e.g., smartphone apps). We then explore various domains related to dementia, identifying opportunities for LLMs to enhance understanding, diagnostics, and treatment, with a broader emphasis on improving patient care. For each domain, the specific contributions of LLMs are examined, such as their ability to engage users in meaningful conversations, deliver personalized support, and offer cognitive enrichment. Potential benefits encompass improved social interaction, enhanced cognitive functioning, increased emotional well-being, and reduced caregiver burden. The deployment of LLMs in caregiving frameworks also raises a number of concerns and considerations. These include privacy and safety concerns, the need for empirical validation, user-centered design, adaptation to the user's unique needs, and the integration of multimodal inputs to create more immersive and personalized experiences. Additionally, ethical guidelines and privacy protocols must be established to ensure responsible and ethical deployment of LLMs. Results We report the results on a questionnaire filled in by people with dementia (PwD) and their supporters wherein we surveyed the usefulness of different application scenarios of LLMs as well as the features that LLM-powered apps should have. Both PwD and supporters were largely positive regarding the prospect of LLMs in care, although concerns were raised regarding bias, data privacy and transparency. Discussion Overall, this review corroborates the promising utilization of LLMs to positively impact dementia care by boosting cognitive abilities, enriching social interaction, and supporting caregivers. The findings underscore the importance of further research and development in this field to fully harness the benefits of LLMs and maximize their potential for improving the lives of individuals living with dementia.
Collapse
Affiliation(s)
- Matthias S. Treder
- School of Computer Science & Informatics, Cardiff University, Cardiff, United Kingdom
| | - Sojin Lee
- Olive AI Limited, London, United Kingdom
| | - Kamen A. Tsvetanov
- Department of Clinical Neurosciences, University of Cambridge, Cambridge, United Kingdom
- Department of Psychology, University of Cambridge, Cambridge, United Kingdom
| |
Collapse
|
11
|
Wang L, Wan Z, Ni C, Song Q, Li Y, Clayton EW, Malin BA, Yin Z. A Systematic Review of ChatGPT and Other Conversational Large Language Models in Healthcare. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.04.26.24306390. [PMID: 38712148 PMCID: PMC11071576 DOI: 10.1101/2024.04.26.24306390] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/08/2024]
Abstract
Background The launch of the Chat Generative Pre-trained Transformer (ChatGPT) in November 2022 has attracted public attention and academic interest to large language models (LLMs), facilitating the emergence of many other innovative LLMs. These LLMs have been applied in various fields, including healthcare. Numerous studies have since been conducted regarding how to employ state-of-the-art LLMs in health-related scenarios to assist patients, doctors, and public health administrators. Objective This review aims to summarize the applications and concerns of applying conversational LLMs in healthcare and provide an agenda for future research on LLMs in healthcare. Methods We utilized PubMed, ACM, and IEEE digital libraries as primary sources for this review. We followed the guidance of Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRIMSA) to screen and select peer-reviewed research articles that (1) were related to both healthcare applications and conversational LLMs and (2) were published before September 1st, 2023, the date when we started paper collection and screening. We investigated these papers and classified them according to their applications and concerns. Results Our search initially identified 820 papers according to targeted keywords, out of which 65 papers met our criteria and were included in the review. The most popular conversational LLM was ChatGPT from OpenAI (60), followed by Bard from Google (1), Large Language Model Meta AI (LLaMA) from Meta (1), and other LLMs (5). These papers were classified into four categories in terms of their applications: 1) summarization, 2) medical knowledge inquiry, 3) prediction, and 4) administration, and four categories of concerns: 1) reliability, 2) bias, 3) privacy, and 4) public acceptability. There are 49 (75%) research papers using LLMs for summarization and/or medical knowledge inquiry, and 58 (89%) research papers expressing concerns about reliability and/or bias. We found that conversational LLMs exhibit promising results in summarization and providing medical knowledge to patients with a relatively high accuracy. However, conversational LLMs like ChatGPT are not able to provide reliable answers to complex health-related tasks that require specialized domain expertise. Additionally, no experiments in our reviewed papers have been conducted to thoughtfully examine how conversational LLMs lead to bias or privacy issues in healthcare research. Conclusions Future studies should focus on improving the reliability of LLM applications in complex health-related tasks, as well as investigating the mechanisms of how LLM applications brought bias and privacy issues. Considering the vast accessibility of LLMs, legal, social, and technical efforts are all needed to address concerns about LLMs to promote, improve, and regularize the application of LLMs in healthcare.
Collapse
Affiliation(s)
- Leyao Wang
- Department of Computer Science, Vanderbilt University, Nashville, TN, USA, 37212
| | - Zhiyu Wan
- Department of Biomedical Informatics, Vanderbilt University Medical Center, TN, USA, 37203
| | - Congning Ni
- Department of Computer Science, Vanderbilt University, Nashville, TN, USA, 37212
| | - Qingyuan Song
- Department of Computer Science, Vanderbilt University, Nashville, TN, USA, 37212
| | - Yang Li
- Department of Computer Science, Vanderbilt University, Nashville, TN, USA, 37212
| | - Ellen Wright Clayton
- Department of Pediatrics, Vanderbilt University Medical Center, Nashville, Tennessee, USA, 37203
- Center for Biomedical Ethics and Society, Vanderbilt University Medical Center, Nashville, Tennessee, USA, 37203
| | - Bradley A. Malin
- Department of Computer Science, Vanderbilt University, Nashville, TN, USA, 37212
- Department of Biomedical Informatics, Vanderbilt University Medical Center, TN, USA, 37203
- Department of Biostatistics, Vanderbilt University Medical Center, TN, USA, 37203
| | - Zhijun Yin
- Department of Computer Science, Vanderbilt University, Nashville, TN, USA, 37212
- Department of Biomedical Informatics, Vanderbilt University Medical Center, TN, USA, 37203
| |
Collapse
|
12
|
Kaser AN, Lacritz LH, Winiarski HR, Gabirondo P, Schaffert J, Coca AJ, Jiménez-Raboso J, Rojo T, Zaldua C, Honorato I, Gallego D, Nieves ER, Rosenstein LD, Cullum CM. A novel speech analysis algorithm to detect cognitive impairment in a Spanish population. Front Neurol 2024; 15:1342907. [PMID: 38638311 PMCID: PMC11024431 DOI: 10.3389/fneur.2024.1342907] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/22/2023] [Accepted: 02/26/2024] [Indexed: 04/20/2024] Open
Abstract
Objective Early detection of cognitive impairment in the elderly is crucial for diagnosis and appropriate care. Brief, cost-effective cognitive screening instruments are needed to help identify individuals who require further evaluation. This study presents preliminary data on a new screening technology using automated voice recording analysis software in a Spanish population. Method Data were collected from 174 Spanish-speaking individuals clinically diagnosed as cognitively normal (CN, n = 87) or impaired (mild cognitive impairment [MCI], n = 63; all-cause dementia, n = 24). Participants were recorded performing four common language tasks (Animal fluency, alternating fluency [sports and fruits], phonemic "F" fluency, and Cookie Theft Description). Recordings were processed via text-transcription and digital-signal processing techniques to capture neuropsychological variables and audio characteristics. A training sample of 122 subjects with similar demographics across groups was used to develop an algorithm to detect cognitive impairment. Speech and task features were used to develop five independent machine learning (ML) models to compute scores between 0 and 1, and a final algorithm was constructed using repeated cross-validation. A socio-demographically balanced subset of 52 participants was used to test the algorithm. Analysis of covariance (ANCOVA), covarying for demographic characteristics, was used to predict logistically-transformed algorithm scores. Results Mean logit algorithm scores were significantly different across groups in the testing sample (p < 0.01). Comparisons of CN with impaired (MCI + dementia) and MCI groups using the final algorithm resulted in an AUC of 0.93/0.90, with overall accuracy of 88.4%/87.5%, sensitivity of 87.5/83.3, and specificity of 89.2/89.2, respectively. Conclusion Findings provide initial support for the utility of this automated speech analysis algorithm as a screening tool for cognitive impairment in Spanish speakers. Additional study is needed to validate this technology in larger and more diverse clinical populations.
Collapse
Affiliation(s)
- Alyssa N. Kaser
- Department of Psychiatry, The University of Texas Southwestern Medical Center, Dallas, TX, United States
| | - Laura H. Lacritz
- Department of Psychiatry, The University of Texas Southwestern Medical Center, Dallas, TX, United States
- Department of Neurology, The University of Texas Southwestern Medical Center, Dallas, TX, United States
| | - Holly R. Winiarski
- Department of Psychiatry, The University of Texas Southwestern Medical Center, Dallas, TX, United States
| | | | - Jeff Schaffert
- Department of Psychiatry, The University of Texas Southwestern Medical Center, Dallas, TX, United States
| | - Alberto J. Coca
- AcceXible Impacto, Sociedad Limitada, Bilbao, Spain
- Cambridge Mathematics of Information in Healthcare Hub, University of Cambridge, Cambridge, United Kingdom
| | | | - Tomas Rojo
- AcceXible Impacto, Sociedad Limitada, Bilbao, Spain
| | - Carla Zaldua
- AcceXible Impacto, Sociedad Limitada, Bilbao, Spain
| | | | | | - Emmanuel Rosario Nieves
- Department of Psychiatry, The University of Texas Southwestern Medical Center, Dallas, TX, United States
- Parkland Health and Hospital System Behavioral Health Clinic, Dallas, TX, United States
| | - Leslie D. Rosenstein
- Department of Psychiatry, The University of Texas Southwestern Medical Center, Dallas, TX, United States
- Parkland Health and Hospital System Behavioral Health Clinic, Dallas, TX, United States
| | - C. Munro Cullum
- Department of Psychiatry, The University of Texas Southwestern Medical Center, Dallas, TX, United States
- Department of Neurology, The University of Texas Southwestern Medical Center, Dallas, TX, United States
- Department of Neurological Surgery, The University of Texas Southwestern Medical Center, Dallas, TX, United States
| |
Collapse
|
13
|
Wang L, Ma Y, Bi W, Lv H, Li Y. An Entity Extraction Pipeline for Medical Text Records Using Large Language Models: Analytical Study. J Med Internet Res 2024; 26:e54580. [PMID: 38551633 PMCID: PMC11015372 DOI: 10.2196/54580] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/15/2023] [Revised: 01/23/2024] [Accepted: 02/14/2024] [Indexed: 04/02/2024] Open
Abstract
BACKGROUND The study of disease progression relies on clinical data, including text data, and extracting valuable features from text data has been a research hot spot. With the rise of large language models (LLMs), semantic-based extraction pipelines are gaining acceptance in clinical research. However, the security and feature hallucination issues of LLMs require further attention. OBJECTIVE This study aimed to introduce a novel modular LLM pipeline, which could semantically extract features from textual patient admission records. METHODS The pipeline was designed to process a systematic succession of concept extraction, aggregation, question generation, corpus extraction, and question-and-answer scale extraction, which was tested via 2 low-parameter LLMs: Qwen-14B-Chat (QWEN) and Baichuan2-13B-Chat (BAICHUAN). A data set of 25,709 pregnancy cases from the People's Hospital of Guangxi Zhuang Autonomous Region, China, was used for evaluation with the help of a local expert's annotation. The pipeline was evaluated with the metrics of accuracy and precision, null ratio, and time consumption. Additionally, we evaluated its performance via a quantified version of Qwen-14B-Chat on a consumer-grade GPU. RESULTS The pipeline demonstrates a high level of precision in feature extraction, as evidenced by the accuracy and precision results of Qwen-14B-Chat (95.52% and 92.93%, respectively) and Baichuan2-13B-Chat (95.86% and 90.08%, respectively). Furthermore, the pipeline exhibited low null ratios and variable time consumption. The INT4-quantified version of QWEN delivered an enhanced performance with 97.28% accuracy and a 0% null ratio. CONCLUSIONS The pipeline exhibited consistent performance across different LLMs and efficiently extracted clinical features from textual data. It also showed reliable performance on consumer-grade hardware. This approach offers a viable and effective solution for mining clinical research data from textual records.
Collapse
Affiliation(s)
- Lei Wang
- BGI Research, Wuhan, China
- Guangdong Bigdata Engineering Technology Research Center for Life Sciences, BGI Research, Shenzhen, China
| | - Yinyao Ma
- Department of Obstetrics, People's Hospital of Guangxi Zhuang Autonomous Region, Nanning, China
| | | | | | - Yuxiang Li
- BGI Research, Wuhan, China
- Guangdong Bigdata Engineering Technology Research Center for Life Sciences, BGI Research, Shenzhen, China
| |
Collapse
|
14
|
Runde BS, Alapati A, Bazan NG. The Optimization of a Natural Language Processing Approach for the Automatic Detection of Alzheimer's Disease Using GPT Embeddings. Brain Sci 2024; 14:211. [PMID: 38539600 PMCID: PMC10968873 DOI: 10.3390/brainsci14030211] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/27/2024] [Revised: 02/19/2024] [Accepted: 02/22/2024] [Indexed: 04/04/2024] Open
Abstract
The development of noninvasive and cost-effective methods of detecting Alzheimer's disease (AD) is essential for its early prevention and mitigation. We optimize the detection of AD using natural language processing (NLP) of spontaneous speech through the use of audio enhancement techniques and novel transcription methodologies. Specifically, we utilized Boll Spectral Subtraction to improve audio fidelity and created transcriptions using state-of-the-art AI services-locally-based Wav2Vec and Whisper, alongside cloud-based IBM Cloud and Rev AI-evaluating their performance against traditional manual transcription methods. Support Vector Machine (SVM) classifiers were then trained and tested using GPT-based embeddings of transcriptions. Our findings revealed that AI-based transcriptions largely outperformed traditional manual ones, with Wav2Vec (enhanced audio) achieving the best accuracy and F-1 score (0.99 for both metrics) for locally-based systems and Rev AI (standard audio) performing the best for cloud-based systems (0.96 for both metrics). Furthermore, this study revealed the detrimental effects of interviewer speech on model performance in addition to the minimal effect of audio enhancement. Based on our findings, current AI transcription and NLP technologies are highly effective at accurately detecting AD with available data but struggle to classify probable AD and mild cognitive impairment (MCI), a prodromal stage of AD, due to a lack of training data, laying the groundwork for the future implementation of an automatic AD detection system.
Collapse
Affiliation(s)
- Benjamin S. Runde
- Science Engineering Research Center, The Potomac School, McLean, VA 22101, USA
| | - Ajit Alapati
- Neuroscience Center of Excellence, School of Medicine, New Orleans, LA 70112, USA;
| | - Nicolas G. Bazan
- Neuroscience Center of Excellence, School of Medicine, New Orleans, LA 70112, USA;
| |
Collapse
|
15
|
Wang L, Bi W, Zhao S, Ma Y, Lv L, Meng C, Fu J, Lv H. Investigating the Impact of Prompt Engineering on the Performance of Large Language Models for Standardizing Obstetric Diagnosis Text: Comparative Study. JMIR Form Res 2024; 8:e53216. [PMID: 38329787 PMCID: PMC10884897 DOI: 10.2196/53216] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/29/2023] [Revised: 12/25/2023] [Accepted: 01/11/2024] [Indexed: 02/09/2024] Open
Abstract
BACKGROUND The accumulation of vast electronic medical records (EMRs) through medical informatization creates significant research value, particularly in obstetrics. Diagnostic standardization across different health care institutions and regions is vital for medical data analysis. Large language models (LLMs) have been extensively used for various medical tasks. Prompt engineering is key to use LLMs effectively. OBJECTIVE This study aims to evaluate and compare the performance of LLMs with various prompt engineering techniques on the task of standardizing obstetric diagnostic terminology using real-world obstetric data. METHODS The paper describes a 4-step approach used for mapping diagnoses in electronic medical records to the International Classification of Diseases, 10th revision, observation domain. First, similarity measures were used for mapping the diagnoses. Second, candidate mapping terms were collected based on similarity scores above a threshold, to be used as the training data set. For generating optimal mapping terms, we used two LLMs (ChatGLM2 and Qwen-14B-Chat [QWEN]) for zero-shot learning in step 3. Finally, a performance comparison was conducted by using 3 pretrained bidirectional encoder representations from transformers (BERTs), including BERT, whole word masking BERT, and momentum contrastive learning with BERT (MC-BERT), for unsupervised optimal mapping term generation in the fourth step. RESULTS LLMs and BERT demonstrated comparable performance at their respective optimal levels. LLMs showed clear advantages in terms of performance and efficiency in unsupervised settings. Interestingly, the performance of the LLMs varied significantly across different prompt engineering setups. For instance, when applying the self-consistency approach in QWEN, the F1-score improved by 5%, with precision increasing by 7.9%, outperforming the zero-shot method. Likewise, ChatGLM2 delivered similar rates of accurately generated responses. During the analysis, the BERT series served as a comparative model with comparable results. Among the 3 models, MC-BERT demonstrated the highest level of performance. However, the differences among the versions of BERT in this study were relatively insignificant. CONCLUSIONS After applying LLMs to standardize diagnoses and designing 4 different prompts, we compared the results to those generated by the BERT model. Our findings indicate that QWEN prompts largely outperformed the other prompts, with precision comparable to that of the BERT model. These results demonstrate the potential of unsupervised approaches in improving the efficiency of aligning diagnostic terms in daily research and uncovering hidden information values in patient data.
Collapse
Affiliation(s)
| | | | | | - Yinyao Ma
- The People's Hospital of Guangxi Zhuang Autonomous Region, Guangxi, China
| | | | | | | | | |
Collapse
|
16
|
Sezgin E. Redefining Virtual Assistants in Health Care: The Future With Large Language Models. J Med Internet Res 2024; 26:e53225. [PMID: 38241074 PMCID: PMC10837753 DOI: 10.2196/53225] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/29/2023] [Revised: 12/25/2023] [Accepted: 01/02/2024] [Indexed: 01/23/2024] Open
Abstract
This editorial explores the evolving and transformative role of large language models (LLMs) in enhancing the capabilities of virtual assistants (VAs) in the health care domain, highlighting recent research on the performance of VAs and LLMs in health care information sharing. Focusing on recent research, this editorial unveils the marked improvement in the accuracy and clinical relevance of responses from LLMs, such as GPT-4, compared to current VAs, especially in addressing complex health care inquiries, like those related to postpartum depression. The improved accuracy and clinical relevance with LLMs mark a paradigm shift in digital health tools and VAs. Furthermore, such LLM applications have the potential to dynamically adapt and be integrated into existing VA platforms, offering cost-effective, scalable, and inclusive solutions. These suggest a significant increase in the applicable range of VA applications, as well as the increased value, risk, and impact in health care, moving toward more personalized digital health ecosystems. However, alongside these advancements, it is necessary to develop and adhere to ethical guidelines, regulatory frameworks, governance principles, and privacy and safety measures. We need a robust interdisciplinary collaboration to navigate the complexities of safely and effectively integrating LLMs into health care applications, ensuring that these emerging technologies align with the diverse needs and ethical considerations of the health care domain.
Collapse
Affiliation(s)
- Emre Sezgin
- The Abigail Wexner Reseach Institute at Nationwide Children's Hospital, Columbus, OH, United States
- The Ohio State University College of Medicine, Columbus, OH, United States
| |
Collapse
|
17
|
Lin WC, Chen A, Song X, Weiskopf NG, Chiang MF, Hribar MR. Prediction of multiclass surgical outcomes in glaucoma using multimodal deep learning based on free-text operative notes and structured EHR data. J Am Med Inform Assoc 2024; 31:456-464. [PMID: 37964658 PMCID: PMC10797280 DOI: 10.1093/jamia/ocad213] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/13/2023] [Revised: 10/16/2023] [Accepted: 10/25/2023] [Indexed: 11/16/2023] Open
Abstract
OBJECTIVE Surgical outcome prediction is challenging but necessary for postoperative management. Current machine learning models utilize pre- and post-op data, excluding intraoperative information in surgical notes. Current models also usually predict binary outcomes even when surgeries have multiple outcomes that require different postoperative management. This study addresses these gaps by incorporating intraoperative information into multimodal models for multiclass glaucoma surgery outcome prediction. MATERIALS AND METHODS We developed and evaluated multimodal deep learning models for multiclass glaucoma trabeculectomy surgery outcomes using both structured EHR data and free-text operative notes. We compare those to baseline models that use structured EHR data exclusively, or neural network models that leverage only operative notes. RESULTS The multimodal neural network had the highest performance with a macro AUROC of 0.750 and F1 score of 0.583. It outperformed the baseline machine learning model with structured EHR data alone (macro AUROC of 0.712 and F1 score of 0.486). Additionally, the multimodal model achieved the highest recall (0.692) for hypotony surgical failure, while the surgical success group had the highest precision (0.884) and F1 score (0.775). DISCUSSION This study shows that operative notes are an important source of predictive information. The multimodal predictive model combining perioperative notes and structured pre- and post-op EHR data outperformed other models. Multiclass surgical outcome prediction can provide valuable insights for clinical decision-making. CONCLUSIONS Our results show the potential of deep learning models to enhance clinical decision-making for postoperative management. They can be applied to other specialties to improve surgical outcome predictions.
Collapse
Affiliation(s)
- Wei-Chun Lin
- Department of Medical Informatics & Clinical Epidemiology, Oregon Health & Science University, 3181 S.W. Sam Jackson Park Rd, Portland, OR, 97239, United States
- Department of Ophthalmology, Casey Eye Institute, Oregon Health & Science University, 545 SW Campus Dr, Portland, OR, 97239, United States
| | - Aiyin Chen
- Department of Ophthalmology, Casey Eye Institute, Oregon Health & Science University, 545 SW Campus Dr, Portland, OR, 97239, United States
| | - Xubo Song
- Department of Medical Informatics & Clinical Epidemiology, Oregon Health & Science University, 3181 S.W. Sam Jackson Park Rd, Portland, OR, 97239, United States
| | - Nicole G Weiskopf
- Department of Medical Informatics & Clinical Epidemiology, Oregon Health & Science University, 3181 S.W. Sam Jackson Park Rd, Portland, OR, 97239, United States
| | - Michael F Chiang
- National Eye Institute, National Institutes of Health, 31 Center Dr MSC 2510, Bethesda, MD, 20892, United States
- National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD, 20894, United States
| | - Michelle R Hribar
- Department of Medical Informatics & Clinical Epidemiology, Oregon Health & Science University, 3181 S.W. Sam Jackson Park Rd, Portland, OR, 97239, United States
- Department of Ophthalmology, Casey Eye Institute, Oregon Health & Science University, 545 SW Campus Dr, Portland, OR, 97239, United States
- National Eye Institute, National Institutes of Health, 31 Center Dr MSC 2510, Bethesda, MD, 20892, United States
| |
Collapse
|
18
|
Runde BS, Alapati A, Bazan NG. The Optimization of a Natural Language Processing Approach for the Automatic Detection of Alzheimer's Disease Using GPT Embeddings. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.01.14.24301297. [PMID: 38293012 PMCID: PMC10827239 DOI: 10.1101/2024.01.14.24301297] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/01/2024]
Abstract
As the impact of Alzheimer's disease (AD) is projected to grow in the coming decades as the world's population ages, the development of noninvasive and cost-effective methods of detecting AD is essential for the early prevention and mitigation of the progressive disease, alleviating its expected global impact. This study analyzes audio processing techniques and transcription methodologies to optimize the detection of AD through the natural language processing (NLP) of spontaneous speech. We enhanced audio fidelity using Boll Spectral Subtraction and evaluated the transcription accuracy of state-of-the-art AI services-locally-based Wav2Vec and Whisper, alongside cloud-based IBM Cloud and Rev AI-against traditional manual transcription methods. The choice between local and cloud-based solutions hinges on a trade-off between privacy, ongoing costs, and computational requirements. Leveraging OpenAI's GPT for word embeddings, we enhanced the training of Support Vector Machine (SVM) classifiers, which were crucial in analyzing transcripts and refining detection accuracy. Our findings reveal that AI-driven transcriptions significantly outperform manual counterparts when classifying AD and Control samples, with Wav2Vec using enhanced audio exhibiting the highest accuracy and F-1 scores (0.99 for both metrics) for locally based systems and Rev AI using unenhanced audio leading cloud-based methods with comparable precision (0.96 for both metrics). The study also uncovers the detrimental effect of including interviewer speech in recordings on model performance, advocating for the exclusion of such interactions to improve data quality for AD classification algorithms. Our comprehensive evaluation demonstrates that AI transcription (both Cloud and Local) and NLP technologies in their current forms can classify AD, as well as probable AD and mild cognitive impairment (MCI), a prodromal stage of AD, accurately but suffer from a lack of available training data. The insights garnered from this research lay the groundwork for future advancements in the noninvasive monitoring and early detection of cognitive impairments through linguistic analysis.
Collapse
Affiliation(s)
| | - Ajit Alapati
- Neuroscience Center of Excellence, School of Medicine, Louisiana State University
| | - Nicolas G Bazan
- Neuroscience Center of Excellence, School of Medicine, Louisiana State University
| |
Collapse
|
19
|
Dosso JA, Kailley JN, Robillard JM. What Does ChatGPT Know About Dementia? A Comparative Analysis of Information Quality. J Alzheimers Dis 2024; 97:559-565. [PMID: 38143345 PMCID: PMC10836539 DOI: 10.3233/jad-230573] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 11/01/2023] [Indexed: 12/26/2023]
Abstract
The quality of information about dementia retrieved using ChatGPT is unknown. Content was evaluated for length, readability, and quality using the QUEST, a validated tool, and compared against online material from three North American organizations. Both sources of information avoided conflicts of interest, supported the patient-physician relationship, and used a balanced tone. Official bodies but not ChatGPT referenced identifiable research and pointed to local resources. Users of ChatGPT are likely to encounter accurate but shallow information about dementia. Recommendations are made for information creators and providers who counsel patients around digital health practices.
Collapse
Affiliation(s)
- Jill A. Dosso
- Department of Medicine, Division of Neurology, The University of British Columbia, Vancouver, British Columbia, Canada
- BC Children’s and Women’s Hospitals, Vancouver, British Columbia, Canada
| | - Jaya N. Kailley
- Department of Medicine, Division of Neurology, The University of British Columbia, Vancouver, British Columbia, Canada
- BC Children’s and Women’s Hospitals, Vancouver, British Columbia, Canada
| | - Julie M. Robillard
- Department of Medicine, Division of Neurology, The University of British Columbia, Vancouver, British Columbia, Canada
- BC Children’s and Women’s Hospitals, Vancouver, British Columbia, Canada
| |
Collapse
|
20
|
Banks R, Higgins C, Greene BR, Jannati A, Gomes‐Osman J, Tobyne S, Bates D, Pascual‐Leone A. Clinical classification of memory and cognitive impairment with multimodal digital biomarkers. ALZHEIMER'S & DEMENTIA (AMSTERDAM, NETHERLANDS) 2024; 16:e12557. [PMID: 38406610 PMCID: PMC10884988 DOI: 10.1002/dad2.12557] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/11/2023] [Revised: 01/24/2024] [Accepted: 01/24/2024] [Indexed: 02/27/2024]
Abstract
INTRODUCTION Early detection of Alzheimer's disease and cognitive impairment is critical to improving the healthcare trajectories of aging adults, enabling early intervention and potential prevention of decline. METHODS To evaluate multi-modal feature sets for assessing memory and cognitive impairment, feature selection and subsequent logistic regressions were used to identify the most salient features in classifying Rey Auditory Verbal Learning Test-determined memory impairment. RESULTS Multimodal models incorporating graphomotor, memory, and speech and voice features provided the stronger classification performance (area under the curve = 0.83; sensitivity = 0.81, specificity = 0.80). Multimodal models were superior to all other single modality and demographics models. DISCUSSION The current research contributes to the prevailing multimodal profile of those with cognitive impairment, suggesting that it is associated with slower speech with a particular effect on the duration, frequency, and percentage of pauses compared to normal healthy speech.
Collapse
Affiliation(s)
- Russell Banks
- Department of Communicative Sciences & DisordersCollege of Arts & SciencesMichigan State UniversityEast LansingMichiganUSA
| | | | | | - Ali Jannati
- Department of NeurologyHarvard Medical SchoolBostonMassachusettsUSA
| | - Joyce Gomes‐Osman
- Department of NeurologyUniversity of Miami Miller School of MedicineMiamiFloridaUSA
| | | | | | - Alvaro Pascual‐Leone
- Linus HealthBostonMassachusettsUSA
- Department of NeurologyHarvard Medical SchoolBostonMassachusettsUSA
- Hinda and Arthur Marcus Institute for Aging Research and Deanna and Sidney Wolk Center for Memory HealthHebrew SeniorLifeBostonMassachusettsUSA
| |
Collapse
|
21
|
Romano MF, Shih LC, Paschalidis IC, Au R, Kolachalama VB. Large Language Models in Neurology Research and Future Practice. Neurology 2023; 101:1058-1067. [PMID: 37816646 PMCID: PMC10752640 DOI: 10.1212/wnl.0000000000207967] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/01/2023] [Accepted: 09/06/2023] [Indexed: 10/12/2023] Open
Abstract
Recent advancements in generative artificial intelligence, particularly using large language models (LLMs), are gaining increased public attention. We provide a perspective on the potential of LLMs to analyze enormous amounts of data from medical records and gain insights on specific topics in neurology. In addition, we explore use cases for LLMs, such as early diagnosis, supporting patient and caregivers, and acting as an assistant for clinicians. We point to the potential ethical and technical challenges raised by LLMs, such as concerns about privacy and data security, potential biases in the data for model training, and the need for careful validation of results. Researchers must consider these challenges and take steps to address them to ensure that their work is conducted in a safe and responsible manner. Despite these challenges, LLMs offer promising opportunities for improving care and treatment of various neurologic disorders.
Collapse
Affiliation(s)
- Michael F Romano
- From the Department of Medicine (M.F.R., R.A., V.B.K.), Boston University Chobanian & Avedisian School of Medicine, MA; Department of Radiology and Biomedical Imaging (M.F.R.), University of California, San Francisco; Department of Neurology (L.C.S., R.A.), Boston University Chobanian & Avedisian School of Medicine; Department of Electrical and Computer Engineering (I.C.P.), Division of Systems Engineering, and Department of Biomedical Engineering; Faculty of Computing and Data Sciences (I.C.P., V.B.K.), Boston University; Department of Anatomy and Neurobiology (R.A.); The Framingham Heart Study, Boston University Chobanian & Avedisian School of Medicine; Department of Epidemiology, Boston University School of Public Health; Boston University Alzheimer's Disease Research Center (R.A.); and Department of Computer Science (V.B.K.), Boston University, MA
| | - Ludy C Shih
- From the Department of Medicine (M.F.R., R.A., V.B.K.), Boston University Chobanian & Avedisian School of Medicine, MA; Department of Radiology and Biomedical Imaging (M.F.R.), University of California, San Francisco; Department of Neurology (L.C.S., R.A.), Boston University Chobanian & Avedisian School of Medicine; Department of Electrical and Computer Engineering (I.C.P.), Division of Systems Engineering, and Department of Biomedical Engineering; Faculty of Computing and Data Sciences (I.C.P., V.B.K.), Boston University; Department of Anatomy and Neurobiology (R.A.); The Framingham Heart Study, Boston University Chobanian & Avedisian School of Medicine; Department of Epidemiology, Boston University School of Public Health; Boston University Alzheimer's Disease Research Center (R.A.); and Department of Computer Science (V.B.K.), Boston University, MA
| | - Ioannis C Paschalidis
- From the Department of Medicine (M.F.R., R.A., V.B.K.), Boston University Chobanian & Avedisian School of Medicine, MA; Department of Radiology and Biomedical Imaging (M.F.R.), University of California, San Francisco; Department of Neurology (L.C.S., R.A.), Boston University Chobanian & Avedisian School of Medicine; Department of Electrical and Computer Engineering (I.C.P.), Division of Systems Engineering, and Department of Biomedical Engineering; Faculty of Computing and Data Sciences (I.C.P., V.B.K.), Boston University; Department of Anatomy and Neurobiology (R.A.); The Framingham Heart Study, Boston University Chobanian & Avedisian School of Medicine; Department of Epidemiology, Boston University School of Public Health; Boston University Alzheimer's Disease Research Center (R.A.); and Department of Computer Science (V.B.K.), Boston University, MA
| | - Rhoda Au
- From the Department of Medicine (M.F.R., R.A., V.B.K.), Boston University Chobanian & Avedisian School of Medicine, MA; Department of Radiology and Biomedical Imaging (M.F.R.), University of California, San Francisco; Department of Neurology (L.C.S., R.A.), Boston University Chobanian & Avedisian School of Medicine; Department of Electrical and Computer Engineering (I.C.P.), Division of Systems Engineering, and Department of Biomedical Engineering; Faculty of Computing and Data Sciences (I.C.P., V.B.K.), Boston University; Department of Anatomy and Neurobiology (R.A.); The Framingham Heart Study, Boston University Chobanian & Avedisian School of Medicine; Department of Epidemiology, Boston University School of Public Health; Boston University Alzheimer's Disease Research Center (R.A.); and Department of Computer Science (V.B.K.), Boston University, MA
| | - Vijaya B Kolachalama
- From the Department of Medicine (M.F.R., R.A., V.B.K.), Boston University Chobanian & Avedisian School of Medicine, MA; Department of Radiology and Biomedical Imaging (M.F.R.), University of California, San Francisco; Department of Neurology (L.C.S., R.A.), Boston University Chobanian & Avedisian School of Medicine; Department of Electrical and Computer Engineering (I.C.P.), Division of Systems Engineering, and Department of Biomedical Engineering; Faculty of Computing and Data Sciences (I.C.P., V.B.K.), Boston University; Department of Anatomy and Neurobiology (R.A.); The Framingham Heart Study, Boston University Chobanian & Avedisian School of Medicine; Department of Epidemiology, Boston University School of Public Health; Boston University Alzheimer's Disease Research Center (R.A.); and Department of Computer Science (V.B.K.), Boston University, MA.
| |
Collapse
|
22
|
Li R, Wang X, Yu H. Two Directions for Clinical Data Generation with Large Language Models: Data-to-Label and Label-to-Data. PROCEEDINGS OF THE CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING. CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING 2023; 2023:7129-7143. [PMID: 38213944 PMCID: PMC10782150 DOI: 10.18653/v1/2023.findings-emnlp.474] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/13/2024]
Abstract
Large language models (LLMs) can generate natural language texts for various domains and tasks, but their potential for clinical text mining, a domain with scarce, sensitive, and imbalanced medical data, is under-explored. We investigate whether LLMs can augment clinical data for detecting Alzheimer's Disease (AD)-related signs and symptoms from electronic health records (EHRs), a challenging task that requires high expertise. We create a novel pragmatic taxonomy for AD sign and symptom progression based on expert knowledge and generated three datasets: (1) a gold dataset annotated by human experts on longitudinal EHRs of AD patients; (2) a silver dataset created by the data-to-label method, which labels sentences from a public EHR collection with AD-related signs and symptoms; and (3) a bronze dataset created by the label-to-data method which generates sentences with AD-related signs and symptoms based on the label definition. We train a system to detect AD-related signs and symptoms from EHRs. We find that the silver and bronze datasets improves the system performance, outperforming the system using only the gold dataset. This shows that LLMs can generate synthetic clinical data for a complex task by incorporating expert knowledge, and our label-to-data method can produce datasets that are free of sensitive information, while maintaining acceptable quality.
Collapse
Affiliation(s)
- Rumeng Li
- Umass Amherst, Amherst, MA, USA
- VA Bedford Healthcare System, Bedford, MA, USA
| | | | - Hong Yu
- Umass Amherst, Amherst, MA, USA
- VA Bedford Healthcare System, Bedford, MA, USA
- Umass Lowell, Lowell, MA, USA
| |
Collapse
|
23
|
Shi Y, Ren P, Wang J, Han B, ValizadehAslani T, Agbavor F, Zhang Y, Hu M, Zhao L, Liang H. Leveraging GPT-4 for food effect summarization to enhance product-specific guidance development via iterative prompting. J Biomed Inform 2023; 148:104533. [PMID: 37918623 DOI: 10.1016/j.jbi.2023.104533] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/12/2023] [Revised: 10/12/2023] [Accepted: 10/30/2023] [Indexed: 11/04/2023]
Abstract
Food effect summarization from New Drug Application (NDA) is an essential component of product-specific guidance (PSG) development and assessment, which provides the basis of recommendations for fasting and fed bioequivalence studies to guide the pharmaceutical industry for developing generic drug products. However, manual summarization of food effect from extensive drug application review documents is time-consuming. Therefore, there is a need to develop automated methods to generate food effect summary. Recent advances in natural language processing (NLP), particularly large language models (LLMs) such as ChatGPT and GPT-4, have demonstrated great potential in improving the effectiveness of automated text summarization, but its ability with regard to the accuracy in summarizing food effect for PSG assessment remains unclear. In this study, we introduce a simple yet effective approach,iterative prompting, which allows one to interact with ChatGPT or GPT-4 more effectively and efficiently through multi-turn interaction. Specifically, we propose a three-turn iterative prompting approach to food effect summarization in which the keyword-focused and length-controlled prompts are respectively provided in consecutive turns to refine the quality of the generated summary. We conduct a series of extensive evaluations, ranging from automated metrics to FDA professionals and even evaluation by GPT-4, on 100 NDA review documents selected over the past five years. We observe that the summary quality is progressively improved throughout the iterative prompting process. Moreover, we find that GPT-4 performs better than ChatGPT, as evaluated by FDA professionals (43% vs. 12%) and GPT-4 (64% vs. 35%). Importantly, all the FDA professionals unanimously rated that 85% of the summaries generated by GPT-4 are factually consistent with the golden reference summary, a finding further supported by GPT-4 rating of 72% consistency. Taken together, these results strongly suggest a great potential for GPT-4 to draft food effect summaries that could be reviewed by FDA professionals, thereby improving the efficiency of the PSG assessment cycle and promoting generic drug product development.
Collapse
Affiliation(s)
- Yiwen Shi
- College of Computing and Informatics, Drexel University, Philadelphia, PA, United States
| | - Ping Ren
- Office of Research and Standards, Office of Generic Drugs, Center for Drug Evaluation and Research, United States Food and Drug Administration, Silver Spring, MD, United States
| | - Jing Wang
- Office of Research and Standards, Office of Generic Drugs, Center for Drug Evaluation and Research, United States Food and Drug Administration, Silver Spring, MD, United States
| | - Biao Han
- Office of Research and Standards, Office of Generic Drugs, Center for Drug Evaluation and Research, United States Food and Drug Administration, Silver Spring, MD, United States
| | - Taha ValizadehAslani
- Department of Electrical and Computer Engineering, College of Engineering, Drexel University, Philadelphia, PA, United States
| | - Felix Agbavor
- School of Biomedical Engineering, Science and Health Systems, Drexel University, Philadelphia, PA, United States
| | - Yi Zhang
- Office of Research and Standards, Office of Generic Drugs, Center for Drug Evaluation and Research, United States Food and Drug Administration, Silver Spring, MD, United States
| | - Meng Hu
- Office of Research and Standards, Office of Generic Drugs, Center for Drug Evaluation and Research, United States Food and Drug Administration, Silver Spring, MD, United States
| | - Liang Zhao
- Office of Research and Standards, Office of Generic Drugs, Center for Drug Evaluation and Research, United States Food and Drug Administration, Silver Spring, MD, United States
| | - Hualou Liang
- School of Biomedical Engineering, Science and Health Systems, Drexel University, Philadelphia, PA, United States.
| |
Collapse
|
24
|
Bucholc M, James C, Khleifat AA, Badhwar A, Clarke N, Dehsarvi A, Madan CR, Marzi SJ, Shand C, Schilder BM, Tamburin S, Tantiangco HM, Lourida I, Llewellyn DJ, Ranson JM. Artificial intelligence for dementia research methods optimization. Alzheimers Dement 2023; 19:5934-5951. [PMID: 37639369 DOI: 10.1002/alz.13441] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/03/2023] [Revised: 07/19/2023] [Accepted: 07/23/2023] [Indexed: 08/31/2023]
Abstract
Artificial intelligence (AI) and machine learning (ML) approaches are increasingly being used in dementia research. However, several methodological challenges exist that may limit the insights we can obtain from high-dimensional data and our ability to translate these findings into improved patient outcomes. To improve reproducibility and replicability, researchers should make their well-documented code and modeling pipelines openly available. Data should also be shared where appropriate. To enhance the acceptability of models and AI-enabled systems to users, researchers should prioritize interpretable methods that provide insights into how decisions are generated. Models should be developed using multiple, diverse datasets to improve robustness, generalizability, and reduce potentially harmful bias. To improve clarity and reproducibility, researchers should adhere to reporting guidelines that are co-produced with multiple stakeholders. If these methodological challenges are overcome, AI and ML hold enormous promise for changing the landscape of dementia research and care. HIGHLIGHTS: Machine learning (ML) can improve diagnosis, prevention, and management of dementia. Inadequate reporting of ML procedures affects reproduction/replication of results. ML models built on unrepresentative datasets do not generalize to new datasets. Obligatory metrics for certain model structures and use cases have not been defined. Interpretability and trust in ML predictions are barriers to clinical translation.
Collapse
Affiliation(s)
- Magda Bucholc
- Cognitive Analytics Research Lab, School of Computing, Engineering & Intelligent Systems, Ulster University, Derry, UK
| | - Charlotte James
- NIHR Bristol Biomedical Research Centre, University Hospitals Bristol and Weston NHS Foundation Trust and University of Bristol, Bristol, UK
| | - Ahmad Al Khleifat
- Department of Basic and Clinical Neuroscience, Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, UK
| | - AmanPreet Badhwar
- Multiomics Investigation of Neurodegenerative Diseases (MIND) Lab, Centre de Recherche de l'Institut Universitaire de Gériatrie de Montréal, Montréal, Quebec, Canada
- Institut de génie biomédical, Université de Montréal, Montréal, Quebec, Canada
- Département de Pharmacologie et Physiologie, Université de Montréal, Montréal, Quebec, Canada
| | - Natasha Clarke
- Multiomics Investigation of Neurodegenerative Diseases (MIND) Lab, Centre de Recherche de l'Institut Universitaire de Gériatrie de Montréal, Montréal, Quebec, Canada
| | - Amir Dehsarvi
- Aberdeen Biomedical Imaging Centre, School of Medicine, Medical Sciences, and Nutrition, University of Aberdeen, Aberdeen, UK
| | | | - Sarah J Marzi
- UK Dementia Research Institute, Imperial College London, London, UK
- Department of Brain Sciences, Imperial College London, London, UK
| | - Cameron Shand
- Centre for Medical Image Computing, Department of Computer Science, University College London, London, UK
| | - Brian M Schilder
- UK Dementia Research Institute, Imperial College London, London, UK
- Department of Brain Sciences, Imperial College London, London, UK
| | - Stefano Tamburin
- Department of Neurosciences, Biomedicine and Movement Sciences, University of Verona, Verona, Italy
| | | | | | - David J Llewellyn
- University of Exeter Medical School, Exeter, UK
- The Alan Turing Institute, London, UK
| | | |
Collapse
|
25
|
Khanna A, Jones G. Toward Personalized Medicine Approaches for Parkinson Disease Using Digital Technologies. JMIR Form Res 2023; 7:e47486. [PMID: 37756050 PMCID: PMC10568402 DOI: 10.2196/47486] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/21/2023] [Revised: 09/03/2023] [Accepted: 09/05/2023] [Indexed: 09/28/2023] Open
Abstract
Parkinson disease (PD) is a complex neurodegenerative disorder that afflicts over 10 million people worldwide, resulting in debilitating motor and cognitive impairment. In the United States alone (with approximately 1 million cases), the economic burden for treating and caring for persons with PD exceeds US $50 billion and myriad therapeutic approaches are under development, including both symptomatic- and disease-modifying agents. The challenges presented in addressing PD are compounded by observations that numerous, statistically distinct patient phenotypes present with a wide variety of motor and nonmotor symptomatic profiles, varying responses to current standard-of-care symptom-alleviating medications (L-DOPA and dopaminergic agonists), and different disease trajectories. The existence of these differing phenotypes highlights the opportunities in personalized approaches to symptom management and disease control. The prodromal period of PD can span across several decades, allowing the potential to leverage the unique array of composite symptoms presented to trigger early interventions. This may be especially beneficial as disease progression in PD (alongside Alzheimer disease and Huntington disease) may be influenced by biological processes such as oxidative stress, offering the potential for individual lifestyle factors to be tailored to delay disease onset. In this viewpoint, we offer potential scenarios where emerging diagnostic and monitoring strategies might be tailored to the individual patient under the tenets of P4 medicine (predict, prevent, personalize, and participate). These approaches may be especially relevant as the causative factors and biochemical pathways responsible for the observed neurodegeneration in patients with PD remain areas of fluid debate. The numerous observational patient cohorts established globally offer an excellent opportunity to test and refine approaches to detect, characterize, control, modify the course, and ultimately stop progression of this debilitating disease. Such approaches may also help development of parallel interventive strategies in other diseases such as Alzheimer disease and Huntington disease, which share common traits and etiologies with PD. In this overview, we highlight near-term opportunities to apply P4 medicine principles for patients with PD and introduce the concept of composite orthogonal patient monitoring.
Collapse
Affiliation(s)
- Amit Khanna
- Neuroscience Global Drug Development, Novartis Pharma AG, Basel, Switzerland
| | - Graham Jones
- GDD Connected Health and Innovation Group, Novartis Pharmaceuticals, East Hanover, NJ, United States
- Clinical and Translational Science Institute, Tufts University Medical Center, Boston, MA, United States
| |
Collapse
|
26
|
He R, Chapin K, Al-Tamimi J, Bel N, Marquié M, Rosende-Roca M, Pytel V, Tartari JP, Alegret M, Sanabria A, Ruiz A, Boada M, Valero S, Hinzen W. Automated Classification of Cognitive Decline and Probable Alzheimer's Dementia Across Multiple Speech and Language Domains. AMERICAN JOURNAL OF SPEECH-LANGUAGE PATHOLOGY 2023; 32:2075-2086. [PMID: 37486774 DOI: 10.1044/2023_ajslp-22-00403] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 07/26/2023]
Abstract
BACKGROUND Decline in language has emerged as a new potential biomarker for the early detection of Alzheimer's disease (AD). It remains unclear how sensitive language measures are across different tasks, language domains, and languages, and to what extent changes can be reliably detected in early stages such as subjective cognitive decline (SCD) and mild cognitive impairment (MCI). METHOD Using a scene construction task for speech elicitation in a new Spanish/Catalan speaking cohort (N = 119), we automatically extracted features across seven domains, three acoustic (spectral, cepstral, and voice quality), one prosodic, and three from text (morpholexical, semantic, and syntactic). They were forwarded to a random forest classifier to evaluate the discriminability of participants with probable AD dementia, amnestic and nonamnestic MCI, SCD, and cognitively healthy controls. Repeated-measures analyses of variance and paired-samples Wilcoxon signed-ranks test were used to assess whether and how performance differs significantly across groups and linguistic domains. RESULTS The performance scores of the machine learning classifier were generally satisfactorily high, with the highest scores over .9. Model performance was significantly different for linguistic domains (p < .001), and speech versus text (p = .043), with speech features outperforming textual features, and voice quality performing best. High diagnostic classification accuracies were seen even within both cognitively healthy (controls vs. SCD) and MCI (amnestic and nonamnestic) groups. CONCLUSION Speech-based machine learning is powerful in detecting cognitive decline and probable AD dementia across a range of different feature domains, though important differences exist between these domains as well. SUPPLEMENTAL MATERIAL https://doi.org/10.23641/asha.23699733.
Collapse
Affiliation(s)
- Rui He
- Department of Translation and Language Sciences, Universitat Pompeu Fabra, Barcelona, Spain
| | - Kayla Chapin
- Department of Translation and Language Sciences, Universitat Pompeu Fabra, Barcelona, Spain
| | - Jalal Al-Tamimi
- Laboratoire de Linguistique Formelle (LLF), CNRS, Université Paris Cité, France
| | - Núria Bel
- Department of Translation and Language Sciences, Universitat Pompeu Fabra, Barcelona, Spain
| | - Marta Marquié
- Ace Alzheimer Center Barcelona, Universitat Internacional de Catalunya, Spain
- Networking Research Center on Neurodegenerative Diseases (CIBERNED), Instituto de Salud Carlos III, Madrid, Spain
| | - Maitee Rosende-Roca
- Ace Alzheimer Center Barcelona, Universitat Internacional de Catalunya, Spain
| | - Vanesa Pytel
- Ace Alzheimer Center Barcelona, Universitat Internacional de Catalunya, Spain
| | - Juan Pablo Tartari
- Ace Alzheimer Center Barcelona, Universitat Internacional de Catalunya, Spain
| | - Montse Alegret
- Ace Alzheimer Center Barcelona, Universitat Internacional de Catalunya, Spain
- Networking Research Center on Neurodegenerative Diseases (CIBERNED), Instituto de Salud Carlos III, Madrid, Spain
| | - Angela Sanabria
- Ace Alzheimer Center Barcelona, Universitat Internacional de Catalunya, Spain
- Networking Research Center on Neurodegenerative Diseases (CIBERNED), Instituto de Salud Carlos III, Madrid, Spain
| | - Agustín Ruiz
- Ace Alzheimer Center Barcelona, Universitat Internacional de Catalunya, Spain
- Networking Research Center on Neurodegenerative Diseases (CIBERNED), Instituto de Salud Carlos III, Madrid, Spain
| | - Mercè Boada
- Ace Alzheimer Center Barcelona, Universitat Internacional de Catalunya, Spain
- Networking Research Center on Neurodegenerative Diseases (CIBERNED), Instituto de Salud Carlos III, Madrid, Spain
| | - Sergi Valero
- Ace Alzheimer Center Barcelona, Universitat Internacional de Catalunya, Spain
- Networking Research Center on Neurodegenerative Diseases (CIBERNED), Instituto de Salud Carlos III, Madrid, Spain
| | - Wolfram Hinzen
- Department of Translation and Language Sciences, Universitat Pompeu Fabra, Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats, Barcelona, Spain
| |
Collapse
|
27
|
Mao C, Xu J, Rasmussen L, Li Y, Adekkanattu P, Pacheco J, Bonakdarpour B, Vassar R, Shen L, Jiang G, Wang F, Pathak J, Luo Y. AD-BERT: Using pre-trained language model to predict the progression from mild cognitive impairment to Alzheimer's disease. J Biomed Inform 2023; 144:104442. [PMID: 37429512 PMCID: PMC11131134 DOI: 10.1016/j.jbi.2023.104442] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/24/2023] [Revised: 06/13/2023] [Accepted: 07/07/2023] [Indexed: 07/12/2023]
Abstract
OBJECTIVE We develop a deep learning framework based on the pre-trained Bidirectional Encoder Representations from Transformers (BERT) model using unstructured clinical notes from electronic health records (EHRs) to predict the risk of disease progression from Mild Cognitive Impairment (MCI) to Alzheimer's Disease (AD). METHODS We identified 3657 patients diagnosed with MCI together with their progress notes from Northwestern Medicine Enterprise Data Warehouse (NMEDW) between 2000 and 2020. The progress notes no later than the first MCI diagnosis were used for the prediction. We first preprocessed the notes by deidentification, cleaning and splitting into sections, and then pre-trained a BERT model for AD (named AD-BERT) based on the publicly available Bio+Clinical BERT on the preprocessed notes. All sections of a patient were embedded into a vector representation by AD-BERT and then combined by global MaxPooling and a fully connected network to compute the probability of MCI-to-AD progression. For validation, we conducted a similar set of experiments on 2563 MCI patients identified at Weill Cornell Medicine (WCM) during the same timeframe. RESULTS Compared with the 7 baseline models, the AD-BERT model achieved the best performance on both datasets, with Area Under receiver operating characteristic Curve (AUC) of 0.849 and F1 score of 0.440 on NMEDW dataset, and AUC of 0.883 and F1 score of 0.680 on WCM dataset. CONCLUSION The use of EHRs for AD-related research is promising, and AD-BERT shows superior predictive performance in modeling MCI-to-AD progression prediction. Our study demonstrates the utility of pre-trained language models and clinical notes in predicting MCI-to-AD progression, which could have important implications for improving early detection and intervention for AD.
Collapse
Affiliation(s)
- Chengsheng Mao
- Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, United States
| | - Jie Xu
- Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, FL, United States; Weill Cornell Medicine, New York, NY, United States
| | - Luke Rasmussen
- Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, United States
| | - Yikuan Li
- Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, United States
| | | | - Jennifer Pacheco
- Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, United States
| | - Borna Bonakdarpour
- Department of Neurology, Feinberg School of Medicine, Northwestern University, Chicago, IL, United States
| | - Robert Vassar
- Department of Neurology, Feinberg School of Medicine, Northwestern University, Chicago, IL, United States
| | - Li Shen
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, United States
| | | | - Fei Wang
- Weill Cornell Medicine, New York, NY, United States
| | | | - Yuan Luo
- Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, United States.
| |
Collapse
|
28
|
Tang L, Zhang Z, Feng F, Yang LZ, Li H. Explainable Alzheimer's Disease Detection Using Linguistic Features from Automatic Speech Recognition. Dement Geriatr Cogn Disord 2023; 52:240-248. [PMID: 37433284 DOI: 10.1159/000531818] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 02/21/2023] [Accepted: 06/29/2023] [Indexed: 07/13/2023] Open
Abstract
INTRODUCTION Alzheimer's disease (AD) is the most prevalent type of dementia and can cause abnormal cognitive function and progressive loss of essential life skills. Early screening is thus necessary for the prevention and intervention of AD. Speech dysfunction is an early onset symptom of AD patients. Recent studies have demonstrated the promise of automated acoustic assessment using acoustic or linguistic features extracted from speech. However, most previous studies have relied on manual transcription of text to extract linguistic features, which weakens the efficiency of automated assessment. The present study thus investigates the effectiveness of automatic speech recognition (ASR) in building an end-to-end automated speech analysis model for AD detection. METHODS We implemented three publicly available ASR engines and compared the classification performance using the ADReSS-IS2020 dataset. Besides, the SHapley Additive exPlanations algorithm was then used to identify critical features that contributed most to model performance. RESULTS Three automatic transcription tools obtained mean word error rate texts of 32%, 43%, and 40%, respectively. These automated texts achieved similar or even better results than manual texts in model performance for detecting dementia, achieving classification accuracies of 89.58%, 83.33%, and 81.25%, respectively. CONCLUSION Our best model, using ensemble learning, is comparable to the state-of-the-art manual transcription-based methods, suggesting the possibility of an end-to-end medical assistance system for AD detection with ASR engines. Moreover, the critical linguistic features might provide insight into further studies on the mechanism of AD.
Collapse
Affiliation(s)
- Lijuan Tang
- Institutes of Physical Science and Information Technology, Anhui University, Hefei, China
- Anhui Province Key Laboratory of Medical Physics and Technology, Institute of Health and Medical Technology, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, China
| | - Zhenglin Zhang
- Anhui Province Key Laboratory of Medical Physics and Technology, Institute of Health and Medical Technology, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, China
- Hefei Cancer Hospital, Chinese Academy of Sciences, Hefei, China
- University of Science and Technology of China, Hefei, China
| | - Feifan Feng
- Anhui Province Key Laboratory of Medical Physics and Technology, Institute of Health and Medical Technology, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, China
- Department of Biomedical Engineering, Anhui Medical University, Hefei, China
| | - Li-Zhuang Yang
- Anhui Province Key Laboratory of Medical Physics and Technology, Institute of Health and Medical Technology, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, China
- Hefei Cancer Hospital, Chinese Academy of Sciences, Hefei, China
- University of Science and Technology of China, Hefei, China
| | - Hai Li
- Anhui Province Key Laboratory of Medical Physics and Technology, Institute of Health and Medical Technology, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, China
- Hefei Cancer Hospital, Chinese Academy of Sciences, Hefei, China
- University of Science and Technology of China, Hefei, China
| |
Collapse
|
29
|
Bucholc M, James C, Al Khleifat A, Badhwar A, Clarke N, Dehsarvi A, Madan CR, Marzi SJ, Shand C, Schilder BM, Tamburin S, Tantiangco HM, Lourida I, Llewellyn DJ, Ranson JM. Artificial Intelligence for Dementia Research Methods Optimization. ARXIV 2023:arXiv:2303.01949v1. [PMID: 36911275 PMCID: PMC10002770] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Subscribe] [Scholar Register] [Indexed: 03/14/2023]
Abstract
INTRODUCTION Machine learning (ML) has been extremely successful in identifying key features from high-dimensional datasets and executing complicated tasks with human expert levels of accuracy or greater. METHODS We summarize and critically evaluate current applications of ML in dementia research and highlight directions for future research. RESULTS We present an overview of ML algorithms most frequently used in dementia research and highlight future opportunities for the use of ML in clinical practice, experimental medicine, and clinical trials. We discuss issues of reproducibility, replicability and interpretability and how these impact the clinical applicability of dementia research. Finally, we give examples of how state-of-the-art methods, such as transfer learning, multi-task learning, and reinforcement learning, may be applied to overcome these issues and aid the translation of research to clinical practice in the future. DISCUSSION ML-based models hold great promise to advance our understanding of the underlying causes and pathological mechanisms of dementia.
Collapse
Affiliation(s)
- Magda Bucholc
- Cognitive Analytics Research Lab, School of Computing, Engineering & Intelligent Systems, Ulster University, Derry, UK
| | - Charlotte James
- NIHR Bristol Biomedical Research Centre, University Hospitals Bristol and Weston NHS Foundation Trust and University of Bristol, Bristol, UK
| | - Ahmad Al Khleifat
- Department of Basic and Clinical Neuroscience, Institute of Psychiatry, Psychology & Neuroscience, King’s College London, London, United Kingdom
| | - AmanPreet Badhwar
- Multiomics Investigation of Neurodegenerative Diseases (MIND) Lab, Centre de Recherche de l’Institut Universitaire de Gériatrie de Montréal, Montréal, Canada
- Institut de génie biomédical, Université de Montréal, Montréal, Canada
- Département de Pharmacologie et Physiologie, Université de Montréal, Montréal, Canada
| | - Natasha Clarke
- Multiomics Investigation of Neurodegenerative Diseases (MIND) Lab, Centre de Recherche de l’Institut Universitaire de Gériatrie de Montréal, Montréal, Canada
| | - Amir Dehsarvi
- Aberdeen Biomedical Imaging Centre, School of Medicine, Medical Sciences, and Nutrition, University of Aberdeen, Aberdeen, UK
| | | | - Sarah J. Marzi
- UK Dementia Research Institute, Imperial College London, London, UK
- Department of Brain Sciences, Imperial College London, London, UK
| | - Cameron Shand
- Centre for Medical Image Computing, Department of Computer Science, University College London, London, UK
| | - Brian M. Schilder
- UK Dementia Research Institute, Imperial College London, London, UK
- Department of Brain Sciences, Imperial College London, London, UK
| | - Stefano Tamburin
- Department of Neurosciences, Biomedicine and Movement Sciences, University of Verona, Verona, Italy
| | | | | | - David J. Llewellyn
- University of Exeter Medical School, Exeter, UK
- The Alan Turing Institute, London, UK
| | | |
Collapse
|
30
|
Hirosawa T, Harada Y, Yokose M, Sakamoto T, Kawamura R, Shimizu T. Diagnostic Accuracy of Differential-Diagnosis Lists Generated by Generative Pretrained Transformer 3 Chatbot for Clinical Vignettes with Common Chief Complaints: A Pilot Study. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2023; 20:3378. [PMID: 36834073 PMCID: PMC9967747 DOI: 10.3390/ijerph20043378] [Citation(s) in RCA: 114] [Impact Index Per Article: 114.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/25/2023] [Revised: 02/09/2023] [Accepted: 02/13/2023] [Indexed: 06/01/2023]
Abstract
The diagnostic accuracy of differential diagnoses generated by artificial intelligence (AI) chatbots, including the generative pretrained transformer 3 (GPT-3) chatbot (ChatGPT-3) is unknown. This study evaluated the accuracy of differential-diagnosis lists generated by ChatGPT-3 for clinical vignettes with common chief complaints. General internal medicine physicians created clinical cases, correct diagnoses, and five differential diagnoses for ten common chief complaints. The rate of correct diagnosis by ChatGPT-3 within the ten differential-diagnosis lists was 28/30 (93.3%). The rate of correct diagnosis by physicians was still superior to that by ChatGPT-3 within the five differential-diagnosis lists (98.3% vs. 83.3%, p = 0.03). The rate of correct diagnosis by physicians was also superior to that by ChatGPT-3 in the top diagnosis (53.3% vs. 93.3%, p < 0.001). The rate of consistent differential diagnoses among physicians within the ten differential-diagnosis lists generated by ChatGPT-3 was 62/88 (70.5%). In summary, this study demonstrates the high diagnostic accuracy of differential-diagnosis lists generated by ChatGPT-3 for clinical cases with common chief complaints. This suggests that AI chatbots such as ChatGPT-3 can generate a well-differentiated diagnosis list for common chief complaints. However, the order of these lists can be improved in the future.
Collapse
Affiliation(s)
- Takanobu Hirosawa
- Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, Tochigi 321-0293, Japan
| | | | | | | | | | | |
Collapse
|
31
|
Agbavor F, Liang H. Artificial Intelligence-Enabled End-To-End Detection and Assessment of Alzheimer's Disease Using Voice. Brain Sci 2022; 13:28. [PMID: 36672010 PMCID: PMC9856143 DOI: 10.3390/brainsci13010028] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/29/2022] [Revised: 12/13/2022] [Accepted: 12/20/2022] [Indexed: 12/25/2022] Open
Abstract
There is currently no simple, widely available screening method for Alzheimer's disease (AD), partly because the diagnosis of AD is complex and typically involves expensive and sometimes invasive tests not commonly available outside highly specialized clinical settings. Here, we developed an artificial intelligence (AI)-powered end-to-end system to detect AD and predict its severity directly from voice recordings. At the core of our system is the pre-trained data2vec model, the first high-performance self-supervised algorithm that works for speech, vision, and text. Our model was internally evaluated on the ADReSSo (Alzheimer's Dementia Recognition through Spontaneous Speech only) dataset containing voice recordings of subjects describing the Cookie Theft picture, and externally validated on a test dataset from DementiaBank. The AI model can detect AD with average area under the curve (AUC) of 0.846 and 0.835 on held-out and external test set, respectively. The model was well-calibrated (Hosmer-Lemeshow goodness-of-fit p-value = 0.9616). Moreover, the model can reliably predict the subject's cognitive testing score solely based on raw voice recordings. Our study demonstrates the feasibility of using the AI-powered end-to-end model for early AD diagnosis and severity prediction directly based on voice, showing its potential for screening Alzheimer's disease in a community setting.
Collapse
Affiliation(s)
| | - Hualou Liang
- School of Biomedical Engineering, Science and Health Systems, Drexel University, Philadelphia, PA 19104, USA
| |
Collapse
|