1
Iqbal MS, Belal Bin Heyat M, Parveen S, Ammar Bin Hayat M, Roshanzamir M, Alizadehsani R, Akhtar F, Sayeed E, Hussain S, Hussein HS, Sawan M. Progress and trends in neurological disorders research based on deep learning. Comput Med Imaging Graph 2024; 116:102400. [PMID: 38851079 DOI: 10.1016/j.compmedimag.2024.102400] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/02/2024] [Revised: 05/07/2024] [Accepted: 05/13/2024] [Indexed: 06/10/2024]
Abstract
In recent years, deep learning (DL) has emerged as a powerful tool in clinical imaging, offering unprecedented opportunities for the diagnosis and treatment of neurological disorders (NDs). This comprehensive review explores the multifaceted role of DL techniques in leveraging vast datasets to advance our understanding of NDs and improve clinical outcomes. Beginning with a systematic literature review, we delve into the utilization of DL, focusing particularly on multimodal neuroimaging data analysis, a domain that has witnessed rapid progress and garnered significant scientific interest. Our study categorizes and critically analyses numerous DL models, including Convolutional Neural Networks (CNNs), LSTM-CNN, GAN, and VGG, to understand their performance across different types of NDs. Through detailed analysis, we identify key benchmarks and datasets utilized in training and testing DL models, shedding light on the challenges and opportunities in clinical neuroimaging research. Moreover, we discuss the effectiveness of DL in real-world clinical scenarios, emphasizing its potential to revolutionize ND diagnosis and therapy. By synthesizing existing literature and describing future directions, this review not only provides insights into the current state of DL applications in ND analysis but also paves the way for the development of more efficient and accessible DL techniques. Finally, our findings underscore the transformative impact of DL in reshaping the landscape of clinical neuroimaging, offering hope for enhanced patient care and groundbreaking discoveries in the field of neurology. This review paper is beneficial for neuropathologists and new researchers in this field.
Affiliation(s)
- Muhammad Shahid Iqbal
- Department of Computer Science and Information Technology, Women University of Azad Jammu & Kashmir, Bagh, Pakistan.
- Md Belal Bin Heyat
- CenBRAIN Neurotech Center of Excellence, School of Engineering, Westlake University, Hangzhou, Zhejiang, China.
- Saba Parveen
- College of Electronics and Information Engineering, Shenzhen University, Shenzhen 518060, China.
- Mohamad Roshanzamir
- Department of Computer Engineering, Faculty of Engineering, Fasa University, Fasa, Iran.
- Roohallah Alizadehsani
- Institute for Intelligent Systems Research and Innovation, Deakin University, VIC 3216, Australia.
- Faijan Akhtar
- School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China.
- Eram Sayeed
- Kisan Inter College, Dhaurahara, Kushinagar, India.
- Sadiq Hussain
- Department of Examination, Dibrugarh University, Assam 786004, India.
- Hany S Hussein
- Electrical Engineering Department, Faculty of Engineering, King Khalid University, Abha 61411, Saudi Arabia; Electrical Engineering Department, Faculty of Engineering, Aswan University, Aswan 81528, Egypt.
- Mohamad Sawan
- CenBRAIN Neurotech Center of Excellence, School of Engineering, Westlake University, Hangzhou, Zhejiang, China.
2
Jeong SM, Kim S, Lee EC, Kim HJ. Exploring Spectrogram-Based Audio Classification for Parkinson's Disease: A Study on Speech Classification and Qualitative Reliability Verification. Sensors (Basel) 2024; 24:4625. [PMID: 39066023 PMCID: PMC11280556 DOI: 10.3390/s24144625] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2024] [Revised: 07/15/2024] [Accepted: 07/16/2024] [Indexed: 07/28/2024]
Abstract
Patients with Parkinson's disease suffer from voice impairment. In this study, we introduce models that distinguish healthy speakers from Parkinson's patients using their speech. We used an AST (audio spectrogram transformer), a transformer-based speech classification model that has recently outperformed CNN-based models in many fields, and a CNN-based PSLA (pretraining, sampling, labeling, and aggregation), a high-performance model in the existing speech classification field, for the study. This study compares and analyzes the models from both quantitative and qualitative perspectives. First, quantitatively, PSLA outperformed AST by more than 4% in accuracy, and the AUC was also higher, with 94.16% for AST and 97.43% for PSLA. Furthermore, we qualitatively evaluated the ability of the models to capture the acoustic features of Parkinson's through various CAM (class activation map)-based XAI (eXplainable AI) models such as GradCAM and EigenCAM. Based on PSLA, we found that the model focuses well on the muffled frequency band of Parkinson's speech, and the heatmap analysis of false positives and false negatives shows that the speech features are also visually represented when the model makes incorrect predictions. The contribution of this paper is that we not only found a suitable model for diagnosing Parkinson's through speech using two different types of models but also validated the predictions of the model in practice.
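The accuracy and AUC figures quoted in this abstract can be illustrated with a small sketch. The abstract does not describe how the authors computed these metrics; the code below assumes the standard rank-based definition of ROC AUC and a fixed 0.5 decision threshold for accuracy, and the toy labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples classified correctly at the given threshold (assumed 0.5)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Hypothetical toy data: 1 = Parkinson's speech, 0 = healthy speech.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
auc_value = roc_auc(labels, scores)
acc_value = accuracy(labels, scores)
```

Because AUC depends only on the ranking of scores, it can differ from accuracy, which depends on the chosen threshold; this is one reason both numbers are commonly reported together.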
Affiliation(s)
- Seung-Min Jeong
- Department of AI & Informatics, Graduate School, Sangmyung University, Hongjimun 2-gil 20, Jongno-gu, Seoul 03016, Republic of Korea
- Seunghyun Kim
- Department of AI & Informatics, Graduate School, Sangmyung University, Hongjimun 2-gil 20, Jongno-gu, Seoul 03016, Republic of Korea
- Eui Chul Lee
- Department of Human-Centered Artificial Intelligence, Sangmyung University, Hongjimun 2-gil 20, Jongno-gu, Seoul 03016, Republic of Korea
- Han Joon Kim
- Department of Neurology, Seoul National University College of Medicine, Seoul National University Hospital, Daehak-ro 101, Jongno-gu, Seoul 03080, Republic of Korea
3
Pan X, Liang B, Cao T. A bibliometric analysis of speech and language impairments in Parkinson's disease based on Web of Science. Front Psychol 2024; 15:1374924. [PMID: 38962221 PMCID: PMC11220271 DOI: 10.3389/fpsyg.2024.1374924] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2024] [Accepted: 06/06/2024] [Indexed: 07/05/2024]
Abstract
Many individuals with Parkinson's disease suffer from speech and language impairments that significantly impact their quality of life. Despite several studies on these disorders, there is a lack of relevant bibliometric analyses. This paper conducted a bibliometric analysis of 3,610 papers on speech and language impairments in Parkinson's disease patients from January 1961 to November 2023, based on the Web of Science Core Collection database. Using CiteSpace software, the analysis focused on annual publication volume, cooperation among countries and institutions, author collaborations, journals, co-citation references, and keywords, aiming to explore the current research status, hotspots, and frontiers in this field. The number of annual publications related to speech and language impairment in Parkinson's disease has been increasing over the years. The USA leads in the number of publications. Research hotspots include the mechanism underlying speech and language impairments, clinical symptoms, automated diagnosis and classification of patients with PD using linguistic markers, and rehabilitation interventions.
Affiliation(s)
- Xueyao Pan
- School of Foreign Languages and Literatures, Chongqing Normal University, Chongqing, China
- Bingqian Liang
- School of Foreign Studies, Anhui Xinhua University, Hefei, Anhui, China
- Ting Cao
- School of Foreign Languages and Literatures, Chongqing Normal University, Chongqing, China
4
Malekroodi HS, Madusanka N, Lee BI, Yi M. Leveraging Deep Learning for Fine-Grained Categorization of Parkinson's Disease Progression Levels through Analysis of Vocal Acoustic Patterns. Bioengineering (Basel) 2024; 11:295. [PMID: 38534569 DOI: 10.3390/bioengineering11030295] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2024] [Revised: 03/18/2024] [Accepted: 03/18/2024] [Indexed: 03/28/2024]
Abstract
Speech impairments often emerge as one of the primary indicators of Parkinson's disease (PD), albeit not readily apparent in its early stages. While previous studies focused predominantly on binary PD detection, this research explored the use of deep learning models to automatically classify sustained vowel recordings into healthy controls, mild PD, or severe PD based on motor symptom severity scores. Two popular convolutional neural network (CNN) architectures, VGG and ResNet, as well as a vision transformer, Swin, were fine-tuned on log mel spectrogram image representations of the segmented voice data. Furthermore, the research investigated the effects of audio segment lengths and specific vowel sounds on the performance of these models. The findings indicated that using longer segments yielded better performance. The models showed strong capability in distinguishing PD from healthy subjects, achieving over 95% precision. However, reliably discriminating between mild and severe PD cases remained challenging. The VGG16 model achieved the best overall classification performance, with 91.8% accuracy and the largest area under the ROC curve. Furthermore, focusing the analysis on the vowel /u/ could further improve accuracy to 96%. Applying visualization techniques like Grad-CAM also highlighted how CNN models focused on localized spectrogram regions while transformers attended to more widespread patterns. Overall, this work showed the potential of deep learning for non-invasive screening and monitoring of PD progression from voice recordings, but larger multi-class labeled datasets are needed to further improve severity classification.
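The segmentation step mentioned in this abstract (cutting a sustained vowel into fixed-length pieces before computing spectrograms) can be sketched as follows. This is only an illustration: the function name, the hop parameter, and the choice to drop a trailing partial segment are assumptions, not details taken from the paper.

```python
def split_into_segments(samples, sample_rate, segment_seconds, hop_seconds=None):
    """Cut a 1-D sample sequence into equal-length segments.

    A trailing partial segment is dropped (an assumed policy). When
    hop_seconds is None the segments do not overlap.
    """
    seg_len = int(round(segment_seconds * sample_rate))
    hop = seg_len if hop_seconds is None else int(round(hop_seconds * sample_rate))
    if seg_len <= 0 or hop <= 0:
        raise ValueError("segment and hop lengths must be positive")
    return [samples[start:start + seg_len]
            for start in range(0, len(samples) - seg_len + 1, hop)]


# Toy signal: 10 samples at 2 Hz (5 seconds), cut into 2-second segments.
signal = list(range(10))
segments = split_into_segments(signal, sample_rate=2, segment_seconds=2.0)
overlapping = split_into_segments(signal, sample_rate=2, segment_seconds=2.0,
                                  hop_seconds=1.0)
```

Longer segments give each training example more context but yield fewer examples per recording, which is the trade-off behind the segment-length comparison described in the abstract.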
Affiliation(s)
- Hadi Sedigh Malekroodi
- Industry 4.0 Convergence Bionics Engineering, Pukyong National University, Busan 48513, Republic of Korea
- Nuwan Madusanka
- Digital of Healthcare Research Center, Institute of Information Technology and Convergence, Pukyong National University, Busan 48513, Republic of Korea
- Byeong-Il Lee
- Industry 4.0 Convergence Bionics Engineering, Pukyong National University, Busan 48513, Republic of Korea
- Digital of Healthcare Research Center, Institute of Information Technology and Convergence, Pukyong National University, Busan 48513, Republic of Korea
- Division of Smart Healthcare, Pukyong National University, Busan 48513, Republic of Korea
- Myunggi Yi
- Industry 4.0 Convergence Bionics Engineering, Pukyong National University, Busan 48513, Republic of Korea
- Digital of Healthcare Research Center, Institute of Information Technology and Convergence, Pukyong National University, Busan 48513, Republic of Korea
- Division of Smart Healthcare, Pukyong National University, Busan 48513, Republic of Korea
5
Ibarra EJ, Arias-Londoño JD, Zañartu M, Godino-Llorente JI. Towards a Corpus (and Language)-Independent Screening of Parkinson's Disease from Voice and Speech through Domain Adaptation. Bioengineering (Basel) 2023; 10:1316. [PMID: 38002440 PMCID: PMC10669342 DOI: 10.3390/bioengineering10111316] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2023] [Revised: 11/03/2023] [Accepted: 11/10/2023] [Indexed: 11/26/2023]
Abstract
End-to-end deep learning models have shown promising results for the automatic screening of Parkinson's disease by voice and speech. However, these models often suffer degradation in their performance when applied to scenarios involving multiple corpora. In addition, they also show corpus-dependent clusterings. These facts indicate a lack of generalisation or the presence of certain shortcuts in the decision, and also suggest the need for developing new corpus-independent models. In this respect, this work explores the use of domain adversarial training as a viable strategy to develop models that retain their discriminative capacity to detect Parkinson's disease across diverse datasets. The paper presents three deep learning architectures and their domain adversarial counterparts. The models were evaluated with sustained vowels and diadochokinetic recordings extracted from four corpora with different demographics, dialects or languages, and recording conditions. The results showed that the space distribution of the embedding features extracted by the domain adversarial networks exhibits a higher intra-class cohesion. This behaviour is supported by a decrease in the variability and inter-domain divergence computed within each class. The findings suggest that domain adversarial networks are able to learn the common characteristics present in Parkinsonian voice and speech, which are supposed to be corpus, and consequently, language independent. Overall, this effort provides evidence that domain adaptation techniques refine the existing end-to-end deep learning approaches for Parkinson's disease detection from voice and speech, achieving more generalizable models.
Affiliation(s)
- Emiro J. Ibarra
- Department of Electronic Engineering, Universidad Técnica Federico Santa María, Avenida España 1680, Casilla 110-V, Valparaíso 2390123, Chile
- Julián D. Arias-Londoño
- Escuela Técnica Superior de Ingenieros de Telecomunicación, Universidad Politécnica de Madrid, Avda. Ciudad Universitaria, 30, 28040 Madrid, Spain
- Matías Zañartu
- Department of Electronic Engineering, Universidad Técnica Federico Santa María, Avenida España 1680, Casilla 110-V, Valparaíso 2390123, Chile
- Juan I. Godino-Llorente
- Escuela Técnica Superior de Ingenieros de Telecomunicación, Universidad Politécnica de Madrid, Avda. Ciudad Universitaria, 30, 28040 Madrid, Spain
6
Warule P, Mishra SP, Deb S. Time-frequency analysis of speech signal using Chirplet transform for automatic diagnosis of Parkinson's disease. Biomed Eng Lett 2023; 13:613-623. [PMID: 37872998 PMCID: PMC10590362 DOI: 10.1007/s13534-023-00283-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/05/2023] [Revised: 04/22/2023] [Accepted: 04/25/2023] [Indexed: 10/25/2023]
Abstract
Parkinson's disease (PD) is the second most prevalent neurodegenerative disorder in the world after Alzheimer's disease. Diagnosing PD early is challenging because the disease evolves slowly and its symptoms emerge gradually. Recent studies have demonstrated that changes in speech may be utilized as an excellent biomarker for the early diagnosis of PD. In this study, we propose a novel Chirplet transform (CT)-based approach for diagnosing PD using speech signals. We employed CT to obtain the time-frequency matrix (TFM) of each speech recording, and we extracted time-frequency based entropy (TFE) features from the TFM. The statistical analysis demonstrates that the TFE features reflect the changes in speech that occur due to PD and hence can be used for classifying PD and healthy control (HC) individuals. The effectiveness of the proposed framework is validated using the vowels and words from the PC-GITA database. A genetic algorithm is used to select the optimum feature subset, while support vector machine (SVM), decision tree (DT), K-Nearest Neighbor (KNN), and Naïve Bayes (NB) classifiers are employed for classification. The TFE features outperform the breathiness and Mel frequency cepstral coefficient (MFCC) features. The SVM classifier is the most effective of the machine-learning classifiers compared. The highest classification accuracy rates of 98% and 99% are achieved using the vowel /a/ and the word /atleta/, respectively. The results reveal that the proposed CT-based entropy features effectively diagnose PD using the speech of a person.
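To make the idea of an entropy feature computed from a time-frequency matrix concrete, here is a minimal sketch. The abstract does not give the authors' exact TFE definition, so the code assumes one standard choice: the Shannon entropy of the matrix after normalizing its non-negative energy values to sum to one. The function name and the toy matrices are hypothetical.

```python
import math


def tf_entropy(tfm):
    """Shannon entropy, in bits, of a non-negative time-frequency energy matrix.

    The matrix is normalized so its entries sum to 1 and is then treated
    as a probability distribution over time-frequency cells.
    """
    total = sum(sum(row) for row in tfm)
    if total <= 0:
        raise ValueError("matrix must contain positive energy")
    entropy = 0.0
    for row in tfm:
        for value in row:
            if value > 0:
                p = value / total
                entropy -= p * math.log2(p)
    return entropy


# Energy spread evenly over four cells versus concentrated in one cell.
spread = tf_entropy([[1.0, 1.0], [1.0, 1.0]])
concentrated = tf_entropy([[5.0, 0.0], [0.0, 0.0]])
```

Under this definition, energy spread evenly across the time-frequency plane gives high entropy and energy concentrated in a few cells gives low entropy, which is the kind of contrast an entropy feature can expose to a downstream classifier such as an SVM.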
Affiliation(s)
- Pankaj Warule
- Department of Electronics Engineering, Sardar Vallabhbhai National Institute of Technology, Surat, India
- Siba Prasad Mishra
- Department of Electronics Engineering, Sardar Vallabhbhai National Institute of Technology, Surat, India
- Suman Deb
- Department of Electronics Engineering, Sardar Vallabhbhai National Institute of Technology, Surat, India
7
Hireš M, Drotár P, Pah ND, Ngo QC, Kumar DK. On the inter-dataset generalization of machine learning approaches to Parkinson's disease detection from voice. Int J Med Inform 2023; 179:105237. [PMID: 37801807 DOI: 10.1016/j.ijmedinf.2023.105237] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2023] [Revised: 09/20/2023] [Accepted: 09/24/2023] [Indexed: 10/08/2023]
Abstract
BACKGROUND AND OBJECTIVE: Parkinson's disease is the second-most-common neurodegenerative disorder that affects motor skills, cognitive processes, mood, and everyday tasks such as speaking and walking. The voices of people with Parkinson's disease may become weak, breathy, or hoarse and may sound emotionless, with slurred words and mumbling. Algorithms for computerized voice analysis have been proposed and have shown highly accurate results. However, these algorithms were developed on single, limited datasets, with participants possessing similar demographics. Such models are prone to overfitting and are unsuitable for generalization, which is essential in real-world applications. METHODS: We evaluated the computerized Parkinson's disease diagnosis performance of various machine learning models and showed that these models degraded rapidly when used on different datasets. We evaluated two mainstream state-of-the-art approaches, one based on deep convolutional neural networks and another based on voice feature extraction followed by a shallow classifier (i.e., extreme gradient boosting (XGBoost)). RESULTS: An investigation with four datasets (CzechPD, PC-GITA, ITA, and RMIT-PD) proved that even if the algorithms yielded excellent performance on a single dataset, the results obtained on new data or even a mix of datasets were very unsatisfactory. CONCLUSIONS: More work needs to be done to make computerized voice analysis methods for Parkinson's disease diagnosis suitable for real-world applications.
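One common way to probe inter-dataset generalization of the kind discussed in this abstract is a leave-one-corpus-out split, sketched below. The abstract does not state the exact evaluation protocol the authors used, so this scheme is an assumption; the corpus names come from the abstract, while the recording identifiers are hypothetical placeholders.

```python
def leave_one_corpus_out(corpora):
    """Yield (held_out_name, train_items, test_items), holding out each corpus once."""
    for held_out in corpora:
        train = [item
                 for name, items in corpora.items() if name != held_out
                 for item in items]
        test = list(corpora[held_out])
        yield held_out, train, test


# Corpus names from the abstract; recording IDs are made up for illustration.
corpora = {
    "CzechPD": ["cz_001", "cz_002"],
    "PC-GITA": ["gita_001", "gita_002", "gita_003"],
    "ITA": ["ita_001"],
    "RMIT-PD": ["rmit_001", "rmit_002"],
}
splits = list(leave_one_corpus_out(corpora))
```

Training on three corpora and testing on the fourth keeps every test recording from a source the model has never seen, which is a stricter check than a random split within a single dataset.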
Affiliation(s)
- Máté Hireš
- Intelligent Information Systems Lab, Technical University of Kosice, Letna 9, 42001 Kosice, Slovakia
- Peter Drotár
- Intelligent Information Systems Lab, Technical University of Kosice, Letna 9, 42001 Kosice, Slovakia.
- Nemuel Daniel Pah
- Biosignals Lab, RMIT University, Melbourne, Australia; Universitas Surabaya, Surabaya, Indonesia
8
Idrisoglu A, Dallora AL, Anderberg P, Berglund JS. Applied Machine Learning Techniques to Diagnose Voice-Affecting Conditions and Disorders: Systematic Literature Review. J Med Internet Res 2023; 25:e46105. [PMID: 37467031 PMCID: PMC10398366 DOI: 10.2196/46105] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 01/30/2023] [Revised: 04/26/2023] [Accepted: 05/23/2023] [Indexed: 07/20/2023]
Abstract
BACKGROUND: Normal voice production depends on the synchronized cooperation of multiple physiological systems, which makes the voice sensitive to changes. Any systematic, neurological, and aerodigestive distortion is prone to affect voice production through reduced cognitive, pulmonary, and muscular functionality. This sensitivity inspired using voice as a biomarker to examine disorders that affect the voice. Technological improvements and emerging machine learning (ML) technologies have enabled possibilities of extracting digital vocal features from the voice for automated diagnosis and monitoring systems. OBJECTIVE: This study aims to summarize a comprehensive view of research on voice-affecting disorders that uses ML techniques for diagnosis and monitoring through voice samples where systematic conditions, nonlaryngeal aerodigestive disorders, and neurological disorders are specifically of interest. METHODS: This systematic literature review (SLR) investigated the state of the art of voice-based diagnostic and monitoring systems with ML technologies, targeting voice-affecting disorders without direct relation to the voice box from the point of view of applied health technology. Through a comprehensive search string, studies published from 2012 to 2022 from the databases Scopus, PubMed, and Web of Science were scanned and collected for assessment. To minimize bias, retrieval of the relevant references in other studies in the field was ensured, and 2 authors assessed the collected studies. Low-quality studies were removed through a quality assessment and relevant data were extracted through summary tables for analysis. The articles were checked for similarities between author groups to prevent cumulative redundancy bias during the screening process, where only 1 article was included from the same author group. RESULTS: In the analysis of the 145 included studies, support vector machines were the most utilized ML technique (51/145, 35.2%), with the most studied disease being Parkinson disease (PD; reported in 87/145, 60%, studies). After 2017, 16 additional voice-affecting disorders were examined, in contrast to the 3 investigated previously. Furthermore, an upsurge in the use of artificial neural network-based architectures was observed after 2017. Almost half of the included studies were published in the last 2 years (2021 and 2022). A broad interest from many countries was observed. Notably, nearly one-half (n=75) of the studies relied on 10 distinct data sets, and 11/145 (7.6%) used demographic data as an input for ML models. CONCLUSIONS: This SLR revealed considerable interest across multiple countries in using ML techniques for diagnosing and monitoring voice-affecting disorders, with PD being the most studied disorder. However, the review identified several gaps, including limited and unbalanced data set usage in studies, and a focus on diagnostic tests rather than disorder-specific monitoring. Despite the limitation of being constrained to peer-reviewed publications written in English, the SLR provides valuable insights into the current state of research on ML-based voice-affecting disorder diagnosis and monitoring and highlights areas to address in future research.
Affiliation(s)
- Alper Idrisoglu
- Department of Health, Blekinge Institute of Technology, Karlskrona, Sweden
- Ana Luiza Dallora
- Department of Health, Blekinge Institute of Technology, Karlskrona, Sweden
- Peter Anderberg
- Department of Health, Blekinge Institute of Technology, Karlskrona, Sweden
- School of Health Sciences, University of Skövde, Skövde, Sweden
9
Hemmerling D, Wodzinski M, Orozco-Arroyave JR, Sztaho D, Daniol M, Jemiolo P, Wojcik-Pedziwiatr M. Vision Transformer for Parkinson's Disease Classification using Multilingual Sustained Vowel Recordings. Annu Int Conf IEEE Eng Med Biol Soc 2023; 2023:1-4. [PMID: 38083719 DOI: 10.1109/embc40787.2023.10340478] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/18/2023]
Abstract
Parkinson's disease (PD) is the second most prevalent neurodegenerative disease in the world. Thus, the early detection of PD has recently been the subject of several scientific and commercial studies. In this paper, we propose a pipeline using a Vision Transformer applied to mel-spectrograms for PD classification using multilingual sustained vowel recordings. Furthermore, our proposed transformer-based model shows great potential for using voice as a single-modality biomarker for automatic PD detection without language restrictions and across a wide range of vowels, with an F1-score equal to 0.78. The results of our study fall within the range of the estimated prevalence of voice and speech disorders in Parkinson's disease, which ranges from 70% to 90%. Our study demonstrates a high potential for adaptation in clinical decision-making, allowing for increasingly systematic and fast diagnosis of PD with the potential for use in telemedicine. Clinical relevance: There is an urgent need to develop a non-invasive biomarker of Parkinson's disease effective enough to detect the onset of the disease, to introduce neuroprotective treatment at the earliest stage possible, and to follow the results of that intervention. Voice disorders in PD are very frequent and are expected to be utilized as an early diagnostic biomarker. Voice analysis using deep neural networks opens new opportunities to assess the symptoms of neurodegenerative diseases, for fast diagnosis-making, to guide treatment initiation, and for risk prediction. The detection accuracy for voice biomarkers according to our method reached close to the maximum achievable value.
10
Escobar-Grisales D, Ríos-Urrego CD, Orozco-Arroyave JR. Deep Learning and Artificial Intelligence Applied to Model Speech and Language in Parkinson's Disease. Diagnostics (Basel) 2023; 13:2163. [PMID: 37443557 DOI: 10.3390/diagnostics13132163] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 05/26/2023] [Revised: 06/16/2023] [Accepted: 06/18/2023] [Indexed: 07/15/2023]
Abstract
Parkinson's disease (PD) is the second most prevalent neurodegenerative disorder in the world, and it is characterized by the production of different motor and non-motor symptoms which negatively affect speech and language production. For decades, the research community has been working on methodologies to automatically model these biomarkers to detect and monitor the disease; however, although speech impairments have been widely explored, language remains underexplored despite being a valuable source of information, especially to assess cognitive impairments associated with non-motor symptoms. This study proposes the automatic assessment of PD patients using different methodologies to model speech and language biomarkers. One-dimensional and two-dimensional convolutional neural networks (CNNs), along with pre-trained models such as Wav2Vec 2.0, BERT, and BETO, were considered to classify PD patients vs. Healthy Control (HC) subjects. The first approach consisted of modeling speech and language independently. Then, the best representations from each modality were combined following early, joint, and late fusion strategies. The results show that the speech modality yielded an accuracy of up to 88%, thus outperforming all language representations, including the multi-modal approach. These results suggest that speech representations better discriminate PD patients and HC subjects than language representations. When analyzing the fusion strategies, we observed that changes in the time span of the multi-modal representation could produce a significant loss of information in the speech modality, which was likely linked to a decrease in accuracy in the multi-modal experiments. Further experiments are necessary to validate this claim with other fusion methods using different time spans.
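The difference between two of the fusion strategies named in this abstract can be shown with a small sketch. It assumes the usual meanings of the terms: early fusion concatenates the per-modality feature vectors before a single classifier, and late fusion combines the per-modality classifier outputs. Joint fusion, which trains both branches together, is not shown. The function names, the weighting parameter, and the toy values are hypothetical and are not taken from the paper.

```python
def early_fusion(speech_features, language_features):
    """Concatenate modality feature vectors into one input for a single classifier."""
    return list(speech_features) + list(language_features)


def late_fusion(speech_prob, language_prob, speech_weight=0.5):
    """Combine per-modality PD probabilities with a weighted average.

    speech_weight is an assumed tuning parameter in [0, 1].
    """
    if not 0.0 <= speech_weight <= 1.0:
        raise ValueError("speech_weight must lie in [0, 1]")
    return speech_weight * speech_prob + (1.0 - speech_weight) * language_prob


# Hypothetical embeddings and classifier outputs for one subject.
fused_vector = early_fusion([0.1, 0.2], [0.3])
fused_prob = late_fusion(0.8, 0.4)
```

Early fusion requires the two representations to describe the same time span so they can be concatenated, which relates to the abstract's observation that changing the time span of the multi-modal representation may cost information in the speech modality.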
Affiliation(s)
- Juan Rafael Orozco-Arroyave
- GITA Lab, Faculty of Engineering, University of Antioquia, Medellín 050010, Colombia
- LME Lab, University of Erlangen, 91054 Erlangen, Germany
11
Liu Q, Ostinelli EG, De Crescenzo F, Li Z, Tomlinson A, Salanti G, Cipriani A, Efthimiou O. Predicting outcomes at the individual patient level: what is the best method? BMJ Ment Health 2023; 26:e300701. [PMID: 37316257 PMCID: PMC10277128 DOI: 10.1136/bmjment-2023-300701] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2023] [Accepted: 04/26/2023] [Indexed: 06/16/2023]
Abstract
OBJECTIVE: When developing prediction models, researchers commonly employ a single model which uses all the available data (end-to-end approach). Alternatively, a similarity-based approach has been previously proposed, in which patients with similar clinical characteristics are first grouped into clusters, then prediction models are developed within each cluster. The potential advantage of the similarity-based approach is that it may better address heterogeneity in patient characteristics. However, it remains unclear whether it improves the overall predictive performance. We illustrate the similarity-based approach using data from people with depression and empirically compare its performance with the end-to-end approach. METHODS: We used primary care data collected in general practices in the UK. Using 31 predefined baseline variables, we aimed to predict the severity of depressive symptoms, measured by Patient Health Questionnaire-9, 60 days after initiation of antidepressant treatment. Following the similarity-based approach, we used k-means to cluster patients based on their baseline characteristics. We derived the optimal number of clusters using the Silhouette coefficient. We used ridge regression to build prediction models in both approaches. To compare the models' performance, we calculated the mean absolute error (MAE) and the coefficient of determination (R²) using bootstrapping. RESULTS: We analysed data from 16 384 patients. The end-to-end approach resulted in an MAE of 4.64 and R² of 0.20. The best-performing similarity-based model was for four clusters, with MAE of 4.65 and R² of 0.19. CONCLUSIONS: The end-to-end and the similarity-based model yielded comparable performance. Due to its simplicity, the end-to-end approach can be favoured when using demographic and clinical data to build prediction models on pharmacological treatments for depression.
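The two performance measures named in this abstract, MAE and R², together with a bootstrap over patients, can be sketched as follows. The abstract does not say how the bootstrap was applied; the sketch assumes a plain percentile interval over resampled patients, and the outcome scores and predictions are hypothetical toy values, not study data.

```python
import random


def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted scores."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def bootstrap_interval(metric, y_true, y_pred, n_boot=1000, seed=0):
    """95% percentile interval of a metric over resampled patients (assumed scheme)."""
    rng = random.Random(seed)
    n = len(y_true)
    values = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            values.append(metric([y_true[i] for i in idx],
                                 [y_pred[i] for i in idx]))
        except ZeroDivisionError:
            continue  # degenerate resample with identical targets; skip it
    values.sort()
    low = values[int(0.025 * len(values))]
    high = values[min(len(values) - 1, int(0.975 * len(values)))]
    return low, high


# Hypothetical questionnaire scores and model predictions.
observed = [5, 10, 15, 20, 8, 12]
predicted = [7, 9, 14, 16, 10, 12]
mae_value = mae(observed, predicted)
r2_value = r2(observed, predicted)
mae_low, mae_high = bootstrap_interval(mae, observed, predicted)
```

Computing the same metrics with the same resampling scheme for both modelling approaches puts them on a common footing, which is what allows a comparison such as an MAE of 4.64 against 4.65 to be read as comparable performance.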
Affiliation(s)
- Qiang Liu
- Department of Psychiatry, University of Oxford, Oxford, UK
- Oxford Precision Psychiatry Lab, NIHR Oxford Health Biomedical Research Centre, Oxford, UK
- Department of Engineering Mathematics, University of Bristol, Bristol, UK
- Edoardo Giuseppe Ostinelli
- Department of Psychiatry, University of Oxford, Oxford, UK
- Oxford Precision Psychiatry Lab, NIHR Oxford Health Biomedical Research Centre, Oxford, UK
- Oxford Health NHS Foundation Trust, Warneford Hospital, Oxford, UK
- Franco De Crescenzo
- Department of Psychiatry, University of Oxford, Oxford, UK
- Oxford Precision Psychiatry Lab, NIHR Oxford Health Biomedical Research Centre, Oxford, UK
- Zhenpeng Li
- Department of Psychiatry, University of Oxford, Oxford, UK
- Oxford Precision Psychiatry Lab, NIHR Oxford Health Biomedical Research Centre, Oxford, UK
- Anneka Tomlinson
- Department of Psychiatry, University of Oxford, Oxford, UK
- Oxford Precision Psychiatry Lab, NIHR Oxford Health Biomedical Research Centre, Oxford, UK
- Georgia Salanti
- Institute of Social and Preventive Medicine (ISPM), University of Bern, Bern, Switzerland
- Andrea Cipriani
- Department of Psychiatry, University of Oxford, Oxford, UK
- Oxford Precision Psychiatry Lab, NIHR Oxford Health Biomedical Research Centre, Oxford, UK
- Oxford Health NHS Foundation Trust, Warneford Hospital, Oxford, UK
- Orestis Efthimiou
- Oxford Precision Psychiatry Lab, NIHR Oxford Health Biomedical Research Centre, Oxford, UK
- Institute of Social and Preventive Medicine (ISPM), University of Bern, Bern, Switzerland
- Institute of Primary Health Care (BIHAM), University of Bern, Bern, Switzerland
12
|
Luna-Ortiz I, Aldape-Pérez M, Uriarte-Arcia AV, Rodríguez-Molina A, Alarcón-Paredes A, Ventura-Molina E. Parkinson's Disease Detection from Voice Recordings Using Associative Memories. Healthcare (Basel) 2023; 11:1601. [PMID: 37297740 DOI: 10.3390/healthcare11111601] [Received: 03/24/2023] [Revised: 05/21/2023] [Accepted: 05/26/2023] [Indexed: 06/12/2023]
Abstract
Parkinson's disease (PD) is a chronic, progressive neurological condition that is challenging to diagnose. An accurate diagnosis is required to distinguish PD patients from healthy individuals. Diagnosing PD at an early stage can reduce the severity of the disorder and improve the patient's living conditions. Algorithms based on associative memory (AM) have been applied to PD diagnosis using voice samples from patients with this condition. Although AM models have achieved competitive results in PD classification, they have no embedded component that can identify and remove irrelevant features, which would improve classification performance. In this paper, we present an improvement to the smallest normalized difference associative memory (SNDAM) algorithm: a learning reinforcement phase that improves the classification performance of SNDAM when it is applied to PD diagnosis. For the experimental phase, two datasets that have been widely used for PD diagnosis were employed. Both datasets were gathered from voice samples of healthy people and of patients at an early stage of PD. These datasets are publicly accessible in the UCI Machine Learning Repository. The efficiency of the improved model, called ISNDAM, was contrasted with that of seventy other models implemented in the WEKA workbench and compared with the performance reported in previous studies. A statistical significance analysis was performed to verify that the performance differences between the compared models were statistically significant. The experimental findings allow us to affirm that ISNDAM effectively increases classification performance compared with well-known algorithms. On Dataset 1, ISNDAM achieves a classification accuracy of 99.48%, followed by ANN Levenberg-Marquardt with 95.89% and SVM RBF kernel with 88.21%. On Dataset 2, ISNDAM achieves a classification accuracy of 99.66%, followed by SVM IMF1 with 96.54% and RF IMF1 with 94.89%. The experimental findings show that ISNDAM achieves competitive performance on both datasets and that statistical significance tests confirm that ISNDAM delivers classification performance equivalent to that of models published in previous studies.
Affiliation(s)
- Irving Luna-Ortiz
  - Instituto Politécnico Nacional, Center for Computing Innovation and Technological Development (CIDETEC), Computational Intelligence Laboratory (CIL), Mexico City 07700, Mexico
- Mario Aldape-Pérez
  - Instituto Politécnico Nacional, Center for Computing Innovation and Technological Development (CIDETEC), Computational Intelligence Laboratory (CIL), Mexico City 07700, Mexico
- Abril Valeria Uriarte-Arcia
  - Instituto Politécnico Nacional, Center for Computing Innovation and Technological Development (CIDETEC), Computational Intelligence Laboratory (CIL), Mexico City 07700, Mexico
- Alejandro Rodríguez-Molina
  - Tecnológico Nacional de México/IT de Tlalnepantla, Research and Postgraduate Division, Tlalnepantla de Baz 54070, Mexico
- Antonio Alarcón-Paredes
  - Instituto Politécnico Nacional, Center for Computing Research (CIC), Computational Intelligence Laboratory (CIL), Mexico City 07700, Mexico
- Elías Ventura-Molina
  - Instituto Politécnico Nacional, Center for Computing Innovation and Technological Development (CIDETEC), Computational Intelligence Laboratory (CIL), Mexico City 07700, Mexico
|
13
|
Costantini G, Cesarini V, Di Leo P, Amato F, Suppa A, Asci F, Pisani A, Calculli A, Saggio G. Artificial Intelligence-Based Voice Assessment of Patients with Parkinson's Disease Off and On Treatment: Machine vs. Deep-Learning Comparison. Sensors (Basel) 2023; 23:2293. [PMID: 36850893 PMCID: PMC9962335 DOI: 10.3390/s23042293] [Received: 01/24/2023] [Revised: 02/13/2023] [Accepted: 02/16/2023] [Indexed: 06/18/2023]
Abstract
Parkinson's Disease (PD) is one of the most common non-curable neurodegenerative diseases. Diagnosis is achieved clinically on the basis of different symptoms, with considerable delays from the onset of neurodegenerative processes in the central nervous system. In this study, we investigated early and full-blown PD patients based on the analysis of their voice characteristics with the aid of the most commonly employed machine learning (ML) techniques. A custom dataset was made with hi-fi quality recordings of vocal tasks gathered from Italian healthy control subjects and PD patients, divided into early diagnosed, off-medication patients on the one hand, and mid-advanced patients treated with L-Dopa on the other. Following the current state of the art, several ML pipelines were compared using different feature selection and classification algorithms, and deep learning was also explored with a custom CNN architecture. Results show that feature-based ML and deep learning achieve comparable classification performance, with KNN, SVM and naïve Bayes classifiers performing similarly and KNN holding a slight edge. Much more evident is the predominance of CFS as the best feature selector. The selected features act as relevant vocal biomarkers capable of differentiating healthy subjects, early untreated PD patients and mid-advanced L-Dopa treated patients.
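The abstract does not expand "CFS"; assuming it denotes correlation-based feature selection, the score that such a selector maximizes can be sketched as follows. The merit of a subset of k features is k·r_cf / sqrt(k + k(k−1)·r_ff), where r_cf is the mean absolute feature-class correlation and r_ff the mean absolute feature-feature correlation. The function name `cfs_merit` and the correlation values below are hypothetical and are not taken from the study.

```python
from math import sqrt
from statistics import mean


def cfs_merit(feature_class_corrs, feature_feature_corrs):
    """Merit of a feature subset under correlation-based feature selection.

    feature_class_corrs: one correlation per feature with the class label.
    feature_feature_corrs: one correlation per unordered feature pair
    (empty when the subset holds a single feature).
    """
    k = len(feature_class_corrs)
    r_cf = mean(abs(r) for r in feature_class_corrs)
    r_ff = mean(abs(r) for r in feature_feature_corrs) if k > 1 else 0.0
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


# Hypothetical subsets: equally predictive features, different redundancy.
complementary = cfs_merit([0.6, 0.5, 0.4], [0.2, 0.2, 0.2])
redundant = cfs_merit([0.6, 0.5, 0.4], [0.9, 0.9, 0.9])
print(f"complementary subset merit: {complementary:.3f}")
print(f"redundant subset merit:     {redundant:.3f}")
```

The score rewards features that correlate with the class and penalizes features that correlate with each other, which is why a subset of strongly inter-correlated vocal features scores lower than an equally predictive but less redundant one.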
Affiliation(s)
- Giovanni Costantini
  - Department of Electronic Engineering, University of Rome Tor Vergata, 00133 Rome, Italy
- Valerio Cesarini
  - Department of Electronic Engineering, University of Rome Tor Vergata, 00133 Rome, Italy
- Pietro Di Leo
  - Department of Electronic Engineering, University of Rome Tor Vergata, 00133 Rome, Italy
- Federica Amato
  - Department of Control and Computer Engineering, Polytechnic University of Turin, 10129 Turin, Italy
- Antonio Suppa
  - Department of Human Neurosciences, Sapienza University of Rome, 00185 Rome, Italy
  - IRCCS Neuromed Institute, 86077 Pozzilli, Italy
- Francesco Asci
  - Department of Human Neurosciences, Sapienza University of Rome, 00185 Rome, Italy
  - IRCCS Neuromed Institute, 86077 Pozzilli, Italy
- Antonio Pisani
  - Department of Brain and Behavioral Sciences, University of Pavia, 27100 Pavia, Italy
  - IRCCS Mondino Foundation, 27100 Pavia, Italy
- Alessandra Calculli
  - Department of Brain and Behavioral Sciences, University of Pavia, 27100 Pavia, Italy
  - IRCCS Mondino Foundation, 27100 Pavia, Italy
- Giovanni Saggio
  - Department of Electronic Engineering, University of Rome Tor Vergata, 00133 Rome, Italy
|
14
|
Meng W, Zhang Q, Ma S, Cai M, Liu D, Liu Z, Yang J. A lightweight CNN and Transformer hybrid model for mental retardation screening among children from spontaneous speech. Comput Biol Med 2022; 151:106281. [PMID: 36399858 DOI: 10.1016/j.compbiomed.2022.106281] [Received: 06/29/2022] [Revised: 10/17/2022] [Accepted: 10/30/2022] [Indexed: 11/06/2022]
Abstract
Mental retardation (MR) is a group of mental disorders characterized by low intelligence and social adjustment difficulties. Early diagnosis allows timely intervention for children with MR and can ease the degree of disability. Children with MR always have impaired speech functions compared to normal children, which is significant for clinical diagnosis. On this basis, our study proposes a spontaneous speech-based framework (MT-Net) for screening MR, which merges mobile inverted bottleneck convolutional blocks (MBConv) and visual Transformer blocks. MT-Net takes log-mel spectrograms converted from raw interview speech as its data source and uses MBConv and visual Transformer blocks to learn both low-level and high-level features. In addition, SpecAugment, a data augmentation strategy, has been used to expand our audio dataset and further enhance the performance of MT-Net. The experimental results show that our proposed MT-Net outperforms Transformer networks (ViT) and convolutional neural networks (ResNet18, MobileNetV2, EfficientNetV2), achieving an accuracy of 91.60% after using SpecAugment. Our proposed MT-Net has fewer parameters, low computing consumption and high prediction accuracy, and is expected to serve as an auxiliary screening tool for MR.
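The augmentation step mentioned in the abstract can be illustrated with a minimal sketch of SpecAugment-style masking applied to a log-mel spectrogram. It covers only frequency and time masking (time warping is left out), represents the spectrogram as a plain list of mel-band rows, and uses hypothetical names and mask widths (`spec_augment`, `max_freq_mask`, `max_time_mask`); the settings used for MT-Net are not given in the abstract.

```python
import random


def spec_augment(spec, max_freq_mask=2, max_time_mask=3, fill=0.0, seed=None):
    """Return a copy of `spec` with one frequency mask and one time mask.

    spec: list of mel-band rows, each a list of per-frame values.
    Mask widths are drawn uniformly from 0..max (assumed settings).
    """
    rng = random.Random(seed)
    n_mels, n_frames = len(spec), len(spec[0])
    out = [row[:] for row in spec]  # leave the input untouched

    # Frequency masking: blank a band of consecutive mel channels.
    f = rng.randint(0, min(max_freq_mask, n_mels))
    f0 = rng.randint(0, n_mels - f)
    for m in range(f0, f0 + f):
        out[m] = [fill] * n_frames

    # Time masking: blank a run of consecutive frames in every channel.
    t = rng.randint(0, min(max_time_mask, n_frames))
    t0 = rng.randint(0, n_frames - t)
    for row in out:
        for j in range(t0, t0 + t):
            row[j] = fill
    return out


# Toy 4-band x 6-frame "spectrogram" with non-zero entries, for illustration.
toy_spec = [[float(m * 10 + t + 1) for t in range(6)] for m in range(4)]
augmented = spec_augment(toy_spec, seed=0)
for row in augmented:
    print(row)
```

Calling the function several times with different seeds yields several masked variants of the same recording, which is how this kind of augmentation enlarges a small audio dataset.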
Affiliation(s)
- Wei Meng
  - School of Information Engineering, Wuhan University of Technology, Wuhan 430070, China
- Qianhong Zhang
  - School of Information Engineering, Wuhan University of Technology, Wuhan 430070, China
- Simeng Ma
  - Department of Psychiatry, Renmin Hospital of Wuhan University, Wuhan 430060, China
- Mincheng Cai
  - School of Information Engineering, Wuhan University of Technology, Wuhan 430070, China
- Dujuan Liu
  - Department of Psychiatry, Renmin Hospital of Wuhan University, Wuhan 430060, China
- Zhongchun Liu
  - Department of Psychiatry, Renmin Hospital of Wuhan University, Wuhan 430060, China
- Jun Yang
  - School of Information Engineering, Wuhan University of Technology, Wuhan 430070, China
|