1
Klempíř O, Krupička R. Analyzing Wav2Vec 1.0 Embeddings for Cross-Database Parkinson's Disease Detection and Speech Features Extraction. Sensors (Basel) 2024; 24:5520. [PMID: 39275431 PMCID: PMC11398018 DOI: 10.3390/s24175520] [Received: 07/24/2024] [Revised: 08/22/2024] [Accepted: 08/24/2024] [Indexed: 09/16/2024]
Abstract
Advancements in deep learning speech representations have facilitated the effective use of extensive unlabeled speech datasets for Parkinson's disease (PD) modeling with minimal annotated data. This study employs the non-fine-tuned wav2vec 1.0 architecture to develop machine learning models for PD speech diagnosis tasks, such as cross-database classification and regression to predict demographic and articulation characteristics. The primary aim is to analyze overlapping components within the embeddings on both classification and regression tasks, investigating whether latent speech representations in PD are shared across models, particularly for related tasks. Firstly, evaluation using three multi-language PD datasets showed that wav2vec accurately detected PD based on speech, outperforming feature extraction using mel-frequency cepstral coefficients in the proposed cross-database classification scenarios. In cross-database scenarios using Italian and English-read texts, wav2vec demonstrated performance comparable to intra-dataset evaluations. We also compared our cross-database findings against those of other related studies. Secondly, wav2vec proved effective in regression, modeling various quantitative speech characteristics related to articulation and aging. Ultimately, subsequent analysis of important features examined the presence of significant overlaps between classification and regression models. The feature importance experiments discovered shared features across trained models, with increased sharing for related tasks, further suggesting that wav2vec contributes to improved generalizability. The study proposes wav2vec embeddings as a next promising step toward a speech-based universal model to assist in the evaluation of PD.
Affiliation(s)
- Radim Krupička
- Department of Biomedical Informatics, Faculty of Biomedical Engineering, Czech Technical University in Prague, 16000 Prague, Czech Republic
2
Zhang X, Zhang X, Chen W, Li C, Yu C. Improving speech depression detection using transfer learning with wav2vec 2.0 in low-resource environments. Sci Rep 2024; 14:9543. [PMID: 38664511 PMCID: PMC11045867 DOI: 10.1038/s41598-024-60278-1] [Received: 01/11/2024] [Accepted: 04/21/2024] [Indexed: 04/28/2024]
Abstract
Depression, a pervasive global mental disorder, profoundly impacts daily lives. Despite numerous deep learning studies focused on depression detection through speech analysis, the shortage of annotated bulk samples hampers the development of effective models. In response to this challenge, our research introduces a transfer learning approach for detecting depression in speech, aiming to overcome constraints imposed by limited resources. In the context of feature representation, we obtain depression-related features by fine-tuning wav2vec 2.0. By integrating 1D-CNN and attention pooling structures, we generate advanced features at the segment level, thereby enhancing the model's capability to capture temporal relationships within audio frames. In the realm of prediction results, we integrate LSTM and self-attention mechanisms. This incorporation assigns greater weights to segments associated with depression, thereby augmenting the model's discernment of depression-related information. The experimental results indicate that our model has achieved impressive F1 scores, reaching 79% on the DAIC-WOZ dataset and 90.53% on the CMDC dataset. It outperforms recent baseline models in the field of speech-based depression detection. This provides a promising solution for effective depression detection in low-resource environments.
Affiliation(s)
- Xu Zhang
- School of Software Engineering, Xiamen University of Technology, Xiamen, 361024, China
- Xiangcheng Zhang
- School of Computer and Information Engineering, Xiamen University of Technology, Xiamen, 361024, China
- Weisi Chen
- School of Software Engineering, Xiamen University of Technology, Xiamen, 361024, China
- Chenlong Li
- School of Computer and Information Engineering, Xiamen University of Technology, Xiamen, 361024, China
- Chengyuan Yu
- School of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang, 330045, China
3
Palmirotta C, Aresta S, Battista P, Tagliente S, Lagravinese G, Mongelli D, Gelao C, Fiore P, Castiglioni I, Minafra B, Salvatore C. Unveiling the Diagnostic Potential of Linguistic Markers in Identifying Individuals with Parkinson's Disease through Artificial Intelligence: A Systematic Review. Brain Sci 2024; 14:137. [PMID: 38391712 PMCID: PMC10886733 DOI: 10.3390/brainsci14020137] [Received: 01/08/2024] [Revised: 01/22/2024] [Accepted: 01/25/2024] [Indexed: 02/24/2024]
Abstract
While extensive research has documented the cognitive changes associated with Parkinson's disease (PD), a relatively small portion of the empirical literature investigated the language abilities of individuals with PD. Recently, artificial intelligence applied to linguistic data has shown promising results in predicting the clinical diagnosis of neurodegenerative disorders, but a deeper investigation of the current literature available on PD is lacking. This systematic review investigates the nature of language disorders in PD by assessing the contribution of machine learning (ML) to the classification of patients with PD. A total of 10 studies published between 2016 and 2023 were included in this review. Tasks used to elicit language were mainly structured or unstructured narrative discourse. Transcriptions were mostly analyzed using Natural Language Processing (NLP) techniques. The classification accuracy (%) ranged from 43 to 94, sensitivity (%) ranged from 8 to 95, specificity (%) ranged from 3 to 100, AUC (%) ranged from 32 to 97. The most frequent optimal linguistic measures were lexico-semantic (40%), followed by NLP-extracted features (26%) and morphological consistency features (20%). Artificial intelligence applied to linguistic markers provides valuable insights into PD. However, analyzing measures derived from narrative discourse can be time-consuming, and utilizing ML requires specialized expertise. Moving forward, it is important to focus on facilitating the integration of both narrative discourse analysis and artificial intelligence into clinical practice.
Affiliation(s)
- Cinzia Palmirotta
- Istituti Clinici Scientifici Maugeri IRCCS, Laboratory of Neuropsychology, Bari Institute, 70124 Bari, Italy
- Simona Aresta
- Istituti Clinici Scientifici Maugeri IRCCS, Laboratory of Neuropsychology, Bari Institute, 70124 Bari, Italy
- Petronilla Battista
- Istituti Clinici Scientifici Maugeri IRCCS, Laboratory of Neuropsychology, Bari Institute, 70124 Bari, Italy
- Serena Tagliente
- Istituti Clinici Scientifici Maugeri IRCCS, Laboratory of Neuropsychology, Bari Institute, 70124 Bari, Italy
- Gianvito Lagravinese
- Istituti Clinici Scientifici Maugeri IRCCS, Laboratory of Neuropsychology, Bari Institute, 70124 Bari, Italy
- Davide Mongelli
- Istituti Clinici Scientifici Maugeri IRCCS, Laboratory of Neuropsychology, Bari Institute, 70124 Bari, Italy
- Christian Gelao
- Istituti Clinici Scientifici Maugeri IRCCS, Neurorehabilitation Unit of Bari Institute, 70124 Bari, Italy
- Pietro Fiore
- Istituti Clinici Scientifici Maugeri IRCCS, Neurorehabilitation Unit of Bari Institute, 70124 Bari, Italy
- Department of Physical and Rehabilitation Medicine, University of Foggia, 71122 Foggia, Italy
- Isabella Castiglioni
- Department of Physics G. Occhialini, University of Milan-Bicocca, 20133 Milan, Italy
- Brigida Minafra
- Istituti Clinici Scientifici Maugeri IRCCS, Neurorehabilitation Unit of Bari Institute, 70124 Bari, Italy
- Christian Salvatore
- Department of Science, Technology and Society, University School for Advanced Studies IUSS Pavia, 27100 Pavia, Italy
- DeepTrace Technologies S.R.L., 20122 Milan, Italy
4
Ibarra EJ, Arias-Londoño JD, Zañartu M, Godino-Llorente JI. Towards a Corpus (and Language)-Independent Screening of Parkinson's Disease from Voice and Speech through Domain Adaptation. Bioengineering (Basel) 2023; 10:1316. [PMID: 38002440 PMCID: PMC10669342 DOI: 10.3390/bioengineering10111316] [Received: 10/09/2023] [Revised: 11/03/2023] [Accepted: 11/10/2023] [Indexed: 11/26/2023]
Abstract
End-to-end deep learning models have shown promising results for the automatic screening of Parkinson's disease by voice and speech. However, these models often suffer degradation in their performance when applied to scenarios involving multiple corpora. In addition, they also show corpus-dependent clusterings. These facts indicate a lack of generalisation or the presence of certain shortcuts in the decision, and also suggest the need for developing new corpus-independent models. In this respect, this work explores the use of domain adversarial training as a viable strategy to develop models that retain their discriminative capacity to detect Parkinson's disease across diverse datasets. The paper presents three deep learning architectures and their domain adversarial counterparts. The models were evaluated with sustained vowels and diadochokinetic recordings extracted from four corpora with different demographics, dialects or languages, and recording conditions. The results showed that the space distribution of the embedding features extracted by the domain adversarial networks exhibits a higher intra-class cohesion. This behaviour is supported by a decrease in the variability and inter-domain divergence computed within each class. The findings suggest that domain adversarial networks are able to learn the common characteristics present in Parkinsonian voice and speech, which are supposed to be corpus, and consequently, language independent. Overall, this effort provides evidence that domain adaptation techniques refine the existing end-to-end deep learning approaches for Parkinson's disease detection from voice and speech, achieving more generalizable models.
Affiliation(s)
- Emiro J. Ibarra
- Department of Electronic Engineering, Universidad Técnica Federico Santa María, Avenida España 1680, Casilla 110-V, Valparaíso 2390123, Chile
- Julián D. Arias-Londoño
- Escuela Técnica Superior de Ingenieros de Telecomunicación, Universidad Politécnica de Madrid, Avda. Ciudad Universitaria, 30, 28040 Madrid, Spain
- Matías Zañartu
- Department of Electronic Engineering, Universidad Técnica Federico Santa María, Avenida España 1680, Casilla 110-V, Valparaíso 2390123, Chile
- Juan I. Godino-Llorente
- Escuela Técnica Superior de Ingenieros de Telecomunicación, Universidad Politécnica de Madrid, Avda. Ciudad Universitaria, 30, 28040 Madrid, Spain