1
|
Favaro A, Tsai YT, Butala A, Thebaud T, Villalba J, Dehak N, Moro-Velázquez L. Interpretable speech features vs. DNN embeddings: What to use in the automatic assessment of Parkinson's disease in multi-lingual scenarios. Comput Biol Med 2023; 166:107559. [PMID: 37852107 DOI: 10.1016/j.compbiomed.2023.107559] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2023] [Revised: 10/07/2023] [Accepted: 10/09/2023] [Indexed: 10/20/2023]
Abstract
Speech-based approaches for assessing Parkinson's Disease (PD) often rely on feature extraction for automatic classification or detection. While many studies prioritize accuracy by using non-interpretable embeddings from Deep Neural Networks, this work aims to explore the predictive capabilities and language robustness of both feature types in a systematic fashion. As interpretable features, prosodic, linguistic, and cognitive descriptors were adopted, while x-vectors, Wav2Vec 2.0, HuBERT, and TRILLsson representations were used as non-interpretable features. Mono-lingual, multi-lingual, and cross-lingual machine learning experiments were conducted leveraging six data sets comprising speech recordings from various languages: American English, Castilian Spanish, Colombian Spanish, Italian, German, and Czech. For interpretable feature-based models, the mean of the best F1-scores obtained from each language was 81% in mono-lingual, 81% in multi-lingual, and 71% in cross-lingual experiments. For non-interpretable feature-based models, instead, they were 85% in mono-lingual, 88% in multi-lingual, and 79% in cross-lingual experiments. Firstly, models based on non-interpretable features outperformed interpretable ones, especially in cross-lingual experiments. Specifically, TRILLsson provided the most stable and accurate results across tasks and data sets. Conversely, the two types of features adopted showed some level of language robustness in multi-lingual and cross-lingual experiments. Overall, these results suggest that interpretable feature-based models can be used by clinicians to evaluate the deterioration of the speech of patients with PD, while non-interpretable feature-based models can be leveraged to achieve higher detection accuracy.
Affiliation(s)
- Anna Favaro: Department of Electrical and Computer Engineering, The Johns Hopkins University, Baltimore, 21218, MD, United States of America
- Yi-Ting Tsai: Department of Electrical and Computer Engineering, The Johns Hopkins University, Baltimore, 21218, MD, United States of America
- Ankur Butala: Department of Neurology, The Johns Hopkins University, Baltimore, 21218, MD, United States of America; Department of Psychiatry and Behavioral Sciences, The Johns Hopkins University, Baltimore, 21218, MD, United States of America
- Thomas Thebaud: Department of Electrical and Computer Engineering, The Johns Hopkins University, Baltimore, 21218, MD, United States of America
- Jesús Villalba: Department of Electrical and Computer Engineering, The Johns Hopkins University, Baltimore, 21218, MD, United States of America
- Najim Dehak: Department of Electrical and Computer Engineering, The Johns Hopkins University, Baltimore, 21218, MD, United States of America
- Laureano Moro-Velázquez: Department of Electrical and Computer Engineering, The Johns Hopkins University, Baltimore, 21218, MD, United States of America
|
2
|
Wang Y, Zhou Y, Mei Y. A joint attention enhancement network for text classification applied to citizen complaint reporting. APPL INTELL 2023. [DOI: 10.1007/s10489-023-04490-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/05/2023]
|
3
|
Minutolo A, Guarasci R, Damiano E, De Pietro G, Fujita H, Esposito M. A multi-level methodology for the automated translation of a coreference resolution dataset: an application to the Italian language. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-07641-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Abstract
In the last decade, the demand for readily accessible corpora has touched all areas of natural language processing, including coreference resolution. However, it is one of the least considered sub-fields in recent developments. Moreover, almost all existing resources are only available for the English language. To overcome this lack, this work proposes a methodology to create a corpus for coreference resolution in Italian, exploiting knowledge of annotated resources in other languages. Starting from OntoNotes, the methodology translates and refines English utterances to obtain utterances that respect Italian grammar, deal with language-specific phenomena, and preserve coreference and mentions. A quantitative and qualitative evaluation is performed to assess the well-formedness of the generated utterances, considering readability, grammaticality, and acceptability indexes. The results confirm the effectiveness of the methodology in generating a good dataset for coreference resolution starting from an existing one. The quality of the dataset is also assessed by training a coreference resolution model based on the BERT language model, achieving promising results. Although the methodology has been tailored to the English and Italian languages, it has a general basis that is easily extendable to other languages, adapting a small number of language-dependent rules to generalize most of the linguistic phenomena of the language under examination.
|
4
|
Huang P, Zhao J, Sun S, Lin Y. Knowledge enhanced zero-resource machine translation using image-pivoting. APPL INTELL 2022. [DOI: 10.1007/s10489-022-03997-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
|
5
|
An event-based opinion summarization model for long chinese text with sentiment awareness and parameter fusion mechanism. APPL INTELL 2022. [DOI: 10.1007/s10489-022-03231-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
|
6
|
Deep Learning Models for Fast Retrieval and Extraction of French Speech Vocabulary Applications. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022; 2022:4286659. [PMID: 35845913 PMCID: PMC9287002 DOI: 10.1155/2022/4286659] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2022] [Accepted: 06/25/2022] [Indexed: 11/18/2022]
Abstract
Because the French vocabulary is large, quickly retrieving and accurately identifying the required vocabulary remains a major challenge in French learning. To address this problem, we introduce a deep learning algorithm in this study to upgrade and optimize the retrieval system for French words, improving both the acquisition speed of speech word data and the recognition accuracy of speech words, so as to meet users' needs for word retrieval. The results show that the two training methods for fast retrieval and extraction of French speech vocabulary, SGD synchronous network update and alternate update of network parameters, reduce the WER from a maximum of 11.65% to 4.25%, a maximum reduction of 7.4%; the same two training methods reduce the SER from a maximum of 13.52% to 4.4%. They also reduce the response time from a maximum of 582 ms to 351 ms, with a maximum reduction of 8.84%; the maximum reduction of 39.7%. The SGD synchronous network update and alternating network parameter update algorithms are used to quickly retrieve and extract French words. When the number of iterations reaches 120, the model fitting accuracy reaches 90.05% on the training set, while the model reaches 94.5% on the test set. The system has a stronger generalization ability and a higher speech vocabulary recognition rate, meeting practical requirements.
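The reductions quoted in this abstract appear to mix absolute differences (percentage points) with relative changes. The following is a minimal sketch, not part of the cited paper, that recomputes them from the stated before and after values; the helper names `absolute_drop` and `relative_drop` are hypothetical and introduced only for illustration.

```python
def absolute_drop(before: float, after: float) -> float:
    """Difference between two rates, in percentage points."""
    return before - after


def relative_drop(before: float, after: float) -> float:
    """Reduction expressed as a percentage of the starting value."""
    return (before - after) / before * 100.0


# Values as stated in the abstract.
wer_drop = absolute_drop(11.65, 4.25)        # WER, percentage points
ser_drop = absolute_drop(13.52, 4.40)        # SER, percentage points
latency_drop = relative_drop(582.0, 351.0)   # response time, relative %

print(f"WER drop: {wer_drop:.2f} points")
print(f"SER drop: {ser_drop:.2f} points")
print(f"Response time drop: {latency_drop:.1f} %")
```

Under these definitions, the 7.4% figure matches the absolute WER difference and the 39.7% figure matches the relative response-time change; the 8.84% figure is not reproduced by either calculation on the values given.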
|
7
|
Quantum Natural Language Processing: Challenges and Opportunities. APPLIED SCIENCES-BASEL 2022. [DOI: 10.3390/app12115651] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 02/06/2023]
Abstract
The meeting between Natural Language Processing (NLP) and Quantum Computing has been very successful in recent years, leading to the development of several approaches to so-called Quantum Natural Language Processing (QNLP). This is a hybrid field in which the potential of quantum mechanics is exploited and applied to critical aspects of language processing, involving different NLP tasks. Approaches developed so far span from those that demonstrate the quantum advantage only at the theoretical level to those implementing algorithms on quantum hardware. This paper aims to list the approaches developed so far, categorizing them by type, i.e., theoretical work and work implemented on classical or quantum hardware; by task, i.e., general purpose, such as syntax-semantic representation, or specific NLP tasks, like sentiment analysis or question answering; and by the resource used in the evaluation phase, i.e., whether a benchmark dataset or a custom one has been used. The advantages offered by QNLP are discussed, both in terms of performance and methodology, and some considerations are given about the possible usage of QNLP approaches in place of state-of-the-art deep learning-based ones.
|
9
|
Dual-level interactive multimodal-mixup encoder for multi-modal neural machine translation. APPL INTELL 2022. [DOI: 10.1007/s10489-022-03331-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
|