1. Ribeiro VV, Dos Santos CO, Silva GF, Santos ADN, Santos MJV. The Effect of Phonation into a Glass Tube Immersed in Water Compared to Other Interventions on General Degree of Vocal Deviation, Fundamental Frequency, Sound Pressure Level, and Vocal Self-assessment in Vocally Healthy Individuals: A Systematic Review and Meta-analysis. J Voice 2024; 38:1120-1128. [PMID: 35193789] [DOI: 10.1016/j.jvoice.2022.01.021]
Abstract
OBJECTIVE To analyze the effect of phonation into a glass tube immersed in water compared to other interventions on general degree of vocal deviation, fundamental frequency, sound pressure level, and vocal self-assessment in vocally healthy individuals. METHODS This is a systematic review and meta-analysis developed from the research question: "In vocally healthy individuals, what is the effect of phonation into a glass tube immersed in water versus other vocal interventions, other activities, or no intervention on general degree of vocal deviation, fundamental frequency, sound pressure level, and vocal self-assessment?" An electronic search was performed using the Medline, LILACS, Cochrane Library, Embase, Web of Science, and SCOPUS databases, and a manual search was performed in the gray literature (Brazilian Digital Library of Theses and Dissertations and OpenGrey), the Journal of Voice, and the citations of the included studies. Studies with a (P) population of adults with healthy voices, (I) intervention with phonation into a glass tube immersed in water, (C) comparison with other vocal interventions, other activities, or no intervention, (O) outcomes of general degree of vocal deviation, fundamental frequency, sound pressure level, and vocal self-assessment, and an (S) experimental or quasi-experimental study design were included. Risk of bias assessment and meta-analysis of the outcomes were performed. RESULTS A total of 457 studies were found in the search; four were selected for the systematic review and meta-analysis. In the risk of bias assessment, there was an uncertain risk of selection and performance bias in 100% of the studies and an uncertain risk of detection bias in 75%. All studies had an experimental design, and most were conducted on women. In the fundamental frequency analysis, there was no difference between the effect sizes of the interventions (z = 0.471, P = 0.638). In the vocal self-assessment, the estimated odds ratio was 1.31, indicating a greater chance of improvement in the intervention group than in the comparison group (z = 3.45, P < 0.001). There were not enough studies to analyze the general degree of vocal deviation and sound pressure level outcomes. CONCLUSION Phonation into a glass tube immersed in water has a greater positive effect on vocal self-assessment than other interventions in vocally healthy individuals.
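The pooled odds ratio and z statistic reported above come from a standard inverse-variance meta-analysis; the sketch below illustrates that computation on hypothetical per-study 2x2 counts (the review's raw data are not reproduced here).

```python
# A minimal sketch of a fixed-effect meta-analysis of odds ratios;
# the counts below are hypothetical, not the review's actual data.
import numpy as np
from scipy import stats

# Hypothetical per-study counts: (improved_tx, n_tx, improved_ctrl, n_ctrl)
studies = [(18, 30, 12, 30), (22, 35, 16, 34), (14, 25, 11, 26), (20, 32, 15, 31)]

log_ors, weights = [], []
for a, n1, c, n2 in studies:
    b, d = n1 - a, n2 - c                      # non-improved counts
    log_or = np.log((a * d) / (b * c))         # study log odds ratio
    var = 1 / a + 1 / b + 1 / c + 1 / d        # approximate variance of log OR
    log_ors.append(log_or)
    weights.append(1 / var)                    # inverse-variance weight

pooled = np.average(log_ors, weights=weights)  # pooled log OR
se = np.sqrt(1 / np.sum(weights))              # standard error of the pooled estimate
z = pooled / se
p = 2 * (1 - stats.norm.cdf(abs(z)))           # two-sided p-value

print(f"OR = {np.exp(pooled):.2f}, z = {z:.2f}, p = {p:.4f}")
```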
Affiliation(s)
- Vanessa Veis Ribeiro
- Speech-Language Pathology Department, Universidade Federal da Paraíba - UFPB, João Pessoa, Paraíba, Brazil.
- Germayne Francisco Silva
- Speech-Language Pathology Department, Universidade Federal de Sergipe - UFS, Lagarto, Sergipe, Brazil
- Maria Julia Vieira Santos
- Speech-Language Pathology Department, Universidade Federal de Sergipe - UFS, Lagarto, Sergipe, Brazil
2. Robotti C, Costantini G, Saggio G, Cesarini V, Calastri A, Maiorano E, Piloni D, Perrone T, Sabatini U, Ferretti VV, Cassaniti I, Baldanti F, Gravina A, Sakib A, Alessi E, Pietrantonio F, Pascucci M, Casali D, Zarezadeh Z, Zoppo VD, Pisani A, Benazzo M. Machine Learning-based Voice Assessment for the Detection of Positive and Recovered COVID-19 Patients. J Voice 2024; 38:796.e1-796.e13. [PMID: 34965907] [PMCID: PMC8616736] [DOI: 10.1016/j.jvoice.2021.11.004]
Abstract
Many virological tests have been implemented during the Coronavirus Disease 2019 (COVID-19) pandemic for diagnostic purposes, but they appear unsuitable for screening purposes. Furthermore, current screening strategies are not accurate enough to effectively curb the spread of the disease. Therefore, the present study was conducted within a controlled clinical environment to determine possible detectable variations in the voice of COVID-19 patients, recovered and healthy subjects, and also to determine whether machine learning-based voice assessment (MLVA) can accurately discriminate between them, thus potentially serving as a more effective mass-screening tool. Three different subpopulations were consecutively recruited: positive COVID-19 patients, recovered COVID-19 patients and healthy individuals as controls. Positive patients were recruited within 10 days from nasal swab positivity. Recovery from COVID-19 was established clinically, virologically and radiologically. Healthy individuals reported no COVID-19 symptoms and yielded negative results at serological testing. All study participants provided three trials for multiple vocal tasks (sustained vowel phonation, speech, cough). All recordings were initially analyzed in three different binary classifications using a feature-selection, ranking, and cross-validated RBF-SVM pipeline. This brought a mean accuracy of 90.24%, a mean sensitivity of 91.15%, a mean specificity of 89.13% and a mean AUC of 0.94 across all tasks and all comparisons, and identified the sustained vowel as the most effective vocal task for COVID discrimination. Moreover, a three-way classification was carried out on an external test set comprising 30 subjects, 10 per class, with a mean accuracy of 80% and an accuracy of 100% for the detection of positive subjects. Within this assessment, recovered individuals proved to be the most difficult class to identify, and all the misclassified subjects were declared positive; this might be related to mid- and short-term vocal traces of COVID-19, even after the clinical resolution of the infection. In conclusion, MLVA may accurately discriminate between positive COVID-19 patients, recovered COVID-19 patients and healthy individuals. Further studies should test MLVA among larger populations and asymptomatic positive COVID-19 patients to validate this novel screening technology and test its potential application as a potentially more effective surveillance strategy for COVID-19.
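A minimal scikit-learn sketch of the kind of feature-selection, ranking, and cross-validated RBF-SVM pipeline the abstract describes; this is not the authors' code, and the feature matrix and labels are synthetic placeholders.

```python
# Sketch: feature selection + RBF-SVM with cross-validation (assumed setup).
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import SVC
from sklearn.model_selection import cross_validate

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 60))          # placeholder acoustic feature matrix
y = rng.integers(0, 2, size=120)        # placeholder binary labels

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(f_classif, k=20)),   # rank features, keep the top 20
    ("svm", SVC(kernel="rbf")),
])

scores = cross_validate(pipe, X, y, cv=5, scoring=["accuracy", "roc_auc"])
print(scores["test_accuracy"].mean(), scores["test_roc_auc"].mean())
```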
Affiliation(s)
- Carlo Robotti
- Department of Otolaryngology - Head and Neck Surgery, Fondazione IRCCS Policlinico San Matteo, Pavia, Italy; Department of Clinical, Surgical, Diagnostic and Pediatric Sciences, University of Pavia, Pavia, Italy.
- Giovanni Costantini
- Department of Electronic Engineering, University of Rome Tor Vergata, Rome, Italy.
- Giovanni Saggio
- Department of Electronic Engineering, University of Rome Tor Vergata, Rome, Italy.
- Valerio Cesarini
- Department of Electronic Engineering, University of Rome Tor Vergata, Rome, Italy
- Anna Calastri
- Department of Otolaryngology - Head and Neck Surgery, Fondazione IRCCS Policlinico San Matteo, Pavia, Italy
- Eugenia Maiorano
- Department of Otolaryngology - Head and Neck Surgery, Fondazione IRCCS Policlinico San Matteo, Pavia, Italy
- Davide Piloni
- Pneumology Unit, Fondazione IRCCS Policlinico San Matteo, Pavia, Italy
- Tiziano Perrone
- Department of Internal Medicine, Fondazione IRCCS Policlinico San Matteo, University of Pavia, Pavia, Italy
- Umberto Sabatini
- Department of Internal Medicine, Fondazione IRCCS Policlinico San Matteo, University of Pavia, Pavia, Italy
- Virginia Valeria Ferretti
- Clinical Epidemiology and Biometry Unit, Fondazione IRCCS Policlinico San Matteo, Pavia, Italy
- Irene Cassaniti
- Molecular Virology Unit, Microbiology and Virology Department, Fondazione IRCCS Policlinico San Matteo, Pavia, Italy
- Fausto Baldanti
- Department of Clinical, Surgical, Diagnostic and Pediatric Sciences, University of Pavia, Pavia, Italy; Molecular Virology Unit, Microbiology and Virology Department, Fondazione IRCCS Policlinico San Matteo, Pavia, Italy
- Andrea Gravina
- Otorhinolaryngology Department, University of Rome Tor Vergata, Rome, Italy
- Ahmed Sakib
- Otorhinolaryngology Department, University of Rome Tor Vergata, Rome, Italy
- Elena Alessi
- Internal Medicine Unit, Ospedale dei Castelli ASL Roma 6, Ariccia, Italy
- Matteo Pascucci
- Internal Medicine Unit, Ospedale dei Castelli ASL Roma 6, Ariccia, Italy
- Daniele Casali
- Department of Electronic Engineering, University of Rome Tor Vergata, Rome, Italy
- Zakarya Zarezadeh
- Department of Electronic Engineering, University of Rome Tor Vergata, Rome, Italy
- Vincenzo Del Zoppo
- Department of Electronic Engineering, University of Rome Tor Vergata, Rome, Italy
- Antonio Pisani
- Department of Brain and Behavioral Sciences, University of Pavia, Pavia, Italy; IRCCS Mondino Foundation, Pavia, Italy
- Marco Benazzo
- Department of Otolaryngology - Head and Neck Surgery, Fondazione IRCCS Policlinico San Matteo, Pavia, Italy; Department of Clinical, Surgical, Diagnostic and Pediatric Sciences, University of Pavia, Pavia, Italy
3. Yamauchi A, Imagawa H, Yokonishi H, Sakakibara KI, Tayama N. Gender- and Age-Stratified Normative Voice Data in Japanese-Speaking Subjects: Analysis of Sustained Habitual Phonations. J Voice 2024; 38:619-629. [PMID: 34980522] [DOI: 10.1016/j.jvoice.2021.12.002]
Abstract
INTRODUCTION There is no normative voice dataset for Japanese speakers in the English literature. We constructed age- and gender-stratified normative voice data with the assistance of vocally healthy Japanese speakers. METHODS A total of 111 vocally healthy Japanese speakers (42 men, 69 women) were divided into young (13 men, 30 women), middle-aged (18 men, 27 women), and elderly (11 men, 12 women) groups. Participants underwent aerodynamic, acoustic, and audio-perceptual studies of sustained habitual vowel phonations, and the obtained data were statistically analyzed in terms of age and gender. RESULTS Both gender- and age-related differences were noted in fundamental frequencies, sound pressure level, shimmer, and amplitude perturbation quotient, while only gender-related differences were noted in mean flow rate and only age-related changes were observed in subglottal pressure; laryngeal resistance; and G, R, B, and S scores of the GRBAS scale. The gender- and age-related difference data were comparable with the reported data in other languages, ethnicities, or countries. CONCLUSIONS The present study is the first to provide a database of normative voice data of Japanese speakers. The idiosyncrasy of Japanese is considered minor in sustained habitual vowel phonations.
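Acoustic measures of the kind tabulated in this study (mean F0, jitter, shimmer) are commonly extracted with Praat; below is a small sketch using the parselmouth Python bindings on a hypothetical sustained-vowel recording. The Praat call arguments are the common defaults, not the authors' exact settings.

```python
# Sketch: Praat-style F0/jitter/shimmer extraction via parselmouth.
import parselmouth
from parselmouth.praat import call

snd = parselmouth.Sound("sustained_a.wav")        # hypothetical recording
pitch = snd.to_pitch()
mean_f0 = call(pitch, "Get mean", 0, 0, "Hertz")  # mean fundamental frequency

# A point process marks glottal cycles, from which perturbation measures follow
pp = call(snd, "To PointProcess (periodic, cc)", 75, 500)
jitter_local = call(pp, "Get jitter (local)", 0, 0, 0.0001, 0.02, 1.3)
shimmer_local = call([snd, pp], "Get shimmer (local)", 0, 0, 0.0001, 0.02, 1.3, 1.6)

print(f"F0 = {mean_f0:.1f} Hz, jitter = {jitter_local:.4f}, shimmer = {shimmer_local:.4f}")
```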
Affiliation(s)
- Akihito Yamauchi
- Department of Otolaryngology, The University of Tokyo Hospital, Tokyo, Japan.
- Hiroshi Imagawa
- Department of Otolaryngology, The University of Tokyo Hospital, Tokyo, Japan
- Hisayuki Yokonishi
- Department of Otolaryngology, Tokyo Metropolitan Bokutoh Hospital, Tokyo, Japan
- Ken-Ichi Sakakibara
- Department of Communication Disorders, Health Sciences University of Hokkaido, Hokkaido, Japan
- Niro Tayama
- Department of Otolaryngology and Tracheo-esophagology, National Center for Global Health and Medicine, Tokyo, Japan
4. Portalete CR, Moraes DADO, Pagliarin KC, Keske-Soares M, Cielo CA. Acoustic and Physiological Voice Assessment and Maximum Phonation Time in Patients With Different Types of Dysarthria. J Voice 2024; 38:540.e1-540.e11. [PMID: 34895782] [DOI: 10.1016/j.jvoice.2021.09.034]
Abstract
OBJECTIVE To compare the maximum phonation time of /a/, acoustic glottal source parameters, and physiological measures in patients with dysarthria. METHOD Thirteen patients were classified according to dysarthria type and divided into functional profiles (hypofunctional, hyperfunctional, and mixed). Assessments of maximum phonation time of /a/, glottal source parameters, electroglottography, and nasometry were performed. Results were compared between groups using ANOVA and Tukey post hoc tests. RESULTS The highest fundamental frequency differed significantly between groups, with the hyperfunctional profile showing higher values than the other participant groups. Reductions in the maximum phonation time of /a/ and alterations in acoustic glottal source parameters and electroglottography measures were observed in all groups, with no significant differences between them. The remaining measures did not differ between groups. CONCLUSION The maximum phonation times for /a/ were reduced in all participant groups, suggesting air escape during phonation. The presence of alterations in several glottal source parameters in all participant groups is indicative of noise, tremor, and vocal instability. Lastly, the high fundamental frequency in patients with a hyperfunctional profile reinforces the presence of vocal instability. These findings suggest that, although the characteristics observed in the assessments were consistent with expectations for patients with dysarthria, it is difficult to perform a differential diagnosis of this condition based on acoustic and physiological parameters alone.
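A brief sketch of the between-group comparison described above (one-way ANOVA followed by Tukey's post hoc test), using synthetic maximum-phonation-time values rather than the study's data.

```python
# Sketch: ANOVA + Tukey post hoc on hypothetical MPT values (seconds).
import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(1)
hypo = rng.normal(8, 2, 5)     # hypofunctional profile (placeholder data)
hyper = rng.normal(9, 2, 4)    # hyperfunctional profile
mixed = rng.normal(7, 2, 4)    # mixed profile

f, p = stats.f_oneway(hypo, hyper, mixed)   # one-way ANOVA across the profiles
print(f"F = {f:.2f}, p = {p:.3f}")

values = np.concatenate([hypo, hyper, mixed])
groups = ["hypo"] * 5 + ["hyper"] * 4 + ["mixed"] * 4
print(pairwise_tukeyhsd(values, groups))    # pairwise Tukey comparisons
```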
5. Luo J, Wu Y, Liu M, Li Z, Wang Z, Zheng Y, Feng L, Lu J, He F. Differentiation between depression and bipolar disorder in child and adolescents by voice features. Child Adolesc Psychiatry Ment Health 2024; 18:19. [PMID: 38287442] [PMCID: PMC10826007] [DOI: 10.1186/s13034-024-00708-0]
Abstract
OBJECTIVE Major depressive disorder (MDD) and bipolar disorder (BD) are serious chronic disabling mental and emotional disorders, with symptoms that often manifest atypically in children and adolescents, making diagnosis difficult without objective physiological indicators. Therefore, we aimed to objectively identify MDD and BD in children and adolescents by exploring their voiceprint features. METHODS This study included a total of 150 participants: 50 MDD patients, 50 BD patients, and 50 healthy controls aged between 6 and 16 years. After collecting voiceprint data, a chi-square test was used to screen and extract voiceprint features specific to emotional disorders in children and adolescents. The selected characteristic voiceprint features were then used to establish training and testing datasets in a 7:3 ratio. The performances of various machine learning and deep learning algorithms were compared using the training dataset, and the optimal algorithm was selected to classify the testing dataset and calculate the sensitivity, specificity, accuracy, and ROC curve. RESULTS The three groups showed differences in clustering centers for various voice features such as root mean square energy, power spectral slope, low-frequency percentile energy level, high-frequency spectral slope, spectral harmonic gain, and audio signal energy level. The linear SVM model showed the best performance on the training dataset, achieving a total accuracy of 95.6% in classifying the three groups in the testing dataset, with a sensitivity of 93.3% for MDD and 100% for BD, a specificity of 93.3%, an AUC of 1 for BD, and an AUC of 0.967 for MDD. CONCLUSION By exploring the characteristics of voice features in children and adolescents, machine learning can effectively differentiate between MDD and BD, and voice features hold promise as an objective physiological indicator for the auxiliary diagnosis of mood disorders in clinical practice.
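A sketch of the classification protocol outlined above (7:3 split, linear SVM, per-class sensitivity); the voiceprint feature matrix here is a random stand-in for the study's data.

```python
# Sketch: 7:3 split, linear SVM, confusion matrix, per-class sensitivity.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 40))                 # placeholder voiceprint features
y = np.repeat([0, 1, 2], 50)                   # 0 = HC, 1 = MDD, 2 = BD

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=0)
clf = SVC(kernel="linear").fit(X_tr, y_tr)

cm = confusion_matrix(y_te, clf.predict(X_te))
sensitivity = cm.diagonal() / cm.sum(axis=1)   # per-class recall
print(cm, sensitivity)
```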
Affiliation(s)
- Jie Luo
- National Clinical Research Center for Mental Disorders, Beijing Key Laboratory of Mental Disorders, Beijing Anding Hospital, Beijing Institute for Brain Disorders Capital Medical University, De Sheng Men Wai An Kang Hu Tong 5 Hao, Xi Cheng Qu, Beijing, 100088, People's Republic of China
- Yuanzhen Wu
- National Clinical Research Center for Mental Disorders, Beijing Key Laboratory of Mental Disorders, Beijing Anding Hospital, Beijing Institute for Brain Disorders Capital Medical University, De Sheng Men Wai An Kang Hu Tong 5 Hao, Xi Cheng Qu, Beijing, 100088, People's Republic of China
- Mengqi Liu
- National Clinical Research Center for Mental Disorders, Beijing Key Laboratory of Mental Disorders, Beijing Anding Hospital, Beijing Institute for Brain Disorders Capital Medical University, De Sheng Men Wai An Kang Hu Tong 5 Hao, Xi Cheng Qu, Beijing, 100088, People's Republic of China
- Zhaojun Li
- Beijing Institute of Technology, School of Integrated Circuits and Electronics, Zhongguancun South Street 5 Hao, Hai Dian Qu, Beijing, 100081, China
- Zhuo Wang
- Beijing Institute of Technology, School of Integrated Circuits and Electronics, Zhongguancun South Street 5 Hao, Hai Dian Qu, Beijing, 100081, China
- Yi Zheng
- National Clinical Research Center for Mental Disorders, Beijing Key Laboratory of Mental Disorders, Beijing Anding Hospital, Beijing Institute for Brain Disorders Capital Medical University, De Sheng Men Wai An Kang Hu Tong 5 Hao, Xi Cheng Qu, Beijing, 100088, People's Republic of China
- Lihui Feng
- Beijing Institute of Technology, School of Optics and Photonics, Zhongguancun South Street 5 Hao, Hai Dian Qu, Beijing, 100081, China
- Jihua Lu
- Beijing Institute of Technology, School of Integrated Circuits and Electronics, Zhongguancun South Street 5 Hao, Hai Dian Qu, Beijing, 100081, China.
- Fan He
- National Clinical Research Center for Mental Disorders, Beijing Key Laboratory of Mental Disorders, Beijing Anding Hospital, Beijing Institute for Brain Disorders Capital Medical University, De Sheng Men Wai An Kang Hu Tong 5 Hao, Xi Cheng Qu, Beijing, 100088, People's Republic of China.
6. Pan X, Feng T, Zhang N. PVGAN: A Pathological Voice Generation Model Incorporating a Progressive Nesting Strategy. J Voice 2023:S0892-1997(23)00315-6. [PMID: 37940422] [DOI: 10.1016/j.jvoice.2023.10.006]
Abstract
Voice generation addresses the problem of limited samples in voice datasets by synthesizing new data computationally. Increasing the number of samples can improve the accuracy of voice disorder diagnosis, which has wide application value in medical diagnosis and other fields. At present, existing models insufficiently capture detailed features such as pitch, timbre, and the different frequency components of pathological voice data. This paper therefore proposes PVGAN, a network that learns the different frequency components of audio to generate pathological voice data. The proposed network captures the multi-scale features and different periodic patterns of audio signals through multi-scale perceptual residual blocks and periodic discriminators. At the same time, a progressive nesting strategy combines the generator and the discriminator to improve the learning of information at different resolutions. In addition, a latent mapping network fuses the latent vector with condition information to generate sound features related to specific diseases or pathological states, and the loss function is optimized to further improve model performance. On the Saarbruecken Voice Database (SVD), the average values of each index of the data generated after training with different pathological types as conditional information are similar to those of the original data. Finally, the generated data were used to expand the SVD dataset, improving the accuracy of two classification experiments.
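PVGAN's exact architecture is not reproduced here, but the core idea of fusing a latent vector with pathology-type condition information can be sketched as a minimal conditional generator/discriminator pair in PyTorch; all layer sizes below are illustrative assumptions, not the paper's design.

```python
# Toy conditional GAN skeleton: latent vector fused with a condition code.
import torch
import torch.nn as nn

class CondGenerator(nn.Module):
    """Toy generator conditioned on a pathology-type code."""
    def __init__(self, z_dim=64, cond_dim=4, out_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim + cond_dim, 128), nn.ReLU(),
            nn.Linear(128, out_dim), nn.Tanh(),
        )

    def forward(self, z, c):
        return self.net(torch.cat([z, c], dim=1))   # latent-condition fusion

class CondDiscriminator(nn.Module):
    """Toy discriminator over (sample, condition) pairs."""
    def __init__(self, in_dim=256, cond_dim=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim + cond_dim, 128), nn.LeakyReLU(0.2),
            nn.Linear(128, 1),
        )

    def forward(self, x, c):
        return self.net(torch.cat([x, c], dim=1))

z = torch.randn(8, 64)
c = nn.functional.one_hot(torch.randint(0, 4, (8,)), 4).float()  # pathology type
fake = CondGenerator()(z, c)
score = CondDiscriminator()(fake, c)
print(fake.shape, score.shape)
```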
Affiliation(s)
- Xiaoying Pan
- Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an University of Posts and Telecommunications, Xi'an 710121, China; School of Computer Science & Technology, Xi'an University of Posts and Telecommunications, Xi'an 710121, China.
- Tong Feng
- Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an University of Posts and Telecommunications, Xi'an 710121, China; School of Computer Science & Technology, Xi'an University of Posts and Telecommunications, Xi'an 710121, China
- Nijuan Zhang
- Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an University of Posts and Telecommunications, Xi'an 710121, China; School of Computer Science & Technology, Xi'an University of Posts and Telecommunications, Xi'an 710121, China
7. Suppa A, Asci F, Costantini G, Bove F, Piano C, Pistoia F, Cerroni R, Brusa L, Cesarini V, Pietracupa S, Modugno N, Zampogna A, Sucapane P, Pierantozzi M, Tufo T, Pisani A, Peppe A, Stefani A, Calabresi P, Bentivoglio AR, Saggio G. Effects of deep brain stimulation of the subthalamic nucleus on patients with Parkinson's disease: a machine-learning voice analysis. Front Neurol 2023; 14:1267360. [PMID: 37928137] [PMCID: PMC10622670] [DOI: 10.3389/fneur.2023.1267360]
Abstract
Introduction Deep brain stimulation of the subthalamic nucleus (STN-DBS) can exert relevant effects on the voice of patients with Parkinson's disease (PD). In this study, we used artificial intelligence to objectively analyze the voices of PD patients with STN-DBS. Materials and methods In a cross-sectional study, we enrolled 108 controls and 101 patients with PD. The PD cohort was divided into two groups: the first included 50 patients with STN-DBS, and the second included 51 patients receiving the best medical treatment. Voices were clinically evaluated using the Unified Parkinson's Disease Rating Scale part-III subitem for voice (UPDRS-III-v). We recorded and then analyzed voices using specific machine-learning algorithms. The likelihood ratio (LR) was also calculated as an objective measure for clinical-instrumental correlations. Results Clinically, voice impairment was greater in STN-DBS patients than in those receiving oral treatment. Using machine learning, we objectively and accurately distinguished between the voices of STN-DBS patients and those under oral treatment. We also found significant clinical-instrumental correlations: the greater the LRs, the higher the UPDRS-III-v scores. Discussion STN-DBS worsens speech in patients with PD, as objectively demonstrated by machine-learning voice analysis.
Affiliation(s)
- Antonio Suppa
- Department of Human Neurosciences, Sapienza University of Rome, Rome, Italy
- IRCCS Neuromed Institute, Pozzilli, IS, Italy
- Francesco Asci
- Department of Human Neurosciences, Sapienza University of Rome, Rome, Italy
- IRCCS Neuromed Institute, Pozzilli, IS, Italy
- Giovanni Costantini
- Department of Electronic Engineering, University of Rome Tor Vergata, Rome, Italy
- Francesco Bove
- Neurology Unit, Fondazione Policlinico Universitario A. Gemelli IRCCS, Rome, Italy
- Carla Piano
- Neurology Unit, Fondazione Policlinico Universitario A. Gemelli IRCCS, Rome, Italy
- Francesca Pistoia
- Department of Biotechnological and Applied Clinical Sciences, University of L'Aquila, Coppito, AQ, Italy
- Neurology Unit, San Salvatore Hospital, Coppito, AQ, Italy
- Rocco Cerroni
- Department of System Medicine, University of Rome Tor Vergata, Rome, Italy
- Livia Brusa
- Neurology Unit, S. Eugenio Hospital, Rome, Italy
- Valerio Cesarini
- Department of Electronic Engineering, University of Rome Tor Vergata, Rome, Italy
- Sara Pietracupa
- Department of Human Neurosciences, Sapienza University of Rome, Rome, Italy
- IRCCS Neuromed Institute, Pozzilli, IS, Italy
- Tommaso Tufo
- Neurosurgery Unit, Policlinico A. Gemelli University Hospital Foundation IRCSS, Rome, Italy
- Neurosurgery Department, Fakeeh University Hospital, Dubai, United Arab Emirates
- Antonio Pisani
- Department of Brain and Behavioral Sciences, University of Pavia, Pavia, Italy
- IRCCS Mondino Foundation, Pavia, Italy
- Alessandro Stefani
- Department of System Medicine, University of Rome Tor Vergata, Rome, Italy
- Paolo Calabresi
- Neurology Unit, Fondazione Policlinico Universitario A. Gemelli IRCCS, Rome, Italy
- Giovanni Saggio
- Department of Electronic Engineering, University of Rome Tor Vergata, Rome, Italy
8. Suppa A, Costantini G, Gomez-Vilda P, Saggio G. Editorial: Voice analysis in healthy subjects and patients with neurologic disorders. Front Neurol 2023; 14:1288370. [PMID: 37840929] [PMCID: PMC10569294] [DOI: 10.3389/fneur.2023.1288370]
Affiliation(s)
- Antonio Suppa
- Department of Human Neurosciences, Sapienza University of Rome, Rome, Italy
- IRCCS Neuromed Institute, Pozzilli, Italy
- Giovanni Costantini
- Department of Electronic Engineering, University of Rome Tor Vergata, Rome, Italy
- Pedro Gomez-Vilda
- Center for Biomedical Technology, Universidad Politécnica de Madrid, Madrid, Spain
- Giovanni Saggio
- Department of Electronic Engineering, University of Rome Tor Vergata, Rome, Italy
9. Lin Y, Cheng L, Wang Q, Xu W. Effects of Medical Masks on Voice Assessment During the COVID-19 Pandemic. J Voice 2023; 37:802.e25-802.e29. [PMID: 34116888] [DOI: 10.1016/j.jvoice.2021.04.028]
Abstract
OBJECTIVE Voice assessment is of great significance to the evaluation of voice quality. Our study aims to explore the effects of medical masks on the acoustic, aerodynamic, and formant parameters of healthy people during the COVID-19 pandemic. In addition, we attempted to verify differences between sexes and age groups. METHODS Fifty-three healthy participants (25 males and 28 females) were involved in our study. The acoustic parameters, including fundamental frequency (F0), sound pressure level (SPL), percentage of jitter (%), percentage of shimmer (%), noise-to-harmonic ratio (NHR) and cepstral peak prominence (CPP), an aerodynamic parameter (maximum phonation time, MPT) and formant parameters (formant frequencies F1, F2, F3) without and with wearing medical masks were included. We further investigated the potential differences in the impact on different sexes and ages (≤45 years old and >45 years old). RESULTS While wearing medical masks, the SPL significantly increased (71.22±4.25 dB vs. 72.42±3.96 dB, P = 0.021). Jitter and shimmer significantly decreased (jitter 1.19±0.83 vs. 0.87±0.67, P = 0.005; shimmer 4.49±2.20 vs. 3.66±2.02, P = 0.002), as did F3 (2855±323.34 Hz vs. 2781.89±353.42 Hz, P = 0.004). F0, MPT, F1 and F2 showed increasing trends without statistical significance, and NHR as well as CPP showed little change without and with wearing medical masks. There were no significant differences between males and females. Regarding age, a significant difference in MPT was seen (>45-year-old group 16.15±6.98 s, 15.38±7.02 s; ≤45-year-old group 20.26±6.47 s, 21.44±6.98 s, P = 0.032). CONCLUSION Healthy participants showed a significantly higher SPL, a smaller perturbation and an evident decrease in F3 after wearing medical masks. These changes may result from the adjustment of the vocal tract and the filtration function of medical masks, meaning that the stability of the recorded voices may be overstated. The impact of medical masks did not differ evidently between sexes, while the MPT in the >45-year-old group was influenced more than that in the ≤45-year-old group.
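The without-mask/with-mask comparisons above are paired (each participant is recorded twice), so the underlying test is a paired t-test; a sketch with hypothetical SPL values follows.

```python
# Sketch: paired t-test on hypothetical with/without-mask SPL values (dB).
import numpy as np
from scipy import stats

spl_no_mask = np.array([70.1, 72.4, 69.8, 71.5, 73.0, 70.9])   # placeholder data
spl_mask = np.array([71.0, 73.9, 70.5, 72.8, 74.1, 71.6])

t, p = stats.ttest_rel(spl_mask, spl_no_mask)   # paired two-sided t-test
print(f"mean change = {np.mean(spl_mask - spl_no_mask):.2f} dB, t = {t:.2f}, p = {p:.3f}")
```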
Affiliation(s)
- Yuhong Lin
- Department of Otolaryngology-Head and Neck Surgery, Beijing Tongren Hospital, Capital Medical University, Beijing, China
- Liyu Cheng
- Department of Otolaryngology-Head and Neck Surgery, Beijing Tongren Hospital, Capital Medical University, Beijing, China
- Qingcui Wang
- Department of Otolaryngology-Head and Neck Surgery, Beijing Tongren Hospital, Capital Medical University, Beijing, China
- Wen Xu
- Department of Otolaryngology-Head and Neck Surgery, Beijing Tongren Hospital, Capital Medical University, Beijing, China.
10. Schlegel P, Döllinger M, Reddy NK, Zhang Z, Chhetri DK. Validation and enhancement of a vocal fold medial surface 3D reconstruction approach for in-vivo application. Sci Rep 2023; 13:10705. [PMID: 37400470] [DOI: 10.1038/s41598-023-36022-6]
Abstract
In laryngeal research, the vertical component of vocal fold oscillation is often disregarded. However, vocal fold oscillation is by nature a three-dimensional process. In the past, we developed an in-vivo experimental protocol to reconstruct the full, three-dimensional vocal fold vibration. The goal of this study is to validate this 3D reconstruction method. We present an in-vivo canine hemilarynx setup using high-speed video recording and a right-angle prism for 3D reconstruction of vocal fold medial surface vibrations. The 3D surface is reconstructed from the split image provided by the prism. For validation, reconstruction error was calculated for objects located up to 15 mm away from the prism. The influence of camera angle, changes in the calibrated volume, and calibration errors were determined. The overall average 3D reconstruction error is low and does not exceed 0.12 mm at 5 mm distance from the prism. A moderate (5°) or large (10°) deviation in camera angle led to a slight increase in error, to 0.16 mm and 0.17 mm, respectively. The procedure is robust to changes in calibration volume and small calibration errors. This makes the 3D reconstruction approach a useful tool for the reconstruction of accessible and moving tissue surfaces.
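The prism splits the camera view into two virtual views, so the reconstruction reduces to standard two-view triangulation; below is a sketch of the linear (DLT) triangulation step, with the two projection matrices assumed to come from calibration. This is an illustration of the general principle, not the authors' pipeline.

```python
# Sketch: linear (DLT) triangulation of a point seen in two calibrated views.
import numpy as np

def triangulate(P1, P2, x1, x2):
    """P1, P2: 3x4 projection matrices; x1, x2: (u, v) image points."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)     # least-squares null vector of A
    X = Vt[-1]
    return X[:3] / X[3]             # dehomogenize

# Toy check: two views with a 1-unit baseline observing the point (1, 2, 10)
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([1.0, 2.0, 10.0, 1.0])
x1 = P1 @ X_true; x1 = x1[:2] / x1[2]
x2 = P2 @ X_true; x2 = x2[:2] / x2[2]
print(triangulate(P1, P2, x1, x2))  # ~ [1, 2, 10]
```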
Affiliation(s)
- Patrick Schlegel
- Department of Head and Neck Surgery, University of California, Los Angeles, UCLA Rehabilitation Services, 1000 Veteran Ave, Los Angeles, CA, 90095, USA.
- Michael Döllinger
- Department of Head and Neck Surgery, Division of Phoniatrics and Pediatric Audiology, Friedrich Alexander University Erlangen-Nürnberg, Erlangen, Germany
- Neha K Reddy
- Department of Head and Neck Surgery, University of California, Los Angeles, UCLA Rehabilitation Services, 1000 Veteran Ave, Los Angeles, CA, 90095, USA
- Zhaoyan Zhang
- Department of Head and Neck Surgery, University of California, Los Angeles, UCLA Rehabilitation Services, 1000 Veteran Ave, Los Angeles, CA, 90095, USA
- Dinesh K Chhetri
- Department of Head and Neck Surgery, University of California, Los Angeles, UCLA Rehabilitation Services, 1000 Veteran Ave, Los Angeles, CA, 90095, USA
11. Asci F, Marsili L, Suppa A, Saggio G, Michetti E, Di Leo P, Patera M, Longo L, Ruoppolo G, Del Gado F, Tomaiuoli D, Costantini G. Acoustic analysis in stuttering: a machine-learning study. Front Neurol 2023; 14:1169707. [PMID: 37456655] [PMCID: PMC10347393] [DOI: 10.3389/fneur.2023.1169707]
Abstract
Background Stuttering is a childhood-onset neurodevelopmental disorder affecting speech fluency. The diagnosis and clinical management of stuttering are currently based on perceptual examination and clinical scales. Standardized techniques for acoustic analysis have produced promising results for the objective assessment of dysfluency in people with stuttering (PWS). Objective We objectively and automatically assessed voice in stuttering through artificial intelligence (a support vector machine, SVM, classifier). We also investigated the age-related changes affecting voice in stutterers, and verified the relevance of specific speech tasks for the objective and automatic assessment of stuttering. Methods Fifty-three PWS (20 children, 33 younger adults) and 71 age-/gender-matched controls (31 children, 40 younger adults) were recruited. Clinical data were assessed through clinical scales. The voluntary and sustained emission of a vowel and two sentences were recorded through smartphones. Audio samples were analyzed using a dedicated machine-learning algorithm, the SVM, to compare PWS and controls, both children and younger adults. Receiver operating characteristic (ROC) curves were calculated to describe accuracy for all comparisons. The likelihood ratio (LR) was calculated for each PWS during all speech tasks, for clinical-instrumental correlations, by using an artificial neural network (ANN). Results Acoustic analysis based on a machine-learning algorithm objectively and automatically discriminated between the overall cohort of PWS and controls with high accuracy (88%). Physiologic ageing also crucially influenced stuttering, as demonstrated by the high accuracy (92%) of machine-learning analysis when classifying child versus younger adult PWS. The diagnostic accuracies achieved by machine-learning analysis were comparable across speech tasks. The significant clinical-instrumental correlations between LRs and clinical scales supported the biological plausibility of our findings. Conclusion Acoustic analysis based on artificial intelligence (SVM) represents a reliable tool for the objective and automatic recognition of stuttering and its relationship with physiologic ageing. The accuracy of the automatic classification is high and independent of the speech task. Machine-learning analysis would help clinicians in the objective diagnosis and clinical management of stuttering. The digital collection of audio samples, here achieved through smartphones, would promote the future application of the technique in a telemedicine context (home environment).
Affiliation(s)
- Francesco Asci
- Department of Human Neurosciences, Sapienza University of Rome, Rome, Italy
- IRCCS Neuromed Institute, Pozzilli, Italy
- Luca Marsili
- Department of Neurology, James J. and Joan A. Gardner Center for Parkinson’s Disease and Movement Disorders, University of Cincinnati, Cincinnati, OH, United States
- Antonio Suppa
- Department of Human Neurosciences, Sapienza University of Rome, Rome, Italy
- IRCCS Neuromed Institute, Pozzilli, Italy
- Giovanni Saggio
- Department of Electronic Engineering, University of Rome Tor Vergata, Rome, Italy
- Pietro Di Leo
- Department of Electronic Engineering, University of Rome Tor Vergata, Rome, Italy
- Martina Patera
- Department of Human Neurosciences, Sapienza University of Rome, Rome, Italy
- Lucia Longo
- Department of Sense Organs, Otorhinolaryngology Section, Sapienza University of Rome, Rome, Italy
- Giovanni Costantini
- Department of Electronic Engineering, University of Rome Tor Vergata, Rome, Italy
12. Scimeca S, Amato F, Olmo G, Asci F, Suppa A, Costantini G, Saggio G. Robust and language-independent acoustic features in Parkinson's disease. Front Neurol 2023; 14:1198058. [PMID: 37384279] [PMCID: PMC10294689] [DOI: 10.3389/fneur.2023.1198058]
Abstract
Introduction The analysis of vocal samples from patients with Parkinson's disease (PDP) can be relevant in supporting early diagnosis and disease monitoring. Intriguingly, speech analysis embeds several complexities influenced by speaker characteristics (e.g., gender and language) and recording conditions (e.g., professional microphones or smartphones, supervised or non-supervised data collection). Moreover, the set of vocal tasks performed, such as sustained phonation, reading text, or monologue, strongly affects the speech dimension investigated, the features extracted, and, as a consequence, the performance of the overall algorithm. Methods We employed six datasets, including a cohort of 176 Healthy Control (HC) participants and 178 PDP of different nationalities (Italian, Spanish, Czech), recorded in variable scenarios through various devices (professional microphones and smartphones) and performing several speech exercises (vowel phonation, sentence repetition). Aiming to identify the most effective vocal tasks and the features that remain trustworthy independent of external co-factors such as language, gender, and data collection modality, we performed several intra- and inter-corpora statistical analyses. In addition, we compared the performance of different feature selection and classification models to evaluate the most robust and best-performing pipeline. Results According to our results, the combined use of sustained phonation and sentence repetition should be preferred over a single exercise. As for the features, the Mel Frequency Cepstral Coefficients proved to be among the most effective parameters in discriminating between HC and PDP, even in the presence of heterogeneous languages and acquisition techniques. Conclusion Even though preliminary, the results of this work can be exploited to define a speech protocol that effectively captures vocal alterations while minimizing the effort required of the patient. Moreover, the statistical analysis identified a set of features minimally dependent on gender, language, and recording modality. This demonstrates the feasibility of extensive cross-corpora tests to develop robust and reliable tools for disease monitoring, staging, and PDP follow-up.
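MFCCs, highlighted here as robust cross-corpora features, are straightforward to extract; the sketch below uses librosa on a hypothetical recording and summarizes each coefficient by its mean and standard deviation, one common featurization choice (not necessarily the authors').

```python
# Sketch: MFCC extraction and simple per-coefficient statistics.
import numpy as np
import librosa

y, sr = librosa.load("vowel_a.wav", sr=16000)        # hypothetical recording
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # 13 coefficients x frames

# Collapse the time axis into per-coefficient statistics
features = np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])
print(features.shape)    # (26,)
```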
Affiliation(s)
- Sabrina Scimeca
- Department of Control and Computer Engineering, Polytechnic University of Turin, Turin, Italy
- Federica Amato
- Department of Control and Computer Engineering, Polytechnic University of Turin, Turin, Italy
- Gabriella Olmo
- Department of Control and Computer Engineering, Polytechnic University of Turin, Turin, Italy
- Francesco Asci
- Department of Human Neuroscience, Sapienza University of Rome, Rome, Italy
- Antonio Suppa
- Department of Human Neuroscience, Sapienza University of Rome, Rome, Italy
- IRCCS Neuromed Institute, Pozzilli, Italy
- Giovanni Costantini
- Department of Electronic Engineering, University of Rome Tor Vergata, Rome, Italy
- Giovanni Saggio
- Department of Electronic Engineering, University of Rome Tor Vergata, Rome, Italy
13. Costantini G, Cesarini V, Brenna E. High-Level CNN and Machine Learning Methods for Speaker Recognition. Sensors (Basel) 2023; 23:3461. [PMID: 37050521] [PMCID: PMC10098737] [DOI: 10.3390/s23073461]
Abstract
Speaker Recognition (SR) is a common task in AI-based sound analysis, involving structurally different methodologies such as Deep Learning and "traditional" Machine Learning (ML). In this paper, we compared and explored the two methodologies on the DEMoS dataset, consisting of 8869 audio files from 58 speakers in different emotional states. A custom CNN is compared to several pre-trained nets using image inputs of spectrograms and cepstral-temporal (MFCC) graphs. A ML approach based on acoustic feature extraction, selection, and multi-class classification by means of a Naïve Bayes model is also considered. Results show that a custom, less deep CNN trained on grayscale spectrogram images obtains the most accurate results: 90.15% on grayscale spectrograms and 83.17% on colored MFCC. AlexNet provides comparable results, reaching 89.28% on spectrograms and 83.43% on MFCC. The Naïve Bayes classifier provides an 87.09% accuracy and a 0.985 average AUC while being faster to train and more interpretable. Feature selection shows that F0, MFCC, and voicing-related features are the most characterizing for this SR task. The large number of training samples and the emotional content of the DEMoS dataset better reflect a real-case scenario for speaker recognition and account for the generalization power of the models.
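Grayscale spectrogram inputs of the kind the custom CNN is trained on can be produced roughly as follows; the file name and mel-scale settings here are assumptions, as the paper's exact spectrogram parameters are not reproduced.

```python
# Sketch: turning audio into a normalized grayscale (mel-)spectrogram matrix.
import numpy as np
import librosa

y, sr = librosa.load("speaker_utterance.wav", sr=16000)   # hypothetical file
S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
S_db = librosa.power_to_db(S, ref=np.max)

# Scale to [0, 1] so the matrix can be saved/consumed as a grayscale image
img = (S_db - S_db.min()) / (S_db.max() - S_db.min())
print(img.shape)
```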
14. Costantini G, Cesarini V, Di Leo P, Amato F, Suppa A, Asci F, Pisani A, Calculli A, Saggio G. Artificial Intelligence-Based Voice Assessment of Patients with Parkinson's Disease Off and On Treatment: Machine vs. Deep-Learning Comparison. Sensors (Basel) 2023; 23:2293. [PMID: 36850893] [PMCID: PMC9962335] [DOI: 10.3390/s23042293]
Abstract
Parkinson's Disease (PD) is one of the most common incurable neurodegenerative diseases. Diagnosis is achieved clinically on the basis of different symptoms, with considerable delays from the onset of neurodegenerative processes in the central nervous system. In this study, we investigated early and full-blown PD patients based on the analysis of their voice characteristics with the aid of the most commonly employed machine learning (ML) techniques. A custom dataset was made with hi-fi quality recordings of vocal tasks gathered from Italian healthy control subjects and PD patients, divided into early-diagnosed, off-medication patients on the one hand, and mid-advanced patients treated with L-Dopa on the other. Following the current state of the art, several ML pipelines were compared using different feature selection and classification algorithms, and deep learning was also explored with a custom CNN architecture. Results show that feature-based ML and deep learning achieve comparable classification results, with KNN, SVM, and Naïve Bayes classifiers performing similarly, with a slight edge for KNN. Much more evident is the predominance of CFS as the best feature selector. The selected features act as relevant vocal biomarkers capable of differentiating healthy subjects, early untreated PD patients, and mid-advanced L-Dopa-treated patients.
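CFS, singled out above as the best feature selector, scores a feature subset as having high feature-class correlation and low feature-feature redundancy. A rough sketch of its merit function follows, using Pearson correlation as a stand-in for the symmetrical-uncertainty measure CFS classically applies after discretization, so this is a simplified illustration rather than the canonical algorithm.

```python
# Sketch: CFS-style merit = k * r_cf / sqrt(k + k*(k-1) * r_ff).
import numpy as np

def cfs_merit(X, y, subset):
    """Score a feature subset by class relevance vs. internal redundancy."""
    k = len(subset)
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    if k == 1:
        return r_cf
    r_ff = np.mean([abs(np.corrcoef(X[:, i], X[:, j])[0, 1])
                    for a, i in enumerate(subset) for j in subset[a + 1:]])
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(size=100) > 0).astype(float)
print(cfs_merit(X, y, [0, 3]), cfs_merit(X, y, [5, 7]))  # informative vs. random pair
```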
Affiliation(s)
- Giovanni Costantini
- Department of Electronic Engineering, University of Rome Tor Vergata, 00133 Rome, Italy
- Valerio Cesarini
- Department of Electronic Engineering, University of Rome Tor Vergata, 00133 Rome, Italy
- Pietro Di Leo
- Department of Electronic Engineering, University of Rome Tor Vergata, 00133 Rome, Italy
- Federica Amato
- Department of Control and Computer Engineering, Polytechnic University of Turin, 10129 Turin, Italy
- Antonio Suppa
- Department of Human Neurosciences, Sapienza University of Rome, 00185 Rome, Italy
- IRCCS Neuromed Institute, 86077 Pozzilli, Italy
- Francesco Asci
- Department of Human Neurosciences, Sapienza University of Rome, 00185 Rome, Italy
- IRCCS Neuromed Institute, 86077 Pozzilli, Italy
- Antonio Pisani
- Department of Brain and Behavioral Sciences, University of Pavia, 27100 Pavia, Italy
- IRCCS Mondino Foundation, 27100 Pavia, Italy
- Alessandra Calculli
- Department of Brain and Behavioral Sciences, University of Pavia, 27100 Pavia, Italy
- IRCCS Mondino Foundation, 27100 Pavia, Italy
- Giovanni Saggio
- Department of Electronic Engineering, University of Rome Tor Vergata, 00133 Rome, Italy
15. Deep learning and machine learning-based voice analysis for the detection of COVID-19: A proposal and comparison of architectures. Knowl Based Syst 2022; 253:109539. [PMID: 35915642] [PMCID: PMC9328841] [DOI: 10.1016/j.knosys.2022.109539]
Abstract
Alongside the currently used nasal swab testing, the COVID-19 pandemic situation would gain noticeable advantages from low-cost tests that are available anytime, anywhere, at large scale, and with real-time answers. A novel approach for COVID-19 assessment is adopted here, discriminating negative subjects versus positive or recovered subjects. The aims are to identify potential discriminating features, highlight mid- and short-term effects of COVID-19 on the voice, and compare two custom algorithms. A pool of 310 subjects took part in the study; recordings were collected in a low-noise, controlled setting employing three different vocal tasks. Binary classifications followed, using two different custom algorithms. The first was based on the coupling of boosting and bagging, with an AdaBoost classifier using Random Forest learners. A feature selection process was employed for training, identifying a subset of features acting as clinically relevant biomarkers. The other approach was centered on two custom CNN architectures applied to mel-spectrograms, with a custom knowledge-based data augmentation. Performances, evaluated on an independent test set, were comparable: AdaBoost and CNN differentiated COVID-19-positive from negative subjects with accuracies of 100% and 95%, respectively, and recovered from negative individuals with accuracies of 86.1% and 75%, respectively. This study highlights the possibility of identifying COVID-19-positive subjects, foreseeing a tool for on-site screening, while also considering recovered subjects and the effects of COVID-19 on the voice. The two proposed novel architectures allow for the identification of biomarkers and demonstrate the ongoing relevance of traditional ML versus deep learning in speech analysis.
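The boosting-bagging coupling described above (AdaBoost over Random Forest learners) maps directly onto scikit-learn, sketched here with placeholder data; note the base-learner argument is `estimator` in scikit-learn >= 1.2 and `base_estimator` in older releases.

```python
# Sketch: AdaBoost with Random Forest base learners on placeholder data.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))          # placeholder acoustic features
y = rng.integers(0, 2, size=200)        # placeholder COVID-19 labels

clf = AdaBoostClassifier(
    estimator=RandomForestClassifier(n_estimators=25, max_depth=5),
    n_estimators=20,
)
print(cross_val_score(clf, X, y, cv=5).mean())
```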
16. Lin Y, Cheng L, Wang Q, Xu W. Effects of Medical Masks on Voice Quality in Patients With Voice Disorders. J Speech Lang Hear Res 2022; 65:1742-1750. [PMID: 35363549] [DOI: 10.1044/2022_jslhr-21-00428]
Abstract
PURPOSE The purpose of this study was to explore the effects of medical masks on the voice quality of patients with voice disorders. METHOD We included 106 patients diagnosed with voice disorders. Among them, 59 were diagnosed with vocal-fold benign lesions, 27 with insufficient glottic closure, and 20 with precancerous lesions/early-stage glottic carcinoma. Perceptual parameters (GRBAS [grade, roughness, breathiness, asthenia, strain] scale), acoustic parameters (f o, sound pressure level [SPL], jitter, shimmer, noise-to-harmonic ratio [NHR], and cepstral peak prominence [CPP]), and maximum phonation time (MPT) without and with medical masks were analyzed. Changes in the GRBAS scale after wearing medical masks were also evaluated. RESULTS With medical mask wearing, the G, R, and B scales in the vocal-fold benign lesion and insufficient glottic closure groups decreased, with statistically significant decreases in the G and R scales of the vocal-fold benign lesion group (G 1.07 ± 0.59 vs. 0.95 ± 0.68, p < .01; R 1.07 ± 0.59 vs. 0.95 ± 0.68, p < .01). The B scale in the precancerous lesions/early-stage glottic carcinoma (95%) and vocal-fold benign lesion groups (83%) and the R scale in the insufficient glottic closure group (77.8%) were stable with mask wearing. f o and SPL in the vocal-fold benign lesion group and f o and jitter in the insufficient glottic closure group increased significantly with medical masks. The NHR and CPP in each group changed little, and no parameter in the precancerous lesions/early-stage glottic carcinoma group showed a significant change. CONCLUSIONS The effects of medical masks on the voice quality of patients with voice disorders were associated with the type of disease, the degree of hoarseness, and the perceptual scale most affected by the specific voice disorder. When wearing medical masks, the pitch and loudness of patients increased as compensation. Medical masks had the least impact on the precancerous lesions/early-stage glottic carcinoma group.
Affiliation(s)
- Yuhong Lin
- Department of Otolaryngology-Head and Neck Surgery, Beijing Tongren Hospital, Capital Medical University, China
- Liyu Cheng
- Department of Otolaryngology-Head and Neck Surgery, Beijing Tongren Hospital, Capital Medical University, China
- Qingcui Wang
- Department of Otolaryngology-Head and Neck Surgery, Beijing Tongren Hospital, Capital Medical University, China
- Wen Xu
- Department of Otolaryngology-Head and Neck Surgery, Beijing Tongren Hospital, Capital Medical University, China
17. Yamauchi A, Imagawa H, Yokonishi H, Sakakibara KI, Tayama N. Sex and Age Stratified Voice Data in Japanese Vocally Healthy Individuals: Vocal Capacity. J Voice 2022:S0892-1997(22)00103-5. [PMID: 35513937] [DOI: 10.1016/j.jvoice.2022.03.027]
Abstract
OBJECTIVE There is no normative voice dataset regarding the vocal capacity of Japanese speakers in the English literature. We collected age- and sex-stratified data on the vocal capacity of vocally healthy Japanese speakers. METHODS In total, 111 vocally healthy Japanese speakers (42 men and 69 women) were divided into the young (13 men and 30 women), middle-aged (18 men and 27 women), and elderly (11 men and 12 women) groups. Participants underwent duration-, intensity-, and pitch-related vocal capacity tests using either a conventional method or an aerodynamic method or both. The data obtained were statistically analyzed in terms of age and sex. RESULTS Overall, the duration- and pitch-related parameters measured by the conventional method were generally comparable to the previous results in the literature, while duration-, pitch-, and intensity-related parameters measured by the aerodynamic method differed significantly from them. Significant sex differences were noted in all parameters in the duration-, intensity-, and pitch-related vocal capacity tests. Furthermore, significant age-related changes were observed in all parameters, except for the mean flow rate and highest pitch measured by the aerodynamic method. CONCLUSION This study is the first to provide a sex- and age-stratified database of the normative vocal capacity data of Japanese speakers. However, further improvements will be needed in the assessment protocols, conditions, or devices used for the duration-, intensity-, and pitch-related vocal capacity tests in the aerodynamic method.
Affiliation(s)
- Akihito Yamauchi
- Department of Otolaryngology, The University of Tokyo Hospital, Tokyo, Japan.
- Hiroshi Imagawa
- Department of Otolaryngology, The University of Tokyo Hospital, Tokyo, Japan
- Hisayuki Yokonishi
- Department of Otolaryngology, Tokyo Metropolitan Bokutoh Hospital, Tokyo, Japan
- Ken-Ichi Sakakibara
- Department of Communication Disorders, Health Sciences University of Hokkaido, Hokkaido, Japan
- Niro Tayama
- Department of Otolaryngology and Tracheo-Esophagology, National Center for Global Health and Medicine, Tokyo, Japan
18. The Emotion Probe: On the Universality of Cross-Linguistic and Cross-Gender Speech Emotion Recognition via Machine Learning. Sensors (Basel) 2022; 22:2461. [PMID: 35408076] [PMCID: PMC9003467] [DOI: 10.3390/s22072461]
Abstract
Machine Learning (ML) algorithms within a human–computer framework are the leading force in speech emotion recognition (SER). However, few studies explore cross-corpora aspects of SER; this work aims to explore the feasibility and characteristics of cross-linguistic, cross-gender SER. Three ML classifiers (SVM, Naïve Bayes, and MLP) are applied to acoustic features obtained through a procedure based on Kononenko’s discretization and correlation-based feature selection. The system encompasses five emotions (disgust, fear, happiness, anger, and sadness), using the Emofilm database, comprising short clips of English movies and their Italian and Spanish dubbed versions, for a total of 1115 annotated utterances. The results show MLP to be the most effective classifier, with accuracies higher than 90% for single-language approaches, while the cross-language classifier still yields accuracies higher than 80%. The results show cross-gender tasks to be more difficult than those involving two languages, suggesting greater differences between emotions expressed by male versus female subjects than between different languages. Four feature domains, namely RASTA, F0, MFCC, and spectral energy, are algorithmically assessed as the most effective, refining existing literature and approaches based on standard feature sets. To our knowledge, this is one of the first studies encompassing cross-gender and cross-linguistic assessments of SER.
19. Delgado-Ruiz R, Botticelli D, Romanos G. Temporal and Permanent Changes Induced by Maxillary Sinus Lifting with Bone Grafts and Maxillary Functional Endoscopic Sinus Surgery in the Voice Characteristics: Systematic Review. Dent J (Basel) 2022; 10:47. [PMID: 35323249] [PMCID: PMC8947252] [DOI: 10.3390/dj10030047]
Abstract
Sinus surgery procedures such as sinus lifting with bone grafting or maxillary functional endoscopic sinus surgery (FESS) can present different complications. The aims of this systematic review are to compile the post-operative complications of sinus elevation with bone grafting and FESS, including voice changes, and to elucidate whether those changes are permanent or temporary. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were followed, and the literature was exhaustively searched without time restrictions for randomized and non-randomized clinical studies, cohort studies (prospective and retrospective), and clinical case reports with ≥4 cases focused on sinus lift procedures with bone grafts and functional endoscopic maxillary sinus surgery. A total of 435 manuscripts were identified. After reading the abstracts, 101 articles were selected to be read in full. Twenty articles that fulfilled the inclusion criteria were included for analysis. Within the limitations of this systematic review, complications are frequent after sinus lifting with bone grafts and after FESS. Voice parameters are scarcely evaluated after sinus lifting with bone grafts, and no voice changes are reported. The voice changes that occur after FESS include a decreased fundamental frequency and increased nasality and nasalance, all of which are transitory.
Collapse
Affiliation(s)
- Rafael Delgado-Ruiz
- Department of Prosthodontics and Digital Technology, Stony Brook University, Stony Brook, NY 11766, USA
| | | | - Georgios Romanos
- Department of Periodontology, Stony Brook University, Stony Brook, NY 11766, USA;
- Department of Oral Surgery and Implant Dentistry, Dental School (Carolinum), Johann Wolfgang Goethe University, 60596 Frankfurt, Germany
| |
Collapse
|
20
|
Suppa A, Costantini G, Asci F, Di Leo P, Al-Wardat MS, Di Lazzaro G, Scalise S, Pisani A, Saggio G. Voice in Parkinson's Disease: A Machine Learning Study. Front Neurol 2022; 13:831428. [PMID: 35242101 PMCID: PMC8886162 DOI: 10.3389/fneur.2022.831428] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/08/2021] [Accepted: 01/10/2022] [Indexed: 12/13/2022] Open
Abstract
Introduction Parkinson's disease (PD) is characterized by specific voice disorders collectively termed hypokinetic dysarthria. Here, we investigated voice changes by using machine learning algorithms in a large cohort of patients with PD at different stages of the disease, OFF and ON therapy. Methods We investigated 115 patients affected by PD (mean age: 68.2 ± 9.2 years) and 108 age-matched healthy subjects (mean age: 60.2 ± 11.0 years). The PD cohort included 57 early-stage patients (Hoehn & Yahr ≤ 2) who had never taken L-Dopa at the time of the study, and 58 mid-advanced-stage patients (Hoehn & Yahr > 2) who were chronically treated with L-Dopa. We clinically evaluated voices using specific subitems of the Unified Parkinson's Disease Rating Scale and the Voice Handicap Index. Voice samples recorded through a high-definition audio recorder underwent machine learning analysis based on the support vector machine classifier. We also calculated receiver operating characteristic curves to examine the diagnostic accuracy of the analysis and assessed possible clinical-instrumental correlations. Results Voice is abnormal in early-stage PD and, as the disease progresses, increasingly degrades, as demonstrated by the high accuracy in discriminating between healthy subjects and PD patients in the early and mid-advanced stages. Also, L-Dopa therapy improves but does not restore voice in PD, as shown by the high accuracy in the comparison between patients OFF and ON therapy. Finally, for the first time we achieved significant clinical-instrumental correlations by using a new score (LR value) calculated by machine learning. Conclusion Voice is abnormal in early-stage PD, progressively degrades in the mid-advanced stage, and can be improved but not restored by L-Dopa. Lastly, machine learning allows tracking disease severity and quantifying the symptomatic effect of L-Dopa on voice parameters with previously unreported high accuracy, thus representing a potential new biomarker of PD.
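A minimal sketch of the classification and ROC analysis described above, assuming each recording has already been reduced to an acoustic feature vector. The data here are synthetic placeholders, and the SVM decision scores are only a generic analogue of the study's LR value, whose exact computation the abstract does not specify.

```python
# Sketch of SVM-based discrimination between healthy controls and PD patients,
# with cross-validated decision scores feeding a ROC analysis.
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(1)
X_healthy = rng.normal(0.0, 1.0, size=(108, 30))  # 108 healthy subjects (placeholder features)
X_pd = rng.normal(0.8, 1.0, size=(115, 30))       # 115 PD patients (placeholder features)
X = np.vstack([X_healthy, X_pd])
y = np.array([0] * 108 + [1] * 115)               # 0 = healthy, 1 = PD

model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# Cross-validated decision scores -> ROC curve and AUC as the accuracy measure.
scores = cross_val_predict(model, X, y, cv=10, method="decision_function")
fpr, tpr, _ = roc_curve(y, scores)
print(f"ROC AUC: {roc_auc_score(y, scores):.3f}")
```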
Collapse
Affiliation(s)
- Antonio Suppa
- Department of Human Neurosciences, Sapienza University of Rome, Rome, Italy; IRCCS Neuromed Institute, Pozzilli, Italy
| | - Giovanni Costantini
- Department of Electronic Engineering, University of Rome Tor Vergata, Rome, Italy
| | | | - Pietro Di Leo
- Department of Electronic Engineering, University of Rome Tor Vergata, Rome, Italy
| | | | - Giulia Di Lazzaro
- Neurology Unit, Fondazione Policlinico Universitario Agostino Gemelli IRCCS, Rome, Italy
| | - Simona Scalise
- Department of Systems Medicine, UOSD Parkinson, University of Rome Tor Vergata, Rome, Italy
| | - Antonio Pisani
- Department of Brain and Behavioral Sciences, University of Pavia, Pavia, Italy; IRCCS Mondino Foundation, Pavia, Italy
| | - Giovanni Saggio
- Department of Electronic Engineering, University of Rome Tor Vergata, Rome, Italy
| |
Collapse
|
21
|
Madruga M, Campos-Roca Y, Pérez CJ. Impact of noise on the performance of automatic systems for vocal fold lesions detection. Biocybern Biomed Eng 2021. [DOI: 10.1016/j.bbe.2021.07.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/20/2022]
|
22
|
Kent RD, Eichhorn JT, Vorperian HK. Acoustic parameters of voice in typically developing children ages 4-19 years. Int J Pediatr Otorhinolaryngol 2021; 142:110614. [PMID: 33450527 PMCID: PMC7902385 DOI: 10.1016/j.ijporl.2021.110614] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 06/19/2020] [Revised: 12/31/2020] [Accepted: 12/31/2020] [Indexed: 11/26/2022]
Abstract
OBJECTIVES To report data on acoustic measures of voice in sustained vowels produced by typically developing children aged 4-19 years, adding to the cross-sectional reference values in a pediatric database. METHODS Recordings of sustained vowel /ɑ/ phonation were obtained from 158 children (80 males, 78 females) aged 4-19 years who were judged to be typically developing with respect to speech and voice. Acoustic analyses were performed with the Multidimensional Voice Program (MDVP™) and the Analysis of Dysphonia in Speech and Voice (ADSV™), both from Pentax Medical. RESULTS Values from both MDVP and ADSV are reported for children in the following age cohorts: 4-6 years, 7-9 years, 10-12 years, 13-15 years, and 16-19 years. CONCLUSION The data in this study complement previously published data and contribute to a pediatric reference database useful for research and for clinical practice related to children's voice. Acoustic parameters most sensitive to age and sex are identified.
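MDVP and ADSV are commercial packages, but comparable core parameters (mean F0, jitter, shimmer) can be extracted from a sustained-vowel recording with the open-source Praat wrapper parselmouth, as in the sketch below. The file path is hypothetical, and the analysis settings are illustrative defaults rather than the study's configuration.

```python
# Sketch of extracting MDVP-like acoustic parameters from a sustained /a/
# recording with parselmouth (pip install praat-parselmouth).
# "child_vowel.wav" is a hypothetical file path.
import parselmouth
from parselmouth.praat import call

sound = parselmouth.Sound("child_vowel.wav")
f0_min, f0_max = 75, 600   # wide pitch search range to cover children's voices

pitch = call(sound, "To Pitch", 0.0, f0_min, f0_max)
mean_f0 = call(pitch, "Get mean", 0, 0, "Hertz")

point_process = call(sound, "To PointProcess (periodic, cc)", f0_min, f0_max)
jitter_local = call(point_process, "Get jitter (local)", 0, 0, 0.0001, 0.02, 1.3)
shimmer_local = call([sound, point_process], "Get shimmer (local)",
                     0, 0, 0.0001, 0.02, 1.3, 1.6)

print(f"mean F0: {mean_f0:.1f} Hz, jitter: {jitter_local:.4f}, shimmer: {shimmer_local:.4f}")
```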
Collapse
Affiliation(s)
- Raymond D. Kent
- Waisman Center, University of Wisconsin-Madison, 1500 Highland Ave., Madison, WI 53705
| | - Julie T. Eichhorn
- Waisman Center, University of Wisconsin-Madison, 1500 Highland Ave., Madison, WI 53705
| | - Houri K. Vorperian
- Waisman Center, University of Wisconsin-Madison, 1500 Highland Ave., Madison, WI 53705
| |
Collapse
|
23
|
Monti E, D’Andrea W, Freed S, Kidd DC, Feuer S, Carroll LM, Castano E. Does Self-Reported Childhood Trauma Relate to Vocal Acoustic Measures? Preliminary Findings at Trauma Recall. JOURNAL OF NONVERBAL BEHAVIOR 2021. [DOI: 10.1007/s10919-020-00355-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
|
24
|
Pathology-Related Influences on the VEM: Three Years' Experience since Implementation of a New Parameter in Phoniatric Voice Diagnostics. BIOMED RESEARCH INTERNATIONAL 2020; 2020:5309508. [PMID: 33506007 PMCID: PMC7814951 DOI: 10.1155/2020/5309508] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/05/2020] [Revised: 11/17/2020] [Accepted: 12/10/2020] [Indexed: 02/08/2023]
Abstract
The vocal extent measure (VEM) represents a new diagnostic tool to express vocal capacity by quantifying the dynamic performance and frequency range of voice range profiles (VRPs). For VEM calculation, the VRP area is multiplied by the quotient of the theoretical perimeter of a circle of equal area and the actual VRP perimeter. Since different diseases affect voice function to varying degrees, this retrospective study investigated pathology-related influences on the VEM in more detail, three years after its implementation. Data were obtained in a standardized voice assessment comprising videolaryngostroboscopy, the voice handicap index (VHI-9i), and acoustic-aerodynamic analysis with automatic calculation of the VEM and the dysphonia severity index (DSI). The complete dataset comprised 1030 subjects, of whom 994 adults (376 male, 618 female; 18-86 years) were analyzed in more detail. The VEM differed significantly between pathology subgroups (p < 0.001) and correlated with the corresponding DSI values. Regarding the VHI-9i, the VEM reflected the subjective impairment better than the DSI. We conclude that the VEM proved to be a comprehensible and easy-to-use interval-scaled parameter for objective VRP evaluation in all pathology subgroups. As expected, exclusive consideration of the measured pathology-related influences on the VEM does not allow conclusions regarding the specific underlying diagnosis.
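The verbal definition above can be written out as a formula. The symbols A (VRP area) and P (VRP perimeter) are introduced here for illustration, and any scaling constant used in the published implementation is omitted because the abstract does not state one:

```latex
% VEM as described verbally above: the VRP area times the ratio of the
% perimeter of an equal-area circle to the actual VRP perimeter.
\[
  \mathrm{VEM} \;=\; A \cdot \frac{P_{\text{circle}}}{P},
  \qquad
  P_{\text{circle}} = 2\sqrt{\pi A},
  \quad\text{hence}\quad
  \mathrm{VEM} = \frac{2A\sqrt{\pi A}}{P}.
\]
```

Because a circle has the smallest perimeter of any shape with a given area, the quotient is at most 1, so the measure rewards large, compact voice range profiles and penalizes irregular or fragmented ones.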
Collapse
|