1
Davatz GC, Yamasaki R, Hachiya A, Tsuji DH, Montagnoli AN. Source and Filter Acoustic Measures of Young, Middle-Aged and Elderly Adults for Application in Vowel Synthesis. J Voice 2024; 38:253-263. [PMID: 34756498 DOI: 10.1016/j.jvoice.2021.08.025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2021] [Revised: 08/28/2021] [Accepted: 08/31/2021] [Indexed: 10/20/2022]
Abstract
INTRODUCTION The output sound changes considerably throughout life due to anatomical and physiological modifications in the larynx and vocal tract. Understanding how speech acoustic characteristics differ from young adulthood to old age may assist in the synthesis of representative voices of men and women of different age groups. OBJECTIVE To obtain the fundamental frequency (f0), formant frequencies (F1, F2, F3, F4), and bandwidth (B1, B2, B3, B4) values extracted from the sustained vowel /a/ of young, middle-aged, and elderly adults who are Brazilian Portuguese speakers, and to present the application of these parameters in vowel synthesis. STUDY DESIGN Prospective study. METHODS Acoustic analysis was performed on 162 tokens of the sustained vowel /a/ produced by vocally healthy men and women between 18 and 80 years old. The adults were divided into three groups: young adults (18 to 44 years old), middle-aged adults (45 to 59 years old), and elderly adults (60 to 80 years old). The f0, F1, F2, F3, F4, B1, B2, B3, and B4 were extracted from the audio signals. Their average values were applied to a source-filter mathematical model to synthesize the vowel for men and women in each age group. RESULTS Young women had higher f0 than middle-aged and elderly women. Elderly women had lower F1 than middle-aged women. Young women had higher F2 than elderly women. For the men's output sound, the source-filter acoustic measures were statistically equivalent among the age groups. Average values of the f0, F1, F2, F3, F4, B1, and B2 were higher in women. The spacing between sound waves in the signals, the position of the formant frequencies, and the width of the bandwidths visible in the spectra of the synthesized sounds represent the average values extracted from the volunteers' emissions for the sustained vowel /a/ in Brazilian Portuguese. CONCLUSION The sustained vowel /a/ produced by women presented different values of f0, F1, and F2 between age groups, which was not observed for men. In addition to the f0 and the formant frequencies, the bandwidths also differed between women and men. The synthetic vowels available represent the acoustic changes found for each sex as a function of age.
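As a rough illustration of the source-filter idea described in this abstract, the sketch below drives a cascade of second-order digital resonators (one per formant, in the style of a Klatt cascade synthesizer) with an impulse train at f0. The abstract does not specify the mathematical model the authors used, so the resonator formulation, the function names `resonator` and `synthesize_vowel`, the sampling rate, and the example f0, formant, and bandwidth values are all assumptions chosen for illustration, not values from the study.

```python
import math


def resonator(signal, freq_hz, bw_hz, fs):
    """Second-order digital resonator (assumed Klatt-style formant filter).

    y[n] = a*x[n] + b*y[n-1] + c*y[n-2], with unity gain at 0 Hz.
    """
    t = 1.0 / fs
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c
    out = []
    y1 = y2 = 0.0
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def synthesize_vowel(f0_hz, formants_hz, bandwidths_hz, fs=16000, duration_s=0.5):
    """Source: impulse train at f0. Filter: one resonator per formant, in cascade."""
    n = int(fs * duration_s)
    period = max(1, round(fs / f0_hz))  # glottal period in samples
    source = [1.0 if i % period == 0 else 0.0 for i in range(n)]
    out = source
    for freq, bw in zip(formants_hz, bandwidths_hz):
        out = resonator(out, freq, bw, fs)
    return out


# Hypothetical /a/-like parameters (NOT taken from the paper).
samples = synthesize_vowel(
    f0_hz=120,
    formants_hz=[700, 1200, 2500, 3500],
    bandwidths_hz=[80, 90, 120, 150],
)
```

In a sketch like this, the group-average f0, F1 to F4, and B1 to B4 values reported by a study would replace the hypothetical parameters; a more realistic glottal source would use a shaped pulse instead of a bare impulse train.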
Affiliation(s)
- Giovanna Castilho Davatz
  - Interunit Graduate Program in Bioengineering, Programa de Pós-Graduação Interunidades em Bioengenharia da EESC/IQSC/FMRP - USP - University of São Paulo - Av. Trabalhador São-carlense, 400, São Carlos/SP, Brazil, Zip Code: 13566-590
- Rosiane Yamasaki
  - Federal University of São Paulo, Universidade Federal de São Paulo - UNIFESP - Department of Speech-Language Pathology - R. Botucatu, 802 - Vila Clementino - São Paulo/SP, Brazil, Zip Code: 04023-062.
- Adriana Hachiya
  - Department of Otolaryngology of Clinical Hospital of University of São Paulo - Faculdade de Medicina da Universidade de São Paulo (FMUSP) - Rua, Av. Dr. Enéas Carvalho de Aguiar, 255, São Paulo/SP, Brazil, Zip Code: 05403-000
- Domingos Hiroshi Tsuji
  - Department of Otolaryngology of Clinical Hospital of University of São Paulo - Faculdade de Medicina da Universidade de São Paulo (FMUSP) - Rua, Av. Dr. Enéas Carvalho de Aguiar, 255, São Paulo/SP, Brazil, Zip Code: 05403-000
- Arlindo Neto Montagnoli
  - Federal University of São Carlos, Universidade Federal de São Carlos - UFSCar- Department of Electrical Engineering - Rodovia Washington Luís, km 235 - São Carlos/SP, Brazil, Zip Code: 13565-905
2
Iyer A, Kemp A, Rahmatallah Y, Pillai L, Glover A, Prior F, Larson-Prior L, Virmani T. A machine learning method to process voice samples for identification of Parkinson's disease. Sci Rep 2023; 13:20615. [PMID: 37996478 PMCID: PMC10667335 DOI: 10.1038/s41598-023-47568-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2023] [Accepted: 11/15/2023] [Indexed: 11/25/2023] [Open access]
Abstract
Machine learning approaches have been used for the automatic detection of Parkinson's disease with voice recordings being the most used data type due to the simple and non-invasive nature of acquiring such data. Although voice recordings captured via telephone or mobile devices allow much easier and wider access for data collection, current conflicting performance results limit their clinical applicability. This study has two novel contributions. First, we show the reliability of personal telephone-collected voice recordings of the sustained vowel /a/ in natural settings by collecting samples from 50 people with specialist-diagnosed Parkinson's disease and 50 healthy controls and applying machine learning classification with voice features related to phonation. Second, we utilize a novel application of a pre-trained convolutional neural network (Inception V3) with transfer learning to analyze the spectrograms of the sustained vowel from these samples. This approach considers speech intensity estimates across time and frequency scales rather than collapsing measurements across time. We show the superiority of our deep learning model for the task of classifying people with Parkinson's disease as distinct from healthy controls.
Affiliation(s)
- Anu Iyer
  - Georgia Institute of Technology, Atlanta, 30332, USA
- Aaron Kemp
  - Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, 72205, USA.
- Yasir Rahmatallah
  - Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, 72205, USA
- Lakshmi Pillai
  - Neurology, University of Arkansas for Medical Sciences, Little Rock, 72205, USA
- Aliyah Glover
  - Neurology, University of Arkansas for Medical Sciences, Little Rock, 72205, USA
- Fred Prior
  - Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, 72205, USA
- Linda Larson-Prior
  - Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, 72205, USA
  - Neurology, University of Arkansas for Medical Sciences, Little Rock, 72205, USA
  - Neurobiology and Developmental Sciences, University of Arkansas for Medical Sciences, Little Rock, 72205, USA
- Tuhin Virmani
  - Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, 72205, USA
  - Neurology, University of Arkansas for Medical Sciences, Little Rock, 72205, USA
3
Schultz BG, Rojas S, St John M, Kefalianos E, Vogel AP. A Cross-sectional Study of Perceptual and Acoustic Voice Characteristics in Healthy Aging. J Voice 2023; 37:969.e23-969.e41. [PMID: 34272139 DOI: 10.1016/j.jvoice.2021.06.007] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 03/29/2021] [Revised: 06/02/2021] [Accepted: 06/10/2021] [Indexed: 11/22/2022]
Abstract
PURPOSE The human voice qualitatively changes across the lifespan. Although some of these vocal changes may be pathologic, other changes likely reflect natural physiological aging. Normative data for voice characteristics in healthy aging is limited and disparate studies have used a range of different acoustic features, some of which are implicated in pathologic voice changes. We examined the perceptual and acoustic features that predict healthy aging. METHOD Participants (N = 150) aged between 50 and 92 years performed a sustained vowel task. Acoustic features were measured using the Multi-Dimensional Voice Program and the Analysis of Dysphonia in Speech and Voice. We used forward and backward variable elimination techniques based on the Bayesian information criterion and linear regression to assess which of these acoustic features predict age and perceptual features. Hearing thresholds were determined using pure-tone audiometry tests at frequencies 250 Hz, 500 Hz, 1000 Hz, 2000 Hz, and 4000 Hz. We further explored potential relationships between these acoustic features and clinical assessments of voice quality using the Consensus Auditory-Perceptual Evaluation of Voice. RESULTS Chronological age was significantly predicted by greater voice turbulence, variability of cepstral fundamental frequency, low relative to high spectral energy, and cepstral intensity. When controlling for hearing loss, age was significantly predicted by amplitude perturbations and cepstral intensity. Clinical assessments of voice indicated perceptual characteristics of speech were predicted by different acoustic features. For example, breathiness was predicted by the soft phonation index, mean cepstral peak prominence, mean low-high spectral ratio, and mean cepstral intensity. CONCLUSIONS Findings suggest that acoustic features that predict healthy aging are different than those previously reported for the pathologic voice. We propose a model of healthy and pathologic voice development in which voice characteristics are mediated by the inability to monitor vocal productions associated with age-related hearing loss. This normative data of healthy vocal aging may assist in separating voice pathologies from healthy aging.
Affiliation(s)
- Benjamin G Schultz
  - Centre for Neuroscience of Speech, The University of Melbourne, Melbourne, Australia; Department of Audiology and Speech Pathology, The University of Melbourne, Melbourne, Australia
- Sandra Rojas
  - Centre for Neuroscience of Speech, The University of Melbourne, Melbourne, Australia; Department of Audiology and Speech Pathology, The University of Melbourne, Melbourne, Australia
- Miya St John
  - Speech and Language, Murdoch Children's Research Institute, Parkville, Victoria, Australia
- Elaina Kefalianos
  - Department of Audiology and Speech Pathology, The University of Melbourne, Melbourne, Australia
- Adam P Vogel
  - Centre for Neuroscience of Speech, The University of Melbourne, Melbourne, Australia; Department of Audiology and Speech Pathology, The University of Melbourne, Melbourne, Australia; Redenlab, Australia.
4
Kim JA, Jang H, Choi Y, Min YG, Hong YH, Sung JJ, Choi SJ. Subclinical articulatory changes of vowel parameters in Korean amyotrophic lateral sclerosis patients with perceptually normal voices. PLoS One 2023; 18:e0292460. [PMID: 37831677 PMCID: PMC10575489 DOI: 10.1371/journal.pone.0292460] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2023] [Accepted: 09/21/2023] [Indexed: 10/15/2023] [Open access]
Abstract
The available quantitative methods for evaluating bulbar dysfunction in patients with amyotrophic lateral sclerosis (ALS) are limited. We aimed to characterize vowel properties in Korean ALS patients, investigate associations between vowel parameters and clinical features of ALS, and analyze subclinical articulatory changes of vowel parameters in those with perceptually normal voices. Forty-three patients with ALS (27 with dysarthria and 16 without dysarthria) and 20 healthy controls were prospectively enrolled in the study. Dysarthria was assessed using the ALS Functional Rating Scale-Revised (ALSFRS-R) speech subscores, with any loss of 4 points indicating the presence of dysarthria. The structured speech samples were recorded and analyzed using Praat software. For three corner vowels (/a/, /i/, and /u/), data on the vowel duration, fundamental frequency, frequencies of the first two formants (F1 and F2), harmonics-to-noise ratio, vowel space area (VSA), and vowel articulation index (VAI) were extracted from the speech samples. Corner vowel durations were significantly longer in ALS patients with dysarthria than in healthy controls. The F1 frequency of /a/, F2 frequencies of /i/ and /u/, the VSA, and the VAI showed significant differences between ALS patients with dysarthria and healthy controls. The area under the curve (AUC) was 0.912. The F1 frequency of /a/ and the VSA were the major determinants for differentiating ALS patients who had not yet developed apparent dysarthria from healthy controls (AUC 0.887). In linear regression analyses, as the ALSFRS-R speech subscore decreased, both the VSA and VAI were reduced, whereas vowel durations were prolonged. The analyses of vowel parameters provided a useful metric correlated with disease severity for detecting subclinical bulbar dysfunction in ALS patients.
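For readers unfamiliar with the two vowel-space metrics named in this abstract, the sketch below computes them from the F1 and F2 of the three corner vowels. The abstract does not state the exact formulas the authors used, so this assumes the commonly used definitions: the triangular VSA as the area of the /i/-/a/-/u/ triangle in the F1-F2 plane, and VAI = (F2/i/ + F1/a/) / (F1/i/ + F1/u/ + F2/u/ + F2/a/). The function names and the formant values are hypothetical.

```python
def vowel_space_area(f1_a, f2_a, f1_i, f2_i, f1_u, f2_u):
    """Triangular vowel space area (Hz^2) via the shoelace formula.

    Vertices are the (F1, F2) points of the corner vowels /i/, /a/, /u/.
    """
    return 0.5 * abs(
        f1_i * (f2_a - f2_u) + f1_a * (f2_u - f2_i) + f1_u * (f2_i - f2_a)
    )


def vowel_articulation_index(f1_a, f2_a, f1_i, f2_i, f1_u, f2_u):
    """VAI: formants that rise with vowel expansion over those that fall."""
    return (f2_i + f1_a) / (f1_i + f1_u + f2_u + f2_a)


# Hypothetical formant values in Hz (NOT taken from the paper).
formants = dict(f1_a=800, f2_a=1300, f1_i=300, f2_i=2300, f1_u=350, f2_u=800)
vsa = vowel_space_area(**formants)
vai = vowel_articulation_index(**formants)
```

Under these definitions, vowel centralization shrinks the triangle and lowers the VAI, which is the direction of change the abstract reports as the speech subscore decreases.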
Affiliation(s)
- Jin-Ah Kim
  - Department of Neurology, Seoul National University Hospital, Seoul, Republic of Korea
  - Department of Translational Medicine, Seoul National University College of Medicine, Seoul, Republic of Korea
  - Genomic Medicine Institute, Medical Research Center, Seoul National University, Seoul, Republic of Korea
- Hayeun Jang
  - Division of English, Busan University of Foreign Studies, Busan, Republic of Korea
- Yoonji Choi
  - Department of Korean Language and Literature, Seoul National University, Seoul, Republic of Korea
- Young Gi Min
  - Department of Neurology, Seoul National University Hospital, Seoul, Republic of Korea
  - Department of Translational Medicine, Seoul National University College of Medicine, Seoul, Republic of Korea
- Yoon-Ho Hong
  - Department of Neurology, Seoul Metropolitan Government-Seoul National University Boramae Medical Center, Seoul, Republic of Korea
- Jung-Joon Sung
  - Department of Neurology, Seoul National University Hospital, Seoul, Republic of Korea
  - Neuroscience Research Institute, Seoul National University College of Medicine, Seoul, Republic of Korea
- Seok-Jin Choi
  - Department of Neurology, Seoul National University Hospital, Seoul, Republic of Korea
  - Center for Hospital Medicine, Seoul National University Hospital, Seoul, Republic of Korea
5
Vorperian HK, Kent RD, Lee Y, Buhr KA. Vowel Production in Children and Adults With Down Syndrome: Fundamental and Formant Frequencies of the Corner Vowels. J Speech Lang Hear Res 2023; 66:1208-1239. [PMID: 37015000 PMCID: PMC10187968 DOI: 10.1044/2022_jslhr-22-00510] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/03/2022] [Revised: 12/01/2022] [Accepted: 12/21/2022] [Indexed: 05/18/2023]
Abstract
PURPOSE Atypical vowel production contributes to reduced speech intelligibility in children and adults with Down syndrome (DS). This study compares the acoustic data of the corner vowels /i/, /u/, /æ/, and /ɑ/ from speakers with DS against typically developing/developed (TD) speakers. METHOD Measurements of the fundamental frequency (fo) and first four formant frequencies (F1-F4) were obtained from single word recordings containing the target vowels from 81 participants with DS (ages 3-54 years) and 293 TD speakers (ages 4-92 years), all native speakers of English. The data were used to construct developmental trajectories and to determine interspeaker and intraspeaker variability. RESULTS Trajectories for DS differed from TD based on age and sex, but the groups were similar with the striking change in fo and F1-F4 frequencies around age 10 years. Findings confirm higher fo in DS, and vowel-specific differences between DS and TD in F1 and F2 frequencies, but not F3 and F4. The measure of F2 differences of front-versus-back vowels was more sensitive to compression than reduced vowel space area/centralization across age and sex. Low vowels had more pronounced F2 compression as related to reduced speech intelligibility. Intraspeaker variability was significantly greater for DS than TD for nearly all frequency values across age. DISCUSSION Vowel production differences between DS and TD are age- and sex-specific, which helps explain contradictory results in previous studies. Increased intraspeaker variability across age in DS confirms the presence of a persisting motor speech disorder. Atypical vowel production in DS is common and related to dysmorphology, delayed development, and disordered motor control.
Affiliation(s)
- Houri K. Vorperian
  - Vocal Tract Development Lab, Waisman Center, University of Wisconsin–Madison
- Raymond D. Kent
  - Vocal Tract Development Lab, Waisman Center, University of Wisconsin–Madison
- Yen Lee
  - Department of Educational Leadership, Edgewood College, Madison, Wisconsin
- Kevin A. Buhr
  - Department of Biostatistics and Medical Informatics, University of Wisconsin–Madison
6
Kent SAK, Fletcher TL, Morgan A, Morton M, Hall RJ, Sandage MJ. Updated Acoustic Normative Data through the Lifespan: A Scoping Review. J Voice 2023:S0892-1997(23)00066-8. [PMID: 36941164 DOI: 10.1016/j.jvoice.2023.02.011] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 12/09/2022] [Revised: 02/09/2023] [Accepted: 02/10/2023] [Indexed: 03/23/2023]
Abstract
OBJECTIVE To assess the recent literature for voice acoustic data values reported for individuals without voice disorder through the lifespan as a means to develop an updated normative acoustic data resource for children and adults. METHODS A scoping review was conducted using the Preferred Reporting Items for Systematic reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) Checklist. English language, full-text publications were identified through Medline (EBSCO & OVID), PubMed, APA PsycINFO, Web of Science, Google Scholar, and ProQuest Theses and Dissertations Global. RESULTS A total of 903 sources were retrieved; of these 510 were duplicates. Abstracts of 393 were screened, with 68 full-text review. From the eligible studies, citation review yielded 51 additional resources. Twenty-eight sources were included for data extraction. For the normative acoustic data extracted for males and females across the lifespan, lower fundamental frequency for adult females was observed and few studies collected semitone range, sound level range, or frequency range. Data extraction also indicated a predominately gender binary reporting of acoustic measures with few studies reporting gender identity, race, or ethnicity as variables of interest. CONCLUSIONS The scoping review yielded updated acoustic normative data that is of value for clinicians and researchers who rely on this normative data to make determinations about vocal function. The limited availability of acoustic data by gender, race, and ethnicity creates barriers for generalization of these normative values across all patients, clients, and research volunteers.
Affiliation(s)
- Samantha A K Kent
  - Department of Speech, Language & Hearing Sciences, Auburn University, Auburn, Alabama
- T Laine Fletcher
  - Department of Speech, Language & Hearing Sciences, Auburn University, Auburn, Alabama
- Abigail Morgan
  - Department of Speech, Language & Hearing Sciences, Auburn University, Auburn, Alabama
- Mariah Morton
  - School of Kinesiology, Auburn University, Auburn, Alabama
- Rebecca J Hall
  - Department of Speech, Language & Hearing Sciences, Auburn University, Auburn, Alabama
- Mary J Sandage
  - Department of Speech, Language & Hearing Sciences, Auburn University, Auburn, Alabama.
7
Albuquerque L, Oliveira C, Teixeira A, Sa-Couto P, Figueiredo D. A Comprehensive Analysis of Age and Gender Effects in European Portuguese Oral Vowels. J Voice 2023; 37:143.e13-143.e29. [PMID: 33293174 DOI: 10.1016/j.jvoice.2020.10.021] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 08/10/2020] [Revised: 10/30/2020] [Accepted: 10/30/2020] [Indexed: 01/11/2023]
Abstract
Knowledge about the effects of age on speech acoustics is still dispersed and incomplete. This study extends the analyses of the effects of age and gender on the acoustics of European Portuguese (EP) oral vowels, in order to complement initial studies with limited sets of acoustic parameters, and to further investigate unclear or inconsistent results. A database of EP vowels produced by a group of 113 adults, aged between 35 and 97, was used. Duration, fundamental frequency (f0), formant frequencies (F1 to F3), and a selection of vowel space metrics (F1 and F2 range ratios, vowel articulation index [VAI] and formant centralization ratio [FCR]) were analyzed. To avoid the arguable division into age groups, the analyses considered age as a continuous variable. The most relevant age-related results included: an increase in vowel duration in both genders; a general tendency for formant frequencies to decrease in females; changes that were consistent with vowel centralization for males, confirmed by the vowel space acoustic indexes; and no evidence of F3 decrease with age in either gender. This study has contributed to knowledge on aging speech, providing new information for an additional language. The results corroborated that acoustic characteristics of speech change with age and present different patterns between genders.
Affiliation(s)
- Luciana Albuquerque
  - Institute of Electronics and Informatics Engineering of Aveiro, University of Aveiro, Aveiro, Portugal; Center for Health Technology and Services Research, University of Aveiro, Aveiro, Portugal; Department of Electronics Telecommunications and Informatics, University of Aveiro, Aveiro, Portugal; Department of Education and Psychology, University of Aveiro, Aveiro, Portugal.
- Catarina Oliveira
  - Institute of Electronics and Informatics Engineering of Aveiro, University of Aveiro, Aveiro, Portugal; School of Health Science, University of Aveiro, Aveiro, Portugal
- António Teixeira
  - Institute of Electronics and Informatics Engineering of Aveiro, University of Aveiro, Aveiro, Portugal; Department of Electronics Telecommunications and Informatics, University of Aveiro, Aveiro, Portugal
- Pedro Sa-Couto
  - Center for Research and Development in Mathematics and Applications, University of Aveiro, Aveiro, Portugal; Department of Mathematics, University of Aveiro, Aveiro, Portugal
- Daniela Figueiredo
  - Center for Health Technology and Services Research, University of Aveiro, Aveiro, Portugal; School of Health Science, University of Aveiro, Aveiro, Portugal
8
Pernon M, Assal F, Kodrasi I, Laganaro M. Perceptual Classification of Motor Speech Disorders: The Role of Severity, Speech Task, and Listener's Expertise. J Speech Lang Hear Res 2022; 65:2727-2747. [PMID: 35878401 DOI: 10.1044/2022_jslhr-21-00519] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/15/2023]
Abstract
PURPOSE The clinical diagnosis of motor speech disorders (MSDs) is mainly based on perceptual approaches. However, studies on perceptual classification of MSDs often indicate low classification accuracy. The aim of this study was to determine in a forced-choice dichotomous decision-making task (a) how accuracy of speech-language pathologists (SLPs) in perceptually classifying apraxia of speech (AoS) and dysarthria is impacted by speech task, severity of MSD, and listener's expertise and (b) which perceptual features they use to classify. METHOD Speech samples from 29 neurotypical speakers, 14 with hypokinetic dysarthria associated with Parkinson's disease (HD), 10 with poststroke AoS, and six with mixed dysarthria associated with amyotrophic lateral sclerosis (MD-FlSp [combining flaccid and spastic dysarthria]), were classified by 20 expert SLPs and 20 student SLPs. Speech samples were elicited in spontaneous speech, text reading, oral diadochokinetic (DDK) tasks, and a sample concatenating text reading and DDK. For each recorded speech sample, SLPs answered three dichotomic questions following a diagnostic approach, (a) neurotypical versus pathological speaker, (b) AoS versus dysarthria, and (c) MD-FlSp versus HD, and a multiple-choice question on the features their decision was based on. RESULTS Overall classification accuracy was 72% with good interrater reliability, varying with SLP expertise, speech task, and MSD severity. Correct classification of speech samples was higher for speakers with dysarthria than for AoS and higher for HD than for MD-FlSp. Samples elicited with continuous speech reached the best classification rates. An average number of three perceptual features were used for correct classifications, and their type and combination differed between the three MSDs. 
CONCLUSIONS The auditory-perceptual classification of MSDs in a diagnostic approach reaches substantial performance only in expert SLPs with continuous speech samples, albeit with lower accuracy for AoS. Specific training associated with objective classification tools seems necessary to improve recognition of neurotypical speech and distinction between AoS and dysarthria.
Affiliation(s)
- Michaela Pernon
  - Neurology Department, Geneva University Hospitals, Switzerland
  - Faculty of Medicine, University of Geneva, Switzerland
  - Laboratoire de Phonétique et Phonologie, UMR 7018, CNRS-Université Sorbonne Nouvelle, Paris, France
  - CRMR Wilson & Parkinson Unit, Neurology Department, Hôpital Fondation Adolphe de Rothschild, Paris, France
- Frédéric Assal
  - Neurology Department, Geneva University Hospitals, Switzerland
  - Faculty of Medicine, University of Geneva, Switzerland
- Ina Kodrasi
  - Signal Processing for Communication Group, Idiap Research Institute, Martigny, Switzerland
- Marina Laganaro
  - Faculty of Psychology and Educational Sciences, University of Geneva, Switzerland
9
Fukuda M, Nishimura R, Nishizaki H, Horii K, Iribe Y, Yamamoto K, Kitaoka N. A new speech corpus of super-elderly Japanese for acoustic modeling. Comput Speech Lang 2022. [DOI: 10.1016/j.csl.2022.101424] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2022]
10
Exploring the Age Effects on European Portuguese Vowel Production: An Ultrasound Study. Applied Sciences (Basel) 2022. [DOI: 10.3390/app12031396] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
For aging speech, there is limited knowledge regarding the articulatory adjustments underlying the acoustic findings observed in previous studies. In order to investigate the age-related articulatory differences in European Portuguese (EP) vowels, the present study analyzes the tongue configuration of the nine EP oral vowels (isolated context and pseudoword context) produced by 10 female speakers of two different age groups (young and old). From the tongue contours automatically segmented from the ultrasound (US) images and manually revised, the parameters (tongue height and tongue advancement) were extracted. The results suggest that the tongue tends to be higher and more advanced for the older females compared to the younger ones for almost all vowels. Thus, the vowel articulatory space tends to be higher, more advanced, and bigger with age. For older females, unlike younger females, who presented a sharp reduction in the articulatory vowel space in disyllabic sequences, the vowel space tends to be more advanced for isolated vowels compared with vowels produced in disyllabic sequences. This study extends our pilot research by reporting articulatory data from more speakers based on an improved automatic method of tongue contour tracing, and it performs an inter-speaker comparison through the application of a novel normalization procedure.
11
A Longitudinal Study of Speech Acoustics in Older French Females: Analysis of the Filler Particle euh across Utterance Positions. Languages 2021. [DOI: 10.3390/languages6040211] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/17/2022]
Abstract
Aging in speech production is a multidimensional process. Biological, cognitive, social, and communicative factors can change over time, stay relatively stable, or may even compensate for each other. In this longitudinal work, we focus on stability and change at the laryngeal and supralaryngeal levels in the discourse particle euh produced by 10 older French-speaking females at two times, 10 years apart. Recognizing the multiple discourse roles of euh, we divided out occurrences according to utterance position. We quantified the frequency of euh, and evaluated acoustic changes in formants, fundamental frequency, and voice quality across time and utterance position. Results showed that euh frequency was stable with age. The only acoustic measure that revealed an age effect was harmonics-to-noise ratio, showing less noise at older ages. Other measures mostly varied with utterance position, sometimes in interaction with age. Some voice quality changes could reflect laryngeal adjustments that provide for airflow conservation utterance-finally. The data suggest that aging effects may be evident in some prosodic positions (e.g., utterance-final position), but not others (utterance-initial position). Thus, it is essential to consider the interactions among these factors in future work and not assume that vocal aging is evident throughout the signal.
12
Nussbaum C, von Eiff CI, Skuk VG, Schweinberger SR. Vocal emotion adaptation aftereffects within and across speaker genders: Roles of timbre and fundamental frequency. Cognition 2021; 219:104967. [PMID: 34875400 DOI: 10.1016/j.cognition.2021.104967] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/18/2021] [Revised: 10/22/2021] [Accepted: 11/23/2021] [Indexed: 12/12/2022]
Abstract
While the human perceptual system constantly adapts to the environment, some of the underlying mechanisms are still poorly understood. For instance, although previous research demonstrated perceptual aftereffects in emotional voice adaptation, the contribution of different vocal cues to these effects is unclear. In two experiments, we used parameter-specific morphing of adaptor voices to investigate the relative roles of fundamental frequency (F0) and timbre in vocal emotion adaptation, using angry and fearful utterances. Participants adapted to voices containing emotion-specific information in either F0 or timbre, with all other parameters kept constant at an intermediate 50% morph level. Full emotional voices and ambiguous voices were used as reference conditions. All adaptor stimuli were either of the same (Experiment 1) or opposite speaker gender (Experiment 2) of subsequently presented target voices. In Experiment 1, we found consistent aftereffects in all adaptation conditions. Crucially, aftereffects following timbre adaptation were much larger than following F0 adaptation and were only marginally smaller than those following full adaptation. In Experiment 2, adaptation aftereffects appeared massively and proportionally reduced, with differences between morph types being no longer significant. These results suggest that timbre plays a larger role than F0 in vocal emotion adaptation, and that vocal emotion adaptation is compromised by eliminating gender-correspondence between adaptor and target stimuli. Our findings also add to mounting evidence suggesting a major role of timbre in auditory adaptation.
Affiliation(s)
- Christine Nussbaum
  - Department for General Psychology and Cognitive Neuroscience, Friedrich Schiller University Jena, Germany
- Celina I von Eiff
  - Department for General Psychology and Cognitive Neuroscience, Friedrich Schiller University Jena, Germany
- Verena G Skuk
  - Department for General Psychology and Cognitive Neuroscience, Friedrich Schiller University Jena, Germany
- Stefan R Schweinberger
  - Department for General Psychology and Cognitive Neuroscience, Friedrich Schiller University Jena, Germany
|
13
|
Abstract
Digital health data are multimodal and high-dimensional. A patient's health state can be characterized by a multitude of signals including medical imaging, clinical variables, genome sequencing, conversations between clinicians and patients, and continuous signals from wearables, among others. This high volume, personalized data stream aggregated over patients' lives has spurred interest in developing new artificial intelligence (AI) models for higher-precision diagnosis, prognosis, and tracking. While the promise of these algorithms is undeniable, their dissemination and adoption have been slow, owing partially to unpredictable AI model performance once deployed in the real world. We posit that one of the rate-limiting factors in developing algorithms that generalize to real-world scenarios is the very attribute that makes the data exciting: their high-dimensional nature. This paper considers how the large number of features in vast digital health data can challenge the development of robust AI models, a phenomenon known as "the curse of dimensionality" in statistical learning theory. We provide an overview of the curse of dimensionality in the context of digital health, demonstrate how it can negatively impact out-of-sample performance, and highlight important considerations for researchers and algorithm designers.
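As a rough illustration of one commonly cited facet of the curse of dimensionality (not a method taken from the paper above), the sketch below measures how distances from a query point to random points become nearly indistinguishable as the number of features grows. The function name `relative_contrast`, the uniform unit-hypercube data, and the sample size are all assumptions made for this example.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (d_max - d_min) / d_min for Euclidean distances from one
    random query point to n_points uniform random points in [0, 1]^dim.
    A small value means the nearest and farthest points are almost
    equally far away, which weakens distance-based learning."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    query = [rng.random() for _ in range(dim)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        distances.append(math.dist(query, point))
    return (max(distances) - min(distances)) / min(distances)


if __name__ == "__main__":
    # Hypothetical feature counts, chosen only to show the trend.
    for dim in (2, 10, 100, 1000):
        print(dim, round(relative_contrast(dim), 3))
```

With a fixed number of samples, the printed contrast typically shrinks as `dim` grows, which is one intuition for why models fitted on many features and few patients may behave unpredictably out of sample.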
|
14
|
Abstract
We present a multidimensional acoustic report describing variation in speech productions on data collected from 500 francophone adult speakers (20 to 93 y.o.a.) as a function of age. In this cross-sectional study, chronological age is considered as a continuous variable while oral productions, in reading and speech-like tasks, are characterized via 22 descriptors related to voice quality, pitch, vowel articulation and vocalic system organization, time-related measures and temporal organization, as well as maximal performances in speech-like tasks. In a first analysis, we detail how each descriptor varies according to the age of the speaker, for male and female speakers separately. In a second analysis, we explore how chronological age is, in turn, predicted by the combination of all descriptors. Overall, results confirm that with increasing age, speakers show more voice instability, sex-dependent pitch changes, slower speech and articulation rates, slower repetition rates and less complexity effects in maximal performance tasks. A notable finding of this study is that some of these changes are continuous throughout adulthood while others appear either at old age or in early adulthood. Chronological age appears only moderately indexed in speech, mainly through speech rate parameters. We discuss these results in relation to the notion of attrition and to other possible factors at play, in an attempt to better capture the multidimensional nature of the notion of “age”.
|
15
|
On the Primary Influences of Age on Articulation and Phonation in Maximum Performance Tasks. Languages 2021. [DOI: 10.3390/languages6040174] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/16/2022]
Abstract
Maximum performance tasks have been identified as possible domains where incipient signs of neurological disease may be detected in simple speech and voice samples. However, it is likely that these will simultaneously be influenced by the age and sex of the speaker. In this study, a comprehensive set of acoustic quantifications was collected from the literature and applied to sustained [a] productions and Alternating Motion Rate diadochokinetic (DDK) syllable sequences made by 130 (62 women, 68 men) healthy speakers, aged 20–90 years. The participants were asked to produce as stable (sustained [a] and DDK) and fast (DDK) productions as possible. The full set of features was reduced to a functional subset that most efficiently modeled sex-specific differences between younger and older speakers using a cross-validation procedure. Twelve measures of [a] and 16 measures of DDK sequences were identified across men and women and investigated in terms of how they were altered with increasing age of speakers. Increased production instability is observed in both tasks, primarily above the age of 60 years. DDK sequences were slower in older speakers, but also altered in their syllable and segment level acoustic properties. Increasing age does not appear to affect phonation or articulation uniformly, and men and women are affected differently in most quantifications investigated.
|
16
|
Lavan N. The effect of familiarity on within-person age judgements from voices. Br J Psychol 2021; 113:287-299. [PMID: 34415575 DOI: 10.1111/bjop.12526] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2020] [Accepted: 06/22/2021] [Indexed: 11/28/2022]
Abstract
Listeners can perceive a person's age from their voice with above chance accuracy. Studies have usually established this by asking listeners to directly estimate the age of unfamiliar voices. The recordings used mostly include cross-sectional samples of voices, including people of different ages to cover the age range of interest. Such cross-sectional samples likely include not only cues to age in the sound of the voice but also socio-phonetic cues, encoded in how a person speaks. How age perception accuracy is affected when minimizing socio-phonetic cues by sampling the same voice at different time points remains largely unknown. Similarly, with the voices in age perception studies being usually unfamiliar to listeners, it is unclear how familiarity with a voice affects age perception. We asked listeners who were either familiar or unfamiliar with a set of four voices to complete an age discrimination task: listeners heard two recordings of the same person's voice, recorded 15 years apart, and were asked to indicate in which recording the person was younger. Accuracy for both familiar and unfamiliar listeners was above chance. While familiarity advantages were apparent, accuracy was not particularly high: familiar and unfamiliar listeners were correct for 68.2% and 62.7% of trials, respectively (chance = 50%). Familiarity furthermore interacted with the voices included. Overall, our findings indicate that age perception from voices is not a trivial task at all times - even when listeners are familiar with a voice. We discuss our findings in the light of how reliable voice may be as a signal for age.
Affiliation(s)
- Nadine Lavan
  - Department of Experimental and Biological Psychology, School of Biological and Chemical Sciences, Queen Mary University of London, UK
  - Department of Speech, Hearing and Phonetic Sciences, University College London, UK
|
17
|
Albuquerque L, Valente ARS, Teixeira A, Figueiredo D, Sa-Couto P, Oliveira C. Association between acoustic speech features and non-severe levels of anxiety and depression symptoms across lifespan. PLoS One 2021; 16:e0248842. [PMID: 33831018 PMCID: PMC8031302 DOI: 10.1371/journal.pone.0248842] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 07/09/2020] [Accepted: 03/07/2021] [Indexed: 11/18/2022] [Open Access]
Abstract
BACKGROUND Several studies have investigated the acoustic effects of diagnosed anxiety and depression. Anxiety and depression are not characteristics of the typical aging process, but minimal or mild symptoms can appear and evolve with age. However, knowledge about the association between speech and anxiety or depression is scarce for the minimal/mild symptoms typical of healthy aging. As longevity and aging are still a new phenomenon worldwide, also posing several clinical challenges, it is important to improve our understanding of the impact of non-severe mood symptoms on acoustic features across the lifetime. The purpose of this study was to determine if variations in acoustic measures of voice are associated with non-severe anxiety or depression symptoms in the adult population across the lifetime. METHODS Two different speech tasks (reading vowels in disyllabic words and describing a picture) were produced by 112 individuals aged 35-97. To assess anxiety and depression symptoms, the Hospital Anxiety Depression Scale (HADS) was used. The association between the segmental and suprasegmental acoustic parameters and HADS scores was analyzed using the linear multiple regression technique. RESULTS The number of participants with presence of anxiety or depression symptoms is low (>7: 26.8% and 10.7%, respectively) and the symptoms are non-severe (HADS-A: 5.4 ± 2.9 and HADS-D: 4.2 ± 2.7, respectively). Adults with higher anxiety symptoms did not present significant relationships with the acoustic parameters studied. Adults with increased depressive symptoms presented higher vowel duration, longer total pause duration and shorter total speech duration. Finally, age presented a positive and significant effect only for depressive symptoms, showing that older participants tend to have more depressive symptoms. CONCLUSIONS Non-severe depression symptoms can be related to some acoustic parameters and age. Depression symptoms can be explained by acoustic parameters even among individuals without severe symptom levels.
Affiliation(s)
- Luciana Albuquerque
  - Institute of Electronics and Informatics Engineering of Aveiro, University of Aveiro, Aveiro, Portugal
  - Center of Health Technology and Services Research, University of Aveiro, Aveiro, Portugal
  - Department of Electronics Telecommunications and Informatics, University of Aveiro, Aveiro, Portugal
  - Department of Education and Psychology, University of Aveiro, Aveiro, Portugal
- Ana Rita S. Valente
  - Institute of Electronics and Informatics Engineering of Aveiro, University of Aveiro, Aveiro, Portugal
  - Department of Electronics Telecommunications and Informatics, University of Aveiro, Aveiro, Portugal
- António Teixeira
  - Institute of Electronics and Informatics Engineering of Aveiro, University of Aveiro, Aveiro, Portugal
  - Department of Electronics Telecommunications and Informatics, University of Aveiro, Aveiro, Portugal
- Daniela Figueiredo
  - Center of Health Technology and Services Research, University of Aveiro, Aveiro, Portugal
  - School of Health Science, University of Aveiro, Aveiro, Portugal
- Pedro Sa-Couto
  - Center for Research and Development in Mathematics and Applications, University of Aveiro, Aveiro, Portugal
  - Department of Mathematics, University of Aveiro, Aveiro, Portugal
- Catarina Oliveira
  - Institute of Electronics and Informatics Engineering of Aveiro, University of Aveiro, Aveiro, Portugal
  - School of Health Science, University of Aveiro, Aveiro, Portugal
|
18
|
Tucker BV, Ford C, Hedges S. Speech aging: Production and perception. Wiley Interdisciplinary Reviews. Cognitive Science 2021; 12:e1557. [PMID: 33651922 DOI: 10.1002/wcs.1557] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2020] [Revised: 12/18/2020] [Accepted: 02/05/2021] [Indexed: 11/06/2022]
Abstract
In this overview we describe literature on how speech production and speech perception change in healthy or normal aging across the adult lifespan. In the production section we review acoustic characteristics that have been investigated as potentially distinguishing younger and older adults. In the speech perception section studies concerning speaker age estimation and those investigating older listeners' perception are addressed. Our discussion focuses on major themes and other fruitful areas for future research. This article is categorized under: Linguistics > Language in Mind and Brain; Linguistics > Linguistic Theory; Psychology > Development and Aging.
Affiliation(s)
- Benjamin V Tucker
  - Department of Linguistics, University of Alberta, Edmonton, Alberta, Canada
- Catherine Ford
  - Department of Linguistics, University of Alberta, Edmonton, Alberta, Canada
- Stephanie Hedges
  - Department of Linguistics, University of Alberta, Edmonton, Alberta, Canada
|
19
|
Albert G, Arnocky S, Puts DA, Hodges-Simeon CR. Can listeners assess men's self-reported health from their voice? Evol Hum Behav 2021. [DOI: 10.1016/j.evolhumbehav.2020.08.001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/18/2022]
|
20
|
Speech Segregation in Active Middle Ear Stimulation: Masking Release With Changing Fundamental Frequency. Ear Hear 2020; 42:709-717. [PMID: 33369941 DOI: 10.1097/aud.0000000000000973] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
Abstract
OBJECTIVES Temporal fine structure information such as low-frequency sounds including the fundamental frequency (F0) is important to separate different talkers in noisy environments. Speech perception in noise is negatively affected by reduced temporal fine structure resolution in cochlear hearing loss. It has been shown that normal-hearing (NH) people as well as cochlear implant patients with preserved acoustic low-frequency hearing benefit from different F0 between concurrent talkers. Though patients with an active middle ear implant (AMEI) report better sound quality compared with hearing aids, they often struggle when listening in noise. The primary objective was to evaluate whether or not patients with a Vibrant Soundbridge AMEI were able to benefit from F0 differences in a concurrent talker situation and if the effect was comparable to NH individuals. DESIGN A total of 13 AMEI listeners and 13 NH individuals were included. A modified variant of the Oldenburg sentence test was used to emulate a concurrent talker scenario. One sentence from the test corpus served as the masker and the remaining sentences as target speech. The F0 of the masker sentence was shifted upward by 4, 8, and 12 semitones. The target and masker sentences were presented simultaneously to the study subjects and the speech reception threshold was assessed by adaptively varying the masker level. To evaluate any impact of the occlusion effect on speech perception, AMEI listeners were tested in two configurations: with a plugged ear-canal contralateral to the implant side, indicated as AMEIcontra, or with both ears plugged, indicated as AMEIboth. RESULTS In both study groups, speech perception improved when the F0 difference between target and masker increased. This was significant when the difference was at least 8 semitones; the F0-based release from masking was 3.0 dB in AMEIcontra (p = 0.009) and 2.9 dB in AMEIboth (p = 0.015), compared with 5.6 dB in NH listeners (p < 0.001). 
A difference of 12 semitones revealed an F0-based release from masking of 3.5 dB in the AMEIcontra (p = 0.002) and 3.4 dB in the AMEIboth (p = 0.003) condition, compared with 5.0 dB in NH individuals (p < 0.001). CONCLUSIONS Though AMEI users deal with problems resulting from cochlear damage, hearing amplification with the implant enables a masking release based on F0 differences when the F0 difference between a target and masker sentence was at least 8 semitones. Additional occlusion of the ear canal on the implant side did not affect speech performance. The current results complement the knowledge about the benefit of F0 within the acoustic low-frequency hearing.
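To make the semitone shifts in this abstract concrete, the following sketch converts a shift in semitones into a frequency, assuming the usual equal-tempered definition in which one semitone is a frequency ratio of 2^(1/12). The function name `shift_f0` and the 120 Hz base value are illustrative assumptions, not values or code from the study.

```python
def shift_f0(f0_hz, semitones):
    """Shift a fundamental frequency by a number of equal-tempered
    semitones: each semitone multiplies the frequency by 2 ** (1 / 12),
    so 12 semitones correspond to one octave (a doubling)."""
    return f0_hz * 2.0 ** (semitones / 12.0)


if __name__ == "__main__":
    base_f0 = 120.0  # hypothetical masker F0 in Hz, not taken from the study
    for step in (4, 8, 12):
        print(step, round(shift_f0(base_f0, step), 1))
```

Under this definition, the 8-semitone threshold reported above corresponds to the masker F0 being raised by a factor of about 1.59, and the 12-semitone condition to a full octave.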
|
21
|
Berger T, Meuret S, Engel C, Vogel M, Kiess W, Fuchs M, Poulain T. [Detection of relevant changes in the speaking voice of women measured by the speaking voice profile]. Laryngorhinootologie 2020; 101:127-137. [PMID: 33327005 DOI: 10.1055/a-1327-4275] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
Abstract
BACKGROUND A healthy voice serves us as a basis for communication and an indispensable tool in a modern society with a growing number of vocal-intensive professions. There are indications that the average frequency of the speaking voice of women has decreased in recent years and is approaching that of men in the sense of sociophony. An epidemiological prospective cohort study will investigate the influences of age, personality traits and socio-demographic factors on the speaking voice of women. MATERIAL AND METHODS Within the framework of a standardized examination procedure, the speaking voice of 2478 voice-healthy female participants between 5 and 83 years of age was registered in 4 different intensity levels (softest voice, conversational voice, classroom voice and shouting voice). Subsequently, the collected values for frequency and loudness of the different intensity levels were examined for correlation with age and results from questionnaires on personality (FFFK and BFI-10), on (mental) health (Patient-Health-Questionnaire - PHQ) and on socio-economic status (SES). RESULTS Significant age-related influences on the speaking voice could be demonstrated for all voice intensities. For the personality traits investigated, significant positive correlations between the volume of the calling and speaking voice and extraversion were found. For the frequency of the softest voice and speaking voice, significant correlations were found for the personality traits of extraversion and tolerance. While no significant associations were found between the voice parameters of the speaking voice and the PHQ, it was found that the SES has a significant influence on both frequency and intensity. CONCLUSION In addition to age-related changes, relevant influences of personality traits and the SES on speaking voice parameters in women were confirmed, which should be considered in clinical care of dysphonia.
Affiliation(s)
- Thomas Berger
  - Klinik für Hals-Nasen-Ohrenheilkunde/Plastische Operationen, Universitätsklinikum Leipzig - AöR
- Sylvia Meuret
  - Sektion Phoniatrie und Audiologie, Klinik für Hals-Nasen-Ohrenheilkunde/Plastische Operationen, Universitätsklinikum Leipzig - AöR
- Christoph Engel
  - Institut für Medizinische Informatik, Statistik und Epidemiologie (IMISE), Universität Leipzig
- Mandy Vogel
  - Medizinische Fakultät, LIFE Forschungszentrum, Universität Leipzig
- Wieland Kiess
  - Medizinische Fakultät, LIFE Forschungszentrum, Universität Leipzig
  - Klinik und Poliklinik für Kinder- und Jugendmedizin, Universitätsklinikum Leipzig - AöR
- Michael Fuchs
  - Sektion Phoniatrie und Audiologie, Klinik für Hals-Nasen-Ohrenheilkunde/Plastische Operationen, Universitätsklinikum Leipzig - AöR
- Tanja Poulain
  - Medizinische Fakultät, LIFE Forschungszentrum, Universität Leipzig
|
22
|
Moreno–Torres I, Nava E. Consonant and vowel articulation accuracy in younger and middle-aged Spanish healthy adults. PLoS One 2020; 15:e0242018. [PMID: 33166341 PMCID: PMC7652263 DOI: 10.1371/journal.pone.0242018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2020] [Accepted: 10/23/2020] [Indexed: 11/20/2022] [Open Access]
Abstract
Children acquire vowels earlier than consonants, and the former are less vulnerable to speech disorders than the latter. This study explores the hypothesis that a similar contrast exists later in life and that consonants are more vulnerable to ageing than vowels. Data was obtained with two experiments comparing the speech of Younger Adults (YAs) and Middle–aged Adults (MAs). In the first experiment an Automatic Speech Recognition (ASR) system was trained with a balanced corpus of 29 YAs and 27 MAs. The productions of each speaker were obtained in a Spanish language word (W) and non–word (NW) repetition task. The performance of the system was evaluated with the same corpus used for training using a cross validation approach. The ASR system recognized to a similar extent the Ws of both groups of speakers, but it was more successful with the NWs of the YAs than with those of the MAs. Detailed error analysis revealed that the MA speakers scored below the YA speakers for consonants and also for the place and manner of articulation features; the results were almost identical in both groups of speakers for vowels and for the voicing feature. In the second experiment a group of healthy native listeners was asked to recognize isolated syllables presented with background noise. The target speakers were one YA and one MA that had taken part in the first experiment. The results were consistent with those of the ASR experiment: the manner and place of articulation were better recognized, and vowels and voicing were worse recognized, in the YA speaker than in the MA speaker. We conclude that consonant articulation is more vulnerable to ageing than vowel articulation. Future studies should explore whether or not these early and selective changes in articulation accuracy might be caused by changes in speech perception skills (e.g., in auditory temporal processing).
Affiliation(s)
- Enrique Nava
  - Department of Communications Engineering, University of Málaga, Málaga, Spain
|
23
|
Leung Y, Oates J, Papp V, Chan SP. Formant Frequencies of Adult Speakers of Australian English and Effects of Sex, Age, Geographical Location, and Vowel Quality. J Voice 2020; 36:875.e1-875.e13. [PMID: 33268219 DOI: 10.1016/j.jvoice.2020.09.026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/25/2020] [Revised: 09/03/2020] [Accepted: 09/29/2020] [Indexed: 10/23/2022]
Abstract
AIMS The primary aim of this study was to provide normative formant frequency (F) values for male and female speakers of Australian English. The secondary aim was to examine the effects of speaker sex, age, vowel quality, and geographical location on F. METHOD The first three monophthong formant frequencies (F1, F2, and F3) for 244 female and 135 male speakers aged 18-60 years from a recent large-scale corpus of Australian English were analysed on a passage reading task. RESULTS Mixed effects linear regression models suggested that speaker sex, speaker age, and vowel quality significantly predicted F1, F2, and F3 (P = 0.000). Effect sizes suggested that speaker sex and vowel quality contributed most to the variations in F1, F2, and F3 whereas speaker age and geographical location contributed a smaller amount. CONCLUSION Both clinicians and researchers are provided with normative F data for 18-60 year-old speakers of Australian English. Such data have increased internal and external validity relative to previous literature. F normative data for speakers of Australian English should be considered with reference to speaker sex and vowel but it may not be practically necessary to adjust for speaker age and geographical location.
Affiliation(s)
- Yeptain Leung
  - Department of Speech Pathology, Orthoptics and Audiology, School of Allied Health, Human Services and Sport, College of Science, Health and Engineering, La Trobe University, Melbourne, Victoria, Australia
- Jennifer Oates
  - Department of Speech Pathology, Orthoptics and Audiology, School of Allied Health, Human Services and Sport, College of Science, Health and Engineering, La Trobe University, Melbourne, Victoria, Australia
- Viktória Papp
  - School of Language, Social and Political Sciences, New Zealand Institute of Language, Brain and Behaviour, University of Canterbury, Christchurch, New Zealand
- Siew-Pang Chan
  - Department of Medicine, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
  - Cardiovascular Research Institute, National University Heart Centre Singapore, National University Health System, Singapore
  - Department of Mathematics and Statistics, La Trobe University, Melbourne, Victoria, Australia
|
24
|
Shin W, Lee H, Shin J, Holliday JJ. The Potential Role of Talker Age in the Perception of Regional Accent. Language and Speech 2020; 63:479-505. [PMID: 31288603 DOI: 10.1177/0023830919861666] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
The speech signal contains potential cues to a wide range of socioindexical variables. The aim of the current study was to investigate how one variable, talker age, might interact with the perception of a theoretically independent variable, regional accent. We investigated this question specifically in the case of Korean: although many studies have reported on phonetic differences among Korean dialects and speakers' beliefs and attitudes about them, there has been virtually no research on the auditory perception of such variation. Potential acoustic cues to regional accent were measured in read sentence productions from a total of 72 male talkers in their 20s or 50s to 60s from six Korean provinces. Then, in a perception experiment, native listeners from Seoul (n = 21), Gyeongsang (n = 10), and Jeolla (n = 10) listened to the sentences and were asked to identify the talker's regional origin from among 6 provinces. Listeners' responses correlated with talker age: young talkers were disproportionately perceived as being from Seoul, and old talkers, even life-long Seoul residents, were disproportionately perceived as being from non-Seoul regions. A follow-up experiment with listeners from Seoul (n = 30) in which talker age was treated as a between-subjects factor showed an attenuated effect, suggesting that the effect of talker age on perceived regional origin may be partially driven by a contrast effect, such that the speech of older talkers is perceived as less standard, and thus as coming from a non-Seoul region, when being directly compared with that of younger talkers.
Affiliation(s)
- Woobong Shin
  - Department of Korean Language and Literature, Jeju National University, Republic of Korea
- Hyangwon Lee
  - Department of Korean Language and Literature, Korea University, Republic of Korea
- Jiyoung Shin
  - Department of Korean Language and Literature, Korea University, Republic of Korea
- Jeffrey J Holliday
  - Department of Korean Language and Literature, Korea University, Republic of Korea
|
25
|
Siqueira LTD, Silverio KCA, Berretin-Félix G, Genaro KF, Fukushiro AP, Brasolotto AG. Influence of vocal and aerodynamics aspects on the voice-related quality of life of older adults. J Appl Oral Sci 2020; 28:e20200052. [PMID: 32813841 PMCID: PMC7433863 DOI: 10.1590/1678-7757-2020-0052] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/20/2020] [Accepted: 06/29/2020] [Indexed: 11/21/2022] [Open Access]
Abstract
The pursuit of quality of life has urged a better understanding of the aspects involved in ageing in order to minimize its consequences. Although many studies have investigated older adults’ voice, the aspects affecting this population's voice-related quality of life have not yet been explored.
Affiliation(s)
- Kelly Cristina Alves Silverio
  - Department of Speech-Language Pathology, Faculdade de Odontologia de Bauru, Universidade de São Paulo, Bauru, São Paulo, Brasil
- Giédre Berretin-Félix
  - Department of Speech-Language Pathology, Faculdade de Odontologia de Bauru, Universidade de São Paulo, Bauru, São Paulo, Brasil
- Kátia Flores Genaro
  - Department of Speech-Language Pathology, Faculdade de Odontologia de Bauru, Universidade de São Paulo, Bauru, São Paulo, Brasil
- Ana Paula Fukushiro
  - Department of Speech-Language Pathology, Faculdade de Odontologia de Bauru, Universidade de São Paulo, Bauru, São Paulo, Brasil
- Alcione Ghedini Brasolotto
  - Department of Speech-Language Pathology, Faculdade de Odontologia de Bauru, Universidade de São Paulo, Bauru, São Paulo, Brasil
|
26
|
Kosztyła-Hojna B, Duchnowska E, Zdrojkowski M, Łobaczuk-Sitnik A, Biszewska J. Application of High Speed Digital Imaging (HSDI) technique and voice acoustic analysis in the diagnosis of the clinical form of Presbyphonia in women. Otolaryngol Pol 2020; 74:24-30. [PMID: 34550094 DOI: 10.5604/01.3001.0014.1580] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
Abstract
INTRODUCTION The aging process of voice begins after the age of 60 and has an individually variable course. Voice quality disorders at this age are called senile voice (Presbyphonia or Vox Senium). Voice pathology is particularly severe in women. The aim of the study was to diagnose the clinical form of Presbyphonia in elderly women using High Speed Digital Imaging (HSDI) and acoustic voice analysis. MATERIAL AND METHODS The study included 50 elderly women (average age 69) with dysphonia (Group I). The control group (Group II) included 30 women (average age 71) without voice quality disorders. Visualization assessment was conducted with High Speed Digital Imaging (HSDI) using a High Speed (HS) camera. Acoustic evaluation of voice included analysis of the isolated vowel "a" and of continuous linguistic text with Diagnoscope Specialista software. Maximum Phonation Time (MPT) was determined. RESULTS In Group I, 78% of women revealed vocal fold vibration asymmetry, vibration amplitude increase, Mucosal Wave (MW) limitation and Type D glottal insufficiency (GTs). Acoustic voice analysis showed a decrease in F0 and an increase in Jitter, Shimmer, and NHR. In 22% of women, along with vibration asymmetry, vibration amplitude reduction and MW limitation, Type E glottal insufficiency (GTs) was found. Acoustic voice analysis revealed a slight decrease in F0 and the presence of numerous non-harmonic components in the glottis region. CONCLUSIONS Vocal fold visualization with HSDI showed edema, less often atrophy, in elderly women. Both forms of dysphonia caused abnormal values of F0, Jitter, Shimmer, and NHR in the acoustic voice evaluation and a significant reduction of MPT.
Affiliation(s)
- Bożena Kosztyła-Hojna
  - Department of Clinical Phonoaudiology and Speech Therapy, Medical University of Bialystok, Poland
- Emilia Duchnowska
  - Department of Clinical Phonoaudiology and Speech Therapy, Medical University of Bialystok, Poland
- Maciej Zdrojkowski
  - Department of Clinical Phonoaudiology and Speech Therapy, Medical University of Bialystok, Poland
- Anna Łobaczuk-Sitnik
  - Department of Clinical Phonoaudiology and Speech Therapy, Medical University of Bialystok, Poland
- Jolanta Biszewska
  - Department of Clinical Phonoaudiology and Speech Therapy, Medical University of Bialystok, Poland
|
27
|
Plexico LW, Sandage MJ, Kluess HA, Franco-Watkins AM, Neidert LE. Blood Plasma Hormone-Level Influence on Vocal Function. Journal of Speech, Language, and Hearing Research: JSLHR 2020; 63:1376-1386. [PMID: 32402220 PMCID: PMC7842117 DOI: 10.1044/2020_jslhr-19-00224] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 09/18/2019] [Revised: 11/26/2019] [Accepted: 01/31/2020] [Indexed: 05/27/2023]
Abstract
Purpose: This preliminary study examined the influence of menstrual cycle phase and hormone levels on acoustic measurements of vocal function in reproductive and postmenopausal females. Mean fundamental frequency (f0), speaking fundamental frequency (Sf0), and cepstral peak prominence (CPP) were evaluated. It was hypothesized that Sf0 and CPP would be lower during the luteal and ischemic phases of the menstrual cycle. Group differences with lower values in postmenopausal females and greater variability in the reproductive females were also hypothesized. Method: A mixed factorial analysis of variance was used to examine differences between reproductive and postmenopausal females and the four phases of the menstrual cycle. Separate analyses of variance were implemented for each of the dependent measures. Twenty-eight female participants (15 reproductive cycling, 13 postmenopausal) completed the study. Participants were recorded reading the Rainbow Passage and sustaining the vowel /a/. Mean vocal f0, Sf0, and CPP were determined from the acoustic samples. Blood assays were used to determine estrogen, progesterone, testosterone, and neuropeptide Y levels at four data collection time points. Results: Group differences in hormone levels and Sf0 values were established, with the postmenopausal group having significantly lower hormone levels and significantly lower Sf0 than the reproductive cycling group across the phases. Analysis of the reproductive group by hormone levels and cycle phase revealed no significant differences for CPP or Sf0 across phases. Higher estrogen was identified in the ovulation phase, and higher progesterone was identified in the luteal phase. Conclusions: Significant differences in hormone levels and Sf0 were identified between groups. Within the reproductive cycling group, the lack of significant difference in acoustic measures relative to hormone levels indicated that the measures taken may not have been sensitive enough to identify hormonally mediated vocal function changes. The participant selection may have biased the findings in that health conditions and medications that are known to influence voice function were used as exclusion criteria.
|
28
|
Comparison of Habitual and High Pitch Phonation in Teachers With and Without Vocal Fatigue. J Voice 2020; 36:141.e1-141.e9. [DOI: 10.1016/j.jvoice.2020.04.016] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 11/05/2019] [Revised: 03/28/2020] [Accepted: 04/01/2020] [Indexed: 11/23/2022]
|
29
|
Taylor S, Dromey C, Nissen SL, Tanner K, Eggett D, Corbin-Lewis K. Age-Related Changes in Speech and Voice: Spectral and Cepstral Measures. Journal of Speech, Language, and Hearing Research: JSLHR 2020; 63:647-660. [PMID: 32097060 PMCID: PMC7229708 DOI: 10.1044/2019_jslhr-19-00028] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 06/25/2019] [Revised: 10/15/2019] [Accepted: 11/21/2019] [Indexed: 06/02/2023]
Abstract
Purpose: This study examined differences in selected acoustic measures of speech and voice according to age and sex and across families. Method: Participants included 169 individuals, 79 men and 90 women, from 18 families, ranging in age from 17 to 87 years. Participants reported no history of articulation disorders, stroke or active neurologic disease, or severe-to-profound hearing loss. They read aloud two passages to facilitate examination of the following speech and voice acoustic parameters: fricative spectral moments (center of gravity, standard deviation, skewness, and kurtosis), the proportion of time spent speaking, mean speaking fundamental frequency, semitone standard deviation (STSD), and cepstral peak prominence smoothed. Results: The results indicated a significant age effect for fricative spectral center of gravity, spectral skewness, and speaking STSD. There was a significant sex effect for spectral center of gravity, spectral kurtosis, and mean fundamental frequency. Familial relationship was significant for spectral skewness, STSD, and cepstral peak prominence smoothed. Conclusions: These findings revealed that certain speech and voice features change with age and some change differently for men and women. Additionally, speakers from the same family units may demonstrate similar patterns for prosody, voicing, and articulatory behavior. The results also demonstrated normal differences in speech and voice variation across age, sex, and family unit. Understanding patterns and differences across these demographic variables in healthy speakers is important to distinguishing more confidently between normal and disordered speech and voice patterns clinically.
Affiliation(s)
- Sammi Taylor: Department of Communication Disorders, Brigham Young University, Provo, UT
- Christopher Dromey: Department of Communication Disorders, Brigham Young University, Provo, UT
- Shawn L. Nissen: Department of Communication Disorders, Brigham Young University, Provo, UT
- Kristine Tanner: Department of Communication Disorders, Brigham Young University, Provo, UT
- Dennis Eggett: Department of Statistics, Brigham Young University, Provo, UT
- Kim Corbin-Lewis: Department of Communicative Disorders and Deaf Education, Utah State University, Logan
|
30
|
Effect of Ageing on Acoustic Characteristics of Voice Pitch and Formants in Czech Vowels. J Voice 2020; 35:931.e21-931.e33. [DOI: 10.1016/j.jvoice.2020.02.022] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 01/10/2020] [Revised: 02/25/2020] [Accepted: 02/26/2020] [Indexed: 11/20/2022]
|
31
|
Vorperian HK, Kent RD, Lee Y, Bolt DM. Corner vowels in males and females ages 4 to 20 years: Fundamental and F1-F4 formant frequencies. The Journal of the Acoustical Society of America 2019; 146:3255. [PMID: 31795713 PMCID: PMC6850954 DOI: 10.1121/1.5131271] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 06/21/2019] [Revised: 09/06/2019] [Accepted: 10/08/2019] [Indexed: 05/29/2023]
Abstract
The purpose of this study was to determine the developmental trajectory of the four corner vowels' fundamental frequency (fo) and the first four formant frequencies (F1-F4), and to assess when speaker-sex differences emerge. Five words per vowel, two of which were produced twice, were analyzed for fo and estimates of the first four formant frequencies from 190 (97 female, 93 male) typically developing speakers ages 4-20 years old. Findings revealed developmental trajectories with decreasing values of fo and formant frequencies. Sex differences in fo emerged at age 7. The decrease of fo was larger in males than females, with a marked drop during puberty. Sex differences in formant frequencies appeared at the earliest age under study and varied with vowel and formant. Generally, the higher formants (F3-F4) were sensitive to sex differences. Inter- and intra-speaker variability declined with age but had somewhat different patterns, likely reflective of maturing motor control that interacts with the changing anatomy. This study reports a source of developmental normative data on fo and the first four formants in both sexes. The different developmental patterns in the first four formants and vowel-formant interactions in sex differences likely point to anatomic factors, although speech-learning phenomena cannot be discounted.
Affiliation(s)
- Houri K Vorperian: Vocal Tract Development Laboratory, Waisman Center, University of Wisconsin-Madison, 1500 Highland Avenue, Madison, Wisconsin 53705, USA
- Raymond D Kent: Vocal Tract Development Laboratory, Waisman Center, University of Wisconsin-Madison, 1500 Highland Avenue, Madison, Wisconsin 53705, USA
- Yen Lee: Department of Educational Psychology, University of Wisconsin-Madison, 1086 Educational Sciences Building, 1025 West Johnson Street, Madison, Wisconsin 53706, USA
- Daniel M Bolt: Department of Educational Psychology, University of Wisconsin-Madison, 1086 Educational Sciences Building, 1025 West Johnson Street, Madison, Wisconsin 53706, USA
|
32
|
Kent RD, Vorperian HK. Static measurements of vowel formant frequencies and bandwidths: A review. Journal of Communication Disorders 2018; 74:74-97. [PMID: 29891085 PMCID: PMC6002811 DOI: 10.1016/j.jcomdis.2018.05.004] [Citation(s) in RCA: 60] [Impact Index Per Article: 10.0] [Received: 11/28/2017] [Revised: 04/23/2018] [Accepted: 05/27/2018] [Indexed: 05/05/2023]
Abstract
PURPOSE: Data on vowel formants have been derived primarily from static measures representing an assumed steady state. This review summarizes data on formant frequencies and bandwidths for American English and also addresses (a) sources of variability (focusing on speech sample and time sampling point), and (b) methods of data reduction such as vowel area and dispersion. METHOD: Searches were conducted with CINAHL, Google Scholar, MEDLINE/PubMed, SCOPUS, and other online sources including legacy articles and references. The primary search items were vowels, vowel space area, vowel dispersion, formants, formant frequency, and formant bandwidth. RESULTS: Data on formant frequencies and bandwidths are available for both sexes over the lifespan, but considerable variability in results across studies affects even features of the basic vowel quadrilateral. Origins of variability likely include differences in speech sample and time sampling point. The data reveal the emergence of sex differences by 4 years of age, maturational reductions in formant bandwidth, and decreased formant frequencies with advancing age in some persons. It appears that a combination of methods of data reduction provides for optimal data interpretation. CONCLUSION: The lifespan database on vowel formants shows considerable variability within specific age-sex groups, pointing to the need for standardized procedures.
Affiliation(s)
- Raymond D Kent: Waisman Center, University of Wisconsin-Madison, United States.
|