1
Kreiman J, Lee Y. Biological, linguistic, and individual factors govern voice quality. J Acoust Soc Am 2025; 157:482-492. [PMID: 39846773] [DOI: 10.1121/10.0034848] [Received: 04/16/2024] [Accepted: 12/17/2024] [Indexed: 01/24/2025]
Abstract
Voice quality serves as a rich source of information about speakers, providing listeners with impressions of identity, emotional state, age, sex, reproductive fitness, and other biologically and socially salient characteristics. Understanding how this information is transmitted, accessed, and exploited requires knowledge of the psychoacoustic dimensions along which voices vary, an area that remains largely unexplored. Recent studies of English speakers have shown that two factors related to speaker size and arousal consistently emerge as the most important determinants of quality, regardless of who is speaking. The present findings extend this picture by demonstrating that in four languages that vary fundamental frequency (fo) and/or phonation type contrastively (Korean, Thai, Gujarati, and White Hmong), additional acoustic variability is systematically related to the phonology of the language spoken, and the amount of variability along each dimension is consistent across speaker groups. This study concludes that acoustic voice spaces are structured in a remarkably consistent way: first by biologically driven, evolutionarily grounded factors, second by learned linguistic factors, and finally by variations within a talker over utterances, possibly due to personal style, emotional state, social setting, or other dynamic factors. Implications for models of speaker recognition are also discussed.
Affiliation(s)
- Jody Kreiman
- Departments of Head and Neck Surgery and Linguistics, UCLA, Los Angeles, California 90095-1794, USA
- Yoonjeong Lee
- USC Viterbi School of Engineering, University of Southern California, Los Angeles, California 90089-1455, USA
2
Smith HMJ, Ritchie KL, Baguley TS, Lavan N. Face and voice identity matching accuracy is not improved by multimodal identity information. Br J Psychol 2024. [PMID: 39690725] [DOI: 10.1111/bjop.12757] [Received: 09/15/2023] [Accepted: 11/28/2024] [Indexed: 12/19/2024]
Abstract
Identity verification from both faces and voices can be error-prone. Previous research has shown that faces and voices signal concordant information and cross-modal unfamiliar face-to-voice matching is possible, albeit often with low accuracy. In the current study, we ask whether performance on a face or voice identity matching task can be improved by using multimodal stimuli which add a second modality (voice or face). We find that overall accuracy is higher for face matching than for voice matching. However, contrary to predictions, presenting one unimodal and one multimodal stimulus within a matching task did not improve face or voice matching compared to presenting two unimodal stimuli. Additionally, we find that presenting two multimodal stimuli does not improve accuracy compared to presenting two unimodal face stimuli. Thus, multimodal information does not improve accuracy. However, intriguingly, we find that cross-modal face-voice matching accuracy predicts voice matching accuracy but not face matching accuracy. This suggests cross-modal information can nonetheless play a role in identity matching, and face and voice information combine to inform matching decisions. We discuss our findings in light of current models of person perception, and consider the implications for identity verification in security and forensic settings.
Affiliation(s)
- Kay L Ritchie
- School of Psychology, University of Lincoln, Lincoln, UK
- Thom S Baguley
- NTU Psychology, Nottingham Trent University, Nottingham, UK
- Nadine Lavan
- Department of Biological and Experimental Psychology, Queen Mary University London, London, UK
3
Cervantes Constantino F, Caputi Á. Cortical tracking of speakers' spectral changes predicts selective listening. Cereb Cortex 2024; 34:bhae472. [PMID: 39656649] [DOI: 10.1093/cercor/bhae472] [Received: 07/18/2024] [Revised: 10/20/2024] [Accepted: 11/15/2024] [Indexed: 12/17/2024]
Abstract
A social scene is particularly informative when people are distinguishable. To understand somebody amid "cocktail party" chatter, we automatically index their voice. This ability is underpinned by parallel processing of vocal spectral contours from speech sounds, but it has not yet been established how this occurs in the brain's cortex. We investigate single-trial neural tracking of slow frequency modulations in speech using electroencephalography. Participants briefly listened to unfamiliar single speakers, and in addition, they performed a cocktail party comprehension task. Quantified through stimulus reconstruction methods, robust tracking was found in neural responses to slow (delta-theta range) modulations of frequency contours in the fourth and fifth formant band, equivalent to the 3.5-5 kHz audible range. The spectral spacing between neighboring instantaneous frequency contours (ΔF), which also yields indexical information about the vocal tract, was similarly decodable. Moreover, EEG evidence of listeners' spectral tracking abilities predicted their chances of succeeding at selective listening when faced with two-speaker speech mixtures. In summary, the results indicate that the communicating brain can rely on locking of cortical rhythms to major changes led by upper resonances of the vocal tract. Their corresponding articulatory mechanics hence continuously issue a fundamental credential for listeners to target in real time.
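Stimulus reconstruction of the kind described in this abstract is commonly implemented as a time-lagged ridge regression ("decoder") that maps the EEG channels back onto the stimulus feature. A minimal numpy sketch with synthetic data standing in for real recordings; the channel count, lag window, and regularization strength are illustrative assumptions, not the authors' settings:

```python
import numpy as np

rng = np.random.default_rng(0)

def lag_matrix(x, n_lags):
    """Stack time-lagged copies of multichannel data: (T, C) -> (T, C * n_lags)."""
    T, C = x.shape
    out = np.zeros((T, C * n_lags))
    for k in range(n_lags):
        out[k:, k * C:(k + 1) * C] = x[:T - k]
    return out

def train_decoder(eeg, stim, n_lags=10, lam=1.0):
    """Ridge regression decoder: stim(t) ~ lagged EEG @ w."""
    X = lag_matrix(eeg, n_lags)
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ stim)

def reconstruct(eeg, w, n_lags=10):
    return lag_matrix(eeg, n_lags) @ w

# Synthetic example: a slow 4 Hz "frequency contour" leaks into 8 EEG channels.
T = 2000
stim = np.sin(2 * np.pi * 4 * np.arange(T) / 250)             # contour at 250 Hz sampling
mixing = rng.normal(size=8)                                    # per-channel projection weights
eeg = np.outer(stim, mixing) + 0.5 * rng.normal(size=(T, 8))   # noisy multichannel "EEG"

w = train_decoder(eeg, stim)
r = np.corrcoef(reconstruct(eeg, w), stim)[0, 1]               # reconstruction accuracy
```

In real analyses the decoder is trained and evaluated on separate trials, and the correlation between reconstructed and actual contours serves as the tracking measure compared across listeners.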
Affiliation(s)
- Francisco Cervantes Constantino
- Instituto de Investigaciones Biológicas Clemente Estable, Department of Integrative and Computational Neurosciences, Av. Italia 3318, Montevideo, 11.600, Uruguay
- Facultad de Psicología, Universidad de la República
- Ángel Caputi
- Instituto de Investigaciones Biológicas Clemente Estable, Department of Integrative and Computational Neurosciences, Av. Italia 3318, Montevideo, 11.600, Uruguay
4
Rechenberg L, Meurer EM, Melos M, Nienov OH, Corleta HVE, Capp E. Voice, Speech, and Clinical Aspects During Pregnancy: A Longitudinal Study. J Voice 2024; 38:1431-1438. [PMID: 35662512] [DOI: 10.1016/j.jvoice.2022.04.019] [Received: 02/26/2022] [Revised: 04/26/2022] [Accepted: 04/27/2022] [Indexed: 10/18/2022]
Abstract
BACKGROUND Pregnancy involves anatomical, physiological, and metabolic changes in a woman's body. However, the effects of these changes on the voice remain unclear, particularly regarding their clinical correlates. OBJECTIVES We aimed to evaluate changes in vocal and speech acoustic measures and their relationship with clinical aspects in women during pregnancy. METHOD A prospective, longitudinal study was carried out with 41 low-risk adult pregnant women followed during prenatal care. Demographic and anthropometric data as well as lifestyle habits and health conditions were collected. Voice recordings of sustained vowels and of automatic and spontaneous speech were made in each trimester and analyzed with PRAAT® to evaluate acoustic, aerodynamic, and articulatory measures. RESULTS There were no changes in fundamental frequency, jitter, shimmer, or harmonics-to-noise ratio during pregnancy. Maximum phonation time (MPT), pause rate, and pause duration decreased at the end of pregnancy. MPT was lower in sedentary pregnant women. The fundamental frequency peak rate was higher in eutrophic participants and lower in the third trimester in women with BMI ≥25 kg/m2. Pause rate was higher in pregnant women with BMI ≥25 kg/m2. There was no relationship between sleep quality, reflux, or vocal symptoms and the acoustic and aerodynamic measures. CONCLUSIONS Differences were shown in MPT and temporal pause measurements during pregnancy. Acoustic measurements did not change. There was a relationship between acoustic and aerodynamic measures and clinical variables (BMI, physical activity, and body mass gain).
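Jitter, shimmer, and harmonics-to-noise ratio were computed here with Praat, which uses its own pitch-tracking pipeline; the local jitter and shimmer formulas themselves are simple once cycle-by-cycle period and amplitude tracks have been extracted. A sketch under that assumption (the perturbation magnitudes below are invented for illustration):

```python
import numpy as np

def local_jitter(periods):
    """Local jitter: mean absolute difference between consecutive glottal
    periods, divided by the mean period (often reported as a percentage)."""
    periods = np.asarray(periods, dtype=float)
    return np.mean(np.abs(np.diff(periods))) / np.mean(periods)

def local_shimmer(amplitudes):
    """Local shimmer: mean absolute difference between consecutive cycle peak
    amplitudes, divided by the mean amplitude."""
    amplitudes = np.asarray(amplitudes, dtype=float)
    return np.mean(np.abs(np.diff(amplitudes))) / np.mean(amplitudes)

rng = np.random.default_rng(1)
# A steady ~200 Hz voice (5 ms periods) with 0.5% period and 2% amplitude wobble.
periods = 0.005 * (1 + 0.005 * rng.normal(size=200))
amps = 1.0 + 0.02 * rng.normal(size=200)

jit = local_jitter(periods)
shim = local_shimmer(amps)
```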
Affiliation(s)
- Leila Rechenberg
- Graduate Program of Health Science: Obstetrics and Gynecology, School of Medicine, Universidade Federal do Rio Grande do Sul, Porto Alegre, Rio Grande do Sul, Brazil; Undergraduate Program of Speech and Language Therapy, Universidade Federal do Rio Grande do Sul, Porto Alegre, Rio Grande do Sul, Brazil; Department of Social and Preventive Dentistry, School of Dentistry, Universidade Federal do Rio Grande do Sul, Porto Alegre, Rio Grande do Sul, Brazil
- Eliséa Maria Meurer
- Graduate Program of Health Science: Obstetrics and Gynecology, School of Medicine, Universidade Federal do Rio Grande do Sul, Porto Alegre, Rio Grande do Sul, Brazil
- Monica Melos
- Graduate Program of Health Science: Obstetrics and Gynecology, School of Medicine, Universidade Federal do Rio Grande do Sul, Porto Alegre, Rio Grande do Sul, Brazil; Undergraduate Program of Speech and Language Therapy, Universidade Federal do Rio Grande do Sul, Porto Alegre, Rio Grande do Sul, Brazil
- Otto Henrique Nienov
- Graduate Program of Health Science: Obstetrics and Gynecology, School of Medicine, Universidade Federal do Rio Grande do Sul, Porto Alegre, Rio Grande do Sul, Brazil
- Helena von Eye Corleta
- Graduate Program of Health Science: Obstetrics and Gynecology, School of Medicine, Universidade Federal do Rio Grande do Sul, Porto Alegre, Rio Grande do Sul, Brazil; Department of Obstetrics and Gynecology, Hospital de Clínicas de Porto Alegre, School of Medicine, Universidade Federal do Rio Grande do Sul, Porto Alegre, Rio Grande do Sul, Brazil
- Edison Capp
- Graduate Program of Health Science: Obstetrics and Gynecology, School of Medicine, Universidade Federal do Rio Grande do Sul, Porto Alegre, Rio Grande do Sul, Brazil; Department of Obstetrics and Gynecology, Hospital de Clínicas de Porto Alegre, School of Medicine, Universidade Federal do Rio Grande do Sul, Porto Alegre, Rio Grande do Sul, Brazil
5
Lee JJ, Tin JAA, Perrachione TK. Foreign language talker identification does not generalize to new talkers. Psychon Bull Rev 2024. [PMID: 39441473] [DOI: 10.3758/s13423-024-02598-x] [Accepted: 09/26/2024] [Indexed: 10/25/2024]
Abstract
Listeners identify talkers less accurately in a foreign language than in their native language, but it remains unclear whether this language-familiarity effect arises because listeners (1) simply lack experience identifying foreign-language talkers or (2) gain access to additional talker-specific information during concurrent linguistic processing of talkers' speech. Here, we tested whether sustained practice identifying talkers of an unfamiliar, foreign language could lead to generalizable improvement in learning to identify new talkers speaking that language, even if listeners remained unable to understand the talkers' speech. English-speaking adults with no prior experience with Mandarin practiced learning to identify Mandarin-speaking talkers over four consecutive days and were tested on their ability to generalize their Mandarin talker-identification abilities to new Mandarin-speaking talkers on the fourth day. In a "same-voices" training condition, listeners learned to identify the same talkers for the first 3 days and new talkers on the fourth day; in a "different-voices" condition, listeners learned to identify a different set of voices on each day including the fourth day. Listeners in the same-voices condition showed daily improvement in talker identification across the first 3 days but returned to baseline when trying to learn new talkers on the fourth day, whereas listeners in the different-voices condition showed no improvement across the 4 days. After 4 days, neither group demonstrated generalized improvement in learning new Mandarin-speaking talkers versus their baseline performance. These results suggest that, in the absence of specific linguistic knowledge, listeners are unable to develop generalizable foreign-language talker-identification abilities.
Affiliation(s)
- Jayden J Lee
- Department of Speech, Language, and Hearing Sciences, Boston University, 635 Commonwealth Ave, Boston, MA, 02215, USA
- Jessica A A Tin
- Department of Speech, Language, and Hearing Sciences, Boston University, 635 Commonwealth Ave, Boston, MA, 02215, USA
- Tyler K Perrachione
- Department of Speech, Language, and Hearing Sciences, Boston University, 635 Commonwealth Ave, Boston, MA, 02215, USA
6
Liu B, Lei J, Wischhoff OP, Smereka KA, Jiang JJ. Acoustic Character Governing Variation in Normal, Benign, and Malignant Voices. Folia Phoniatr Logop 2024:1-10. [PMID: 38981448] [DOI: 10.1159/000540255] [Received: 11/19/2023] [Accepted: 07/04/2024] [Indexed: 07/11/2024]
Abstract
INTRODUCTION Benign and malignant vocal fold lesions (VFLs) are growths that occur on the vocal folds, but the treatments for these two types of lesions differ significantly. It is therefore imperative to use a multidisciplinary approach to properly recognize suspicious lesions. This study aimed to determine the important acoustic characteristics specific to benign and malignant VFLs. METHODS The acoustic model of voice quality was utilized to measure various acoustic parameters in 157 participants with normal, benign, and malignant conditions. The study comprised 62 female and 95 male participants (43 ± 10 years). Voice samples were collected at the Shanghai Eye, Ear, Nose, and Throat Hospital of Fudan University between May 2020 and July 2021. The acoustic variables were analyzed using Principal Component Analysis (PCA) to identify acoustic characteristics specific to normal vocal folds, benign VFLs, and malignant VFLs. The similarities and differences in acoustic factors were also studied for benign conditions including Reinke's edema, polyps, cysts, and leukoplakia. RESULTS The PCA identified the components that accounted for the variation in the data, highlighting acoustic characteristics of the normal, benign, and malignant groups. The analysis indicated that coefficients of variation in root mean square energy were observed solely within the normal group. Coefficients of variation in pitch (F0) were significant only in benign voices, while higher formant frequencies and their variability contributed to the acoustic variance within the malignant group. The presence of formant dispersion (FD) as a weighted factor in PCA was exclusively noted in individuals with Reinke's edema. The amplitude ratio between subharmonics and harmonics (SHR) and its coefficients of variation were evident exclusively in the polyps group. In the case of voices with cysts, both pitch (F0) and coefficients of variation for FD contributed to variations. Additionally, higher formant frequencies and their coefficients of variation played a role in the acoustic variance among voices of patients with leukoplakia. CONCLUSION Experimental evidence demonstrates the utility of the PCA method in identifying vibrational alterations in the acoustic characteristics of voices affected by lesions. Furthermore, the PCA highlighted underlying acoustic differences between conditions such as Reinke's edema, polyps, cysts, and leukoplakia. These findings can be used in the future to develop an automated malignant voice analysis algorithm, which will facilitate timely intervention and management of vocal fold conditions.
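The PCA step described here reduces a speakers-by-measures matrix to a few components ordered by explained variance; the variables loading heavily on each component are the "important acoustic characteristics" of a group. A minimal sketch via the SVD, with a toy data matrix in place of the study's measurements:

```python
import numpy as np

def pca(X):
    """PCA of a (samples x features) measurement matrix.
    Returns loadings (features x components) and variance explained per component."""
    Xc = X - X.mean(axis=0)                    # center each acoustic variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var_explained = s**2 / np.sum(s**2)
    return Vt.T, var_explained

rng = np.random.default_rng(2)
# Toy matrix: 157 "speakers" x 5 acoustic variables, built so that two latent
# factors dominate, mimicking the low-dimensional structure PCA uncovers.
f1, f2 = rng.normal(size=157), rng.normal(size=157)
X = np.column_stack([f1,
                     2 * f1 + 0.1 * rng.normal(size=157),
                     f2,
                     -f2 + 0.1 * rng.normal(size=157),
                     0.1 * rng.normal(size=157)])

loadings, var_explained = pca(X)   # first two components carry most of the variance
```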
Affiliation(s)
- Boquan Liu
- School of Humanities, Shanghai Jiao Tong University, Shanghai, China
- Jianhan Lei
- School of Humanities, Shanghai Jiao Tong University, Shanghai, China
- Owen P Wischhoff
- Division of Otolaryngology Head and Neck Surgery, Department of Surgery, University of Wisconsin, Madison, Wisconsin, USA
- Katerina A Smereka
- Division of Otolaryngology Head and Neck Surgery, Department of Surgery, University of Wisconsin, Madison, Wisconsin, USA
- Jack J Jiang
- Division of Otolaryngology Head and Neck Surgery, Department of Surgery, University of Wisconsin, Madison, Wisconsin, USA
7
Best V, Ahlstrom JB, Mason CR, Perrachione TK, Kidd G, Dubno JR. Talker change detection by listeners varying in age and hearing loss. J Acoust Soc Am 2024; 155:2482-2491. [PMID: 38587430] [PMCID: PMC11003761] [DOI: 10.1121/10.0025539] [Received: 11/21/2023] [Revised: 03/06/2024] [Accepted: 03/19/2024] [Indexed: 04/09/2024]
Abstract
Despite a vast literature on how speech intelligibility is affected by hearing loss and advanced age, remarkably little is known about the perception of talker-related information in these populations. Here, we assessed the ability of listeners to detect whether a change in talker occurred while listening to and identifying sentence-length sequences of words. Participants were recruited in four groups that differed in their age (younger/older) and hearing status (normal/impaired). The task was conducted in quiet or in a background of same-sex two-talker speech babble. We found that age and hearing loss had detrimental effects on talker change detection, in addition to their expected effects on word recognition. We also found subtle differences in the effects of age and hearing loss for trials in which the talker changed vs trials in which the talker did not change. These findings suggest that part of the difficulty encountered by older listeners, and by listeners with hearing loss, when communicating in group situations, may be due to a reduced ability to identify and discriminate between the participants in the conversation.
Affiliation(s)
- Virginia Best
- Department of Speech, Language and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
- Jayne B Ahlstrom
- Department of Otolaryngology-Head and Neck Surgery, Medical University of South Carolina, Charleston, South Carolina 29425, USA
- Christine R Mason
- Department of Speech, Language and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
- Tyler K Perrachione
- Department of Speech, Language and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
- Gerald Kidd
- Department of Speech, Language and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
- Department of Otolaryngology-Head and Neck Surgery, Medical University of South Carolina, Charleston, South Carolina 29425, USA
- Judy R Dubno
- Department of Otolaryngology-Head and Neck Surgery, Medical University of South Carolina, Charleston, South Carolina 29425, USA
8
Chung HR, Lee Y, Reddy NK, Zhang Z, Chhetri DK. Effects of Thyroarytenoid Activation Induced Vibratory Asymmetry on Voice Acoustics and Perception. Laryngoscope 2024; 134:1327-1332. [PMID: 37676064] [DOI: 10.1002/lary.31046] [Received: 02/25/2023] [Revised: 06/25/2023] [Accepted: 08/22/2023] [Indexed: 09/08/2023]
Abstract
INTRODUCTION Asymmetry of vocal fold (VF) vibration is common in patients with voice complaints and also observed in 10% of normophonic individuals. Although thyroarytenoid (TA) muscle activation plays a crucial role in regulating VF vibration, how TA activation asymmetry relates to voice acoustics and perception is unclear. We evaluated the relationship between TA activation asymmetry and the resulting acoustics and perception. METHODS An in vivo canine model of phonation was used to create symmetric and increasingly asymmetric VF vibratory conditions via graded stimulation of bilateral TA muscles. Naïve listeners (n = 89) rated the perceptual quality of 100 unique voice samples using a visual sort-and-rate task. For each phonatory condition, cepstral peak prominence (CPP), harmonic amplitude (H1-H2), and root-mean-square (RMS) energy of the voice were measured. The relationships between these metrics, vibratory asymmetry, and perceptual ratings were evaluated. RESULTS Increasing levels of TA asymmetry resulted in declining listener preference. Furthermore, only severely asymmetric audio samples were perceptually distinguishable from symmetric and mildly asymmetric conditions. CPP was negatively correlated with TA asymmetry: voices produced with larger degrees of asymmetry were associated with lower CPP values. Listeners preferred audio samples with higher values of CPP, high RMS energy, and lower H1-H2 (less breathy). CONCLUSION Listeners are sensitive to changes in voice acoustics related to vibratory asymmetry. Although increasing vibratory asymmetry is correlated with decreased perceptual ratings, mild asymmetries are perceptually tolerated. This study contributes to our understanding of voice production and quality by identifying perceptually salient and clinically meaningful asymmetry. LEVEL OF EVIDENCE N/A (Basic Science Study) Laryngoscope, 134:1327-1332, 2024.
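Cepstral peak prominence, one of the measures used here, quantifies how strongly periodic a voice is: the height of the cepstral peak (at the quefrency of the glottal period) above a regression line fit through the surrounding cepstrum. Standard implementations differ in windowing, frame averaging, and the regression range; this compressed numpy variant shows only the core computation:

```python
import numpy as np

def cpp(x, fs):
    """Simplified cepstral peak prominence (dB): cepstral peak height above a
    linear regression line, searched over quefrencies of plausible F0 (60-400 Hz)."""
    log_spec = 20 * np.log10(np.abs(np.fft.rfft(x * np.hanning(len(x)))) + 1e-12)
    ceps = np.fft.irfft(log_spec)                 # real cepstrum of the log spectrum
    quef = np.arange(len(ceps)) / fs              # quefrency axis in seconds
    band = (quef >= 1 / 400) & (quef <= 1 / 60)
    peak = np.flatnonzero(band)[np.argmax(ceps[band])]
    trend = np.polyval(np.polyfit(quef[band], ceps[band], 1), quef[peak])
    return ceps[peak] - trend

fs = 16000
t = np.arange(0, 0.5, 1 / fs)
voiced = ((150 * t) % 1.0) - 0.5                  # 150 Hz sawtooth: strongly periodic
noise = np.random.default_rng(3).normal(size=t.size)

cpp_voiced, cpp_noise = cpp(voiced, fs), cpp(noise, fs)   # periodic signal scores higher
```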
Affiliation(s)
- Hye Rhyn Chung
- David Geffen School of Medicine at UCLA, 10833 Le Conte Avenue, Los Angeles, California, U.S.A
- Yoonjeong Lee
- Department of Head & Neck Surgery, David Geffen School of Medicine at UCLA, Los Angeles, California, U.S.A
- Department of Linguistics, University of Michigan, Ann Arbor, Michigan, U.S.A
- Neha K Reddy
- David Geffen School of Medicine at UCLA, 10833 Le Conte Avenue, Los Angeles, California, U.S.A
- Zhaoyan Zhang
- Department of Head & Neck Surgery, David Geffen School of Medicine at UCLA, Los Angeles, California, U.S.A
- Dinesh K Chhetri
- Department of Head & Neck Surgery, David Geffen School of Medicine at UCLA, Los Angeles, California, U.S.A
9
Kreiman J. Information conveyed by voice quality. J Acoust Soc Am 2024; 155:1264-1271. [PMID: 38345424] [DOI: 10.1121/10.0024609] [Received: 09/21/2023] [Accepted: 01/09/2024] [Indexed: 02/15/2024]
Abstract
The problem of characterizing voice quality has long caused debate and frustration. The richness of the available descriptive vocabulary is overwhelming, but the density and complexity of the information voices convey lead some to conclude that language can never adequately specify what we hear. Others argue that terminology lacks an empirical basis, so that language-based scales are inadequate a priori. Efforts to provide meaningful instrumental characterizations have also had limited success. Such measures may capture sound patterns but cannot at present explain what characteristics, intentions, or identity listeners attribute to the speaker based on those patterns. However, some terms continually reappear across studies. These terms align with acoustic dimensions accounting for variance across speakers and languages and correlate with size and arousal across species. This suggests that labels for quality rest on a bedrock of biology: We have evolved to perceive voices in terms of size/arousal, and these factors structure both voice acoustics and descriptive language. Such linkages could help integrate studies of signals and their meaning, producing a truly interdisciplinary approach to the study of voice.
Affiliation(s)
- Jody Kreiman
- Departments of Head and Neck Surgery and Linguistics, University of California, Los Angeles, Los Angeles, California 90095-1794, USA
10
Pautz N, McDougall K, Mueller-Johnson K, Nolan F, Paver A, Smith HMJ. Identifying unfamiliar voices: Examining the system variables of sample duration and parade size. Q J Exp Psychol (Hove) 2023; 76:2804-2822. [PMID: 36718784] [PMCID: PMC10655699] [DOI: 10.1177/17470218231155738] [Received: 07/07/2022] [Revised: 01/13/2023] [Accepted: 01/17/2023] [Indexed: 02/01/2023]
Abstract
Voice identification parades can be unreliable due to the error-prone nature of earwitness responses. UK government guidelines recommend that voice parades should have nine voices, each played for 60 s, which makes parades resource-consuming to construct. In this article, we conducted two experiments to see if voice parade procedures could be simplified. In Experiment 1 (N = 271, 135 female), we investigated whether reducing the duration of the voice samples on a nine-voice parade would negatively affect identification performance, using both conventional logistic and signal detection approaches. In Experiment 2 (N = 270, 136 female), we first explored whether the same sample duration conditions used in Experiment 1 would lead to different outcomes if we reduced the parade size to only six voices. Following this, we pooled the data from both experiments to investigate the influence of target-position effects. The results show that 15-s sample durations result in statistically equivalent voice identification performance to the longer 60-s sample durations, but that the 30-s sample duration suffers in terms of overall signal sensitivity. This pattern of results was replicated using both a nine- and a six-voice parade. Performance on target-absent parades was at chance levels in both parade sizes, and response criteria were mostly liberal. In addition, unwanted position effects were present. The results provide initial evidence that the sample duration used in a voice parade may be reduced, but we argue that the guidelines recommending a parade with nine voices should be maintained to provide additional protection for a potentially innocent suspect, given the low target-absent accuracy.
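The signal detection quantities reported here (sensitivity and response criterion) can be computed from hit and false-alarm counts with the standard Gaussian model. A sketch using Python's standard library, with invented counts; the log-linear correction keeps 0% and 100% rates finite:

```python
from statistics import NormalDist

def dprime_criterion(hits, misses, fas, crs):
    """Signal-detection sensitivity d' and criterion c from response counts.
    Negative c means a liberal criterion (bias toward responding 'present')."""
    h = (hits + 0.5) / (hits + misses + 1)    # corrected hit rate
    f = (fas + 0.5) / (fas + crs + 1)         # corrected false-alarm rate
    z = NormalDist().inv_cdf
    return z(h) - z(f), -0.5 * (z(h) + z(f))

# Hypothetical earwitness data: 70 hits / 30 misses on target-present parades,
# 40 false alarms / 60 correct rejections on target-absent parades.
d, c = dprime_criterion(70, 30, 40, 60)       # modest sensitivity, liberal criterion
```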
11
Song J, Kim M, Park J. Acoustic correlates of perceived personality from Korean utterances in a formal communicative setting. PLoS One 2023; 18:e0293222. [PMID: 37906609] [PMCID: PMC10617731] [DOI: 10.1371/journal.pone.0293222] [Received: 02/17/2023] [Accepted: 10/05/2023] [Indexed: 11/02/2023]
Abstract
The aim of the present study was to find acoustic correlates of perceived personality from speech produced in a formal communicative setting, specifically that of Korean customer service employees. This work extended previous research on voice personality impressions to a different sociocultural and linguistic context in which speakers are expected to speak politely in a formal register. To use naturally produced speech rather than read speech, we devised a new method that successfully elicited spontaneous speech from speakers role-playing as customer service employees, while controlling for the words and sentence structures they used. We then examined a wide range of acoustic properties in the utterances, including voice quality and global acoustic and segmental properties, using Principal Component Analysis. Subjects in the personality rating task listened to the utterances and rated perceived personality in terms of the Big Five personality traits. While replicating some previous findings, we discovered several acoustic variables that exclusively accounted for the personality judgments of female speakers: a more modal voice quality increased perceived conscientiousness and neuroticism, and less dispersed formants, reflecting a larger body size, increased the perceived levels of extraversion and openness. These biases in personality perception likely reflect gender- and occupation-related stereotypes that exist in South Korea. Our findings can also serve as a basis for developing and evaluating synthetic speech for voice assistant applications in future studies.
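Formant dispersion, the cue linked here to perceived body size, is the average spacing between adjacent formants; under a uniform-tube model that spacing maps directly onto apparent vocal tract length. A small sketch (the formant values are illustrative, not from the study):

```python
def formant_dispersion(formants):
    """Fitch-style formant dispersion: average spacing (Hz) between adjacent
    formants, computed as (Fn - F1) / (n - 1)."""
    return (formants[-1] - formants[0]) / (len(formants) - 1)

def apparent_vtl(delta_f, c=35000.0):
    """Vocal tract length (cm) of a uniform tube whose formant spacing is
    delta_f, taking c = 35,000 cm/s as the speed of sound in warm air."""
    return c / (2 * delta_f)

# Illustrative male-typical formants (Hz), evenly spaced 1000 Hz apart.
fd = formant_dispersion([500, 1500, 2500, 3500])   # -> 1000.0 Hz
vtl = apparent_vtl(fd)                             # -> 17.5 cm
```

Smaller dispersion implies a longer tract and, by inference, a larger speaker, which is the direction of the perception bias reported above.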
Affiliation(s)
- Jieun Song
- School of Digital Humanities and Computational Social Sciences, Korea Advanced Institute of Science and Technology, Daejeon, South Korea
- Minjeong Kim
- Graduate School of Culture Technology, Korea Advanced Institute of Science and Technology, Daejeon, South Korea
- Jaehan Park
- KT Corporation, Seongnam-City, South Korea
- School of Computer Science, University of Seoul, Seoul, South Korea
12
Patel RR, Sandage MJ, Golzarri-Arroyo L. High-Speed Videoendoscopic and Acoustic Characteristics of Inspiratory Phonation. J Speech Lang Hear Res 2023; 66:1192-1207. [PMID: 36917802] [DOI: 10.1044/2022_jslhr-22-00502] [Indexed: 06/14/2023]
Abstract
PURPOSE Given the importance of inspiratory phonation for assessment of vocal fold structure, the aim of this investigation was to evaluate and describe the vocal fold vibratory characteristics of inspiratory phonation using high-speed videoendoscopy in healthy volunteers. The study also examined the empirical relationship between cepstral peak prominence (CPP) and glottal area waveform measurements derived from simultaneous high-speed videoendoscopy and audio recordings. METHOD Vocally healthy adults (33 women, 28 men) volunteered for this investigation and completed high-speed videoendoscopic assessment of vocal fold function for two trials of an expiratory/inspiratory phonation task at normal pitch and normal loudness. Twelve glottal area waveform measures and acoustic CPP values were extracted for analyses. RESULTS Inspiratory phonation resulted in shorter closing time, longer duration of the opening phase, and faster closing phase velocity compared to expiratory phonation. Sex differences were elucidated. CPP changes for inspiratory phonation were predicted by changes in the glottal area index and waveform symmetry index, whereas changes in CPP during expiratory phonation were predicted by changes in asymmetry quotient, glottal area index, and amplitude periodicity. CONCLUSIONS Vocal fold vibratory differences were identified for inspiratory phonation when compared to expiratory phonation, the latter of which has been studied more extensively. This investigation provides important basic inspiratory phonation data to better understand laryngeal physiology in vivo and provides a basic model from which to further study inspiratory phonation in a larger population representing a broader age range. SUPPLEMENTAL MATERIAL https://doi.org/10.23641/asha.22223812.
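Glottal area waveform measures of the kind used here are largely ratios of phase durations within one vibratory cycle. The paper extracts twelve such measures; this sketch shows two classic ones (open quotient and speed quotient) on a synthetic triangular cycle, purely for illustration:

```python
import numpy as np

def gaw_quotients(area):
    """Open quotient (open time / period) and speed quotient
    (opening time / closing time) from one cycle of a glottal area waveform."""
    open_mask = area > 0
    open_samples = int(np.count_nonzero(open_mask))
    first_open = int(np.argmax(open_mask))   # first sample with nonzero area
    peak = int(np.argmax(area))              # sample of maximum opening
    opening = peak - first_open + 1          # samples spent opening
    closing = open_samples - opening         # samples spent closing
    return open_samples / area.size, opening / closing

# Synthetic cycle: a closed phase, then a slow opening ramp and a roughly
# twice-as-fast closing ramp.
area = np.concatenate([np.zeros(40),
                       np.linspace(0.0, 1.0, 41)[1:],    # 40 opening samples
                       np.linspace(1.0, 0.0, 21)[1:]])   # 20 closing samples
oq, sq = gaw_quotients(area)   # open ~60% of the cycle; opening ~2x closing
```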
Affiliation(s)
- Rita R Patel, Department of Speech, Language and Hearing Sciences, Indiana University Bloomington
- Mary J Sandage, Department of Speech, Language & Hearing Sciences, Auburn University, AL

13
Zhang Z. Vocal Fold Vertical Thickness in Human Voice Production and Control: A Review. J Voice 2023:S0892-1997(23)00078-4. [PMID: 36964073] [PMCID: PMC10514229] [DOI: 10.1016/j.jvoice.2023.02.021]
Abstract
While current voice research often focuses on laryngeal adjustments in a two-dimensional plane from a superior endoscopic view, recent computational simulations showed that vocal control is three-dimensional and the medial surface vertical thickness plays an important role in regulating the glottal closure pattern and the spectral shape of the produced voice. In contrast, while a small glottal gap is required to initiate and sustain phonation, further changes in the glottal gap within this small range have only small effects on glottal closure and spectral shape. Vocal fold stiffness, particularly along the anterior-posterior direction, plays an important role in pitch control but has only a small effect on glottal closure and spectral shape. These results suggest that voice research should pay more attention to medial surface shape in the vertical dimension. Future studies in a large population of both normal speakers and patients are needed to better characterize the three-dimensional medial surface shape, its variability between speakers, changes throughout the life span, and how it is impacted by voice disorders and clinical interventions. The implications for voice pedagogy and clinical intervention are discussed.
14
Kapadia AM, Tin JAA, Perrachione TK. Multiple sources of acoustic variation affect speech processing efficiency. J Acoust Soc Am 2023; 153:209. [PMID: 36732274] [PMCID: PMC9836727] [DOI: 10.1121/10.0016611]
Abstract
Phonetic variability across talkers imposes additional processing costs during speech perception, evident in performance decrements when listening to speech from multiple talkers. However, within-talker phonetic variation is a less well-understood source of variability in speech, and it is unknown how processing costs from within-talker variation compare to those from between-talker variation. Here, listeners performed a speeded word identification task in which three dimensions of variability were factorially manipulated: between-talker variability (single vs multiple talkers), within-talker variability (single vs multiple acoustically distinct recordings per word), and word-choice variability (two- vs six-word choices). All three sources of variability led to reduced speech processing efficiency. Between-talker variability affected both word-identification accuracy and response time, but within-talker variability affected only response time. Furthermore, between-talker variability, but not within-talker variability, had a greater impact when the target phonological contrasts were more similar. Together, these results suggest that natural between- and within-talker variability reflect two distinct magnitudes of common acoustic-phonetic variability: Both affect speech processing efficiency, but they appear to have qualitatively and quantitatively unique effects due to differences in their potential to obscure acoustic-phonemic correspondences across utterances.
Affiliation(s)
- Alexandra M Kapadia, Jessica A A Tin, Tyler K Perrachione
- Department of Speech, Language, and Hearing Sciences, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA (all authors)

15
Cheung S, Babel M. The own-voice benefit for word recognition in early bilinguals. Front Psychol 2022; 13:901326. [PMID: 36118470] [PMCID: PMC9478475] [DOI: 10.3389/fpsyg.2022.901326]
Abstract
The current study examines the self-voice benefit in an early bilingual population. Female Cantonese-English bilinguals produced words containing Cantonese contrasts, and a subset of these minimal pairs was selected as stimuli for a perception task. Speakers' productions were grouped according to how acoustically contrastive their pronunciation of each minimal pair was, and these groupings were used to design personalized experiments for each participant, featuring their own voice and the voices of other speakers with similarly contrastive tokens. The perception task was a two-alternative forced-choice word identification paradigm in which participants heard isolated Cantonese words that had undergone synthesis to mask the original talker identity. Listeners were more accurate at recognizing minimal pairs produced in their own (disguised) voice than at recognizing the realizations of speakers who maintained similar degrees of phonetic contrast for the same minimal pairs. In general, individuals with larger phonetic contrasts were also more accurate in word identification for self and other voices alike. These results provide evidence for an own-voice benefit in early bilinguals and suggest that the phonetic distributions that undergird phonological contrasts are heavily shaped by one's own phonetic realizations.
Affiliation(s)
- Sarah Cheung, Department of Speech-Language Pathology, University of Toronto, Toronto, ON, Canada
- Molly Babel, Department of Linguistics, University of British Columbia, Vancouver, BC, Canada

16
Lee Y, Kreiman J. Acoustic voice variation in spontaneous speech. J Acoust Soc Am 2022; 151:3462. [PMID: 35649890] [PMCID: PMC9135459] [DOI: 10.1121/10.0011471]
Abstract
This study replicates and extends the recent findings of Lee, Keating, and Kreiman [J. Acoust. Soc. Am. 146(3), 1568-1579 (2019)] on acoustic voice variation in read speech, which showed remarkably similar acoustic voice spaces for groups of female and male talkers and for the individual talkers within these groups. Principal component analysis was applied to acoustic indices of voice quality measured from phone conversations for 99 of the 100 talkers studied previously. The acoustic voice spaces derived from spontaneous speech are highly similar to those based on read speech, except that, unlike in read speech, variability in fundamental frequency accounted for significant acoustic variability. Implications of these findings for prototype models of speaker recognition and discrimination are considered.
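The analysis step described above (principal component analysis of acoustic voice-quality indices to derive a low-dimensional "voice space") can be sketched as follows. This is a generic illustration, not the study's code; the input matrix and variable names are placeholders.

```python
import numpy as np

def voice_space(measures, n_components=2):
    """PCA via SVD on z-scored acoustic measures.
    Rows = utterances or talkers; columns = acoustic variables
    (e.g., F0, harmonic amplitudes, noise measures).
    Returns component scores and proportion of variance explained."""
    X = np.asarray(measures, dtype=float)
    X = (X - X.mean(axis=0)) / X.std(axis=0)        # z-score each measure
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U * S                                  # projections onto the PCs
    explained = S**2 / np.sum(S**2)
    return scores[:, :n_components], explained[:n_components]
```

When one underlying factor drives most acoustic variability (as the size- and arousal-related factors do in these studies), the first component captures most of the variance.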
Affiliation(s)
- Yoonjeong Lee, Jody Kreiman
- Department of Head and Neck Surgery, David Geffen School of Medicine at UCLA, Los Angeles, California 90095-1794, USA (both authors)

17
Afshan A, Kreiman J, Alwan A. Speaker discrimination performance for "easy" versus "hard" voices in style-matched and -mismatched speech. J Acoust Soc Am 2022; 151:1393. [PMID: 35232083] [PMCID: PMC8888001] [DOI: 10.1121/10.0009585]
Abstract
This study compares human speaker discrimination performance for read speech versus casual conversations and explores differences between unfamiliar voices that are "easy" versus "hard" to "tell together" versus "tell apart." Thirty listeners were asked whether pairs of short style-matched or -mismatched, text-independent utterances represented the same or different speakers. Listeners performed better when stimuli were style-matched, particularly in read speech-read speech trials (equal error rate, EER, of 6.96% versus 15.12% in conversation-conversation trials). In contrast, the EER was 20.68% for the style-mismatched condition. When styles were matched, listeners' confidence was higher when speakers were the same versus different; however, style variation caused decreases in listeners' confidence for the "same speaker" trials, suggesting a higher dependency of this task on within-speaker variability. The speakers who were "easy" or "hard" to "tell together" were not the same as those who were "easy" or "hard" to "tell apart." Analysis of speaker acoustic spaces suggested that the difference observed in human approaches to "same speaker" and "different speaker" tasks depends primarily on listeners' different perceptual strategies when dealing with within- versus between-speaker acoustic variability.
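The equal error rates (EERs) reported above summarize performance on same/different-speaker trials. As a generic sketch (not the authors' analysis code), an EER is obtained by sweeping a decision threshold over trial similarity scores until the miss rate on same-speaker trials equals the false-alarm rate on different-speaker trials:

```python
import numpy as np

def equal_error_rate(same_scores, diff_scores):
    """EER sketch: a miss is a same-speaker trial scored below the
    threshold; a false alarm is a different-speaker trial scored at
    or above it. Return the error rate where the two curves cross
    (approximately, at the threshold minimizing their gap)."""
    same = np.asarray(same_scores, dtype=float)
    diff = np.asarray(diff_scores, dtype=float)
    thresholds = np.unique(np.concatenate([same, diff]))
    best_miss, best_fa, best_gap = 1.0, 1.0, np.inf
    for t in thresholds:
        miss = float(np.mean(same < t))
        fa = float(np.mean(diff >= t))
        if abs(miss - fa) < best_gap:
            best_gap = abs(miss - fa)
            best_miss, best_fa = miss, fa
    return (best_miss + best_fa) / 2.0
```

Perfectly separable score distributions give an EER of 0; overlap between same- and different-speaker scores pushes the EER up toward chance (50%).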
Affiliation(s)
- Amber Afshan, Department of Electrical and Computer Engineering, University of California, Los Angeles, California 90095-1594, USA
- Jody Kreiman, Departments of Head and Neck Surgery and Linguistics, University of California, Los Angeles, California 90095-1794, USA
- Abeer Alwan, Department of Electrical and Computer Engineering, University of California, Los Angeles, California 90095-1594, USA

18
Bradshaw AR, McGettigan C. Convergence in voice fundamental frequency during synchronous speech. PLoS One 2021; 16:e0258747. [PMID: 34673811] [PMCID: PMC8530294] [DOI: 10.1371/journal.pone.0258747]
Abstract
Joint speech behaviours where speakers produce speech in unison are found in a variety of everyday settings, and have clinical relevance as a temporary fluency-enhancing technique for people who stutter. It is currently unknown whether such synchronisation of speech timing among two speakers is also accompanied by alignment in their vocal characteristics, for example in acoustic measures such as pitch. The current study investigated this by testing whether convergence in voice fundamental frequency (F0) between speakers could be demonstrated during synchronous speech. Sixty participants across two online experiments were audio recorded whilst reading a series of sentences, first on their own, and then in synchrony with another speaker (the accompanist) in a number of between-subject conditions. Experiment 1 demonstrated significant convergence in participants' F0 to a pre-recorded accompanist voice, in the form of both upward (high F0 accompanist condition) and downward (low and extra-low F0 accompanist conditions) changes in F0. Experiment 2 demonstrated that such convergence was not seen during a visual synchronous speech condition, in which participants spoke in synchrony with silent video recordings of the accompanist. An audiovisual condition in which participants were able to both see and hear the accompanist in pre-recorded videos did not result in greater convergence in F0 compared to synchronisation with the pre-recorded voice alone. These findings suggest the need for models of speech motor control to incorporate interactions between self- and other-speech feedback during speech production, and suggest a novel hypothesis for the mechanisms underlying the fluency-enhancing effects of synchronous speech in people who stutter.
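Tracking F0 convergence of the kind described above requires a frame-by-frame pitch estimate for each speaker. A minimal autocorrelation-based sketch is shown below; the study's actual pitch tracker is not specified here, so this is illustrative only.

```python
import numpy as np

def estimate_f0(frame, sr, fmin=75.0, fmax=400.0):
    """Estimate F0 of a voiced frame as the autocorrelation peak
    within the lag range corresponding to plausible speaking pitch."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + int(np.argmax(ac[lo:hi]))
    return sr / lag
```

Convergence could then be quantified as the shift in a participant's mean F0 toward the accompanist's F0 between the solo and synchronous conditions.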
Affiliation(s)
- Abigail R. Bradshaw, Carolyn McGettigan
- Department of Speech, Hearing & Phonetic Sciences, University College London, London, United Kingdom (both authors)

19
Abstract
The ability to identify individuals by voice is fundamental for communication. However, little is known about the expectations that infants hold when learning unfamiliar voices. Here, the voice-learning skills of 4- and 8-month-olds (N = 53; 29 girls, 14 boys of various ethnicities) were tested using a preferential-looking task that involved audiovisual stimuli of their mothers and other unfamiliar women. Findings reveal that the expectation that novel voices map on to novel faces emerges between 4 and 8 months of age, and that infants can retain learning of face-voice pairings via nonostensive cues by 8 months of age. This study provides new insights about infants' use of disambiguation and fast mapping in early voice learning.
20
Alvarez-Alonso MJ, de-la-Peña C, Ortega Z, Scott R. Boys-Specific Text-Comprehension Enhancement With Dual Visual-Auditory Text Presentation Among 12-14 Years-Old Students. Front Psychol 2021; 12:574685. [PMID: 33897513] [PMCID: PMC8062718] [DOI: 10.3389/fpsyg.2021.574685]
Abstract
Quality of language comprehension determines performance in all kinds of activities, including academics. Processing of words initially develops as auditory and gradually extends to the visual modality as children learn to read. School failure is highly related to listening and reading comprehension problems. In this study we analyzed sex differences in comprehension of texts in Spanish (standardized reading test PROLEC-R) in three modalities of presentation (visual, auditory, and both simultaneously: dual modality) among 12- to 14-year-old students who were native speakers of Spanish. We controlled for relevant cognitive variables such as attention (d2), phonological and semantic fluency (FAS), and speed of processing (WISC Coding subtest). Girls' comprehension was similar across the three modalities of presentation; boys, however, benefited substantially from dual modality compared with boys exposed only to visual or auditory text presentation. With respect to the relation between text comprehension and school performance, students with low grades in Spanish showed low auditory comprehension. Interestingly, the visual and dual modalities preserved comprehension levels in these low-skilled students. Our results suggest that visual-text support during auditory language presentation could benefit students with low school performance, especially boys, and they encourage future research evaluating the classroom implementation of the rapidly developing technology of simultaneous speech transcription, which could additionally benefit non-native students, especially those recently incorporated into school or newly arrived in a country from abroad.
Affiliation(s)
- Maria Jose Alvarez-Alonso, Cristina de-la-Peña, Zaira Ortega, Departamento de Psicología Evolutiva y Psicobiología, Universidad Internacional de la Rioja, Logroño, Spain
- Ricardo Scott, Departamento de Psicología Evolutiva y Psicobiología, Universidad Internacional de la Rioja, Logroño, Spain; Departamento de Psicología Evolutiva y Didáctica, Universidad de Alicante, Alicante, Spain

21
Nguyen DD, McCabe P, Thomas D, Purcell A, Doble M, Novakovic D, Chacon A, Madill C. Acoustic voice characteristics with and without wearing a facemask. Sci Rep 2021; 11:5651. [PMID: 33707509] [PMCID: PMC7970997] [DOI: 10.1038/s41598-021-85130-8]
Abstract
Facemasks are essential for healthcare workers, but the characteristics of the voice produced whilst wearing this personal protective equipment are not well understood. In the present study, we compared acoustic voice measures in recordings of sixteen adults producing standardised vocal tasks with and without wearing either a surgical mask or a KN95 mask. Data were analysed for mean spectral levels in the 0-1 kHz and 1-8 kHz regions, an energy ratio between the 0-1 and 1-8 kHz bands (LH1000), harmonics-to-noise ratio (HNR), smoothed cepstral peak prominence (CPPS), and vocal intensity. In connected speech there was significant attenuation of the mean spectral level in the 1-8 kHz region, with no significant change at 0-1 kHz. Mean spectral levels of the vowel did not change significantly in mask-wearing conditions. LH1000 for connected speech increased significantly whilst wearing either a surgical mask or a KN95 mask, but no significant change in this measure was found for the vowel. HNR was higher in the mask-wearing conditions than in the no-mask condition. CPPS and vocal intensity did not change in mask-wearing conditions. These findings imply an attenuation effect of these mask types on the voice spectrum, with the surgical mask showing less impact than the KN95.
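The LH1000 measure above is a ratio of spectral energy between a low band (0-1 kHz) and a high band (1-8 kHz). A hedged sketch of such a band-level ratio follows; the exact definition and normalisation used in the study may differ.

```python
import numpy as np

def band_level_ratio(signal, sr, split_hz=1000.0, top_hz=8000.0):
    """LH1000-style sketch: difference (dB) between mean power in
    the 0-1 kHz band and mean power in the 1-8 kHz band, from a
    Hann-windowed magnitude spectrum."""
    spec = np.abs(np.fft.rfft(signal * np.hanning(len(signal)))) ** 2
    freqs = np.fft.rfftfreq(len(signal), 1.0 / sr)
    low = spec[(freqs >= 0.0) & (freqs < split_hz)].mean()
    high = spec[(freqs >= split_hz) & (freqs <= top_hz)].mean()
    return 10.0 * np.log10((low + 1e-20) / (high + 1e-20))
```

A mask that attenuates high frequencies raises this ratio, which matches the reported increase in LH1000 for connected speech in the mask conditions.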
Affiliation(s)
- Duy Duong Nguyen, Patricia McCabe, Donna Thomas, Alison Purcell, Maree Doble, Daniel Novakovic, Antonia Chacon, Catherine Madill
- Voice Research Laboratory, Faculty of Medicine and Health, D18, Susan Wakil Health Building, Camperdown Campus, The University of Sydney, Western Avenue, Sydney, NSW 2006, Australia (all authors)

22
Kreiman J, Lee Y, Garellek M, Samlan R, Gerratt BR. Validating a psychoacoustic model of voice quality. J Acoust Soc Am 2021; 149:457. [PMID: 33514179] [PMCID: PMC7822631] [DOI: 10.1121/10.0003331]
Abstract
No agreed-upon method currently exists for objective measurement of perceived voice quality. This paper describes validation of a psychoacoustic model designed to fill this gap. This model includes parameters to characterize the harmonic and inharmonic voice sources, vocal tract transfer function, fundamental frequency, and amplitude of the voice, which together serve to completely quantify the integral sound of a target voice sample. In experiment 1, 200 voices with and without diagnosed vocal pathology were fit with the model using analysis-by-synthesis. The resulting synthetic voice samples were not distinguishable from the original voice tokens, suggesting that the model has all the parameters it needs to fully quantify voice quality. In experiment 2 parameters that model the harmonic voice source were removed one by one, and the voice tokens were re-synthesized with the reduced model. In every case the lower-dimensional models provided worse perceptual matches to the quality of the natural tokens than did the original set, indicating that the psychoacoustic model cannot be reduced in dimensionality without loss of fit to the data. Results confirm that this model can be validly applied to quantify voice quality in clinical and research applications.
Affiliation(s)
- Jody Kreiman, Yoonjeong Lee, Departments of Head and Neck Surgery and Linguistics, University of California-Los Angeles, Los Angeles, California 90095-1794, USA
- Marc Garellek, Department of Linguistics, University of California-San Diego, San Diego, California 92093-0108, USA
- Robin Samlan, Department of Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona 85721, USA
- Bruce R Gerratt, Department of Head and Neck Surgery, University of California-Los Angeles School of Medicine, Los Angeles, California 90095-1794, USA

23
23
|
Heeren WFL. The effect of word class on speaker-dependent information in the Standard Dutch vowel /aː/. J Acoust Soc Am 2020; 148:2028. [PMID: 33138546] [DOI: 10.1121/10.0002173]
Abstract
Linguistic structure co-determines how a speech sound is produced. This study therefore investigated whether the speaker-dependent information in the vowel [aː] varies when it is uttered in different word classes. From two spontaneous speech corpora, [aː] tokens were sampled and annotated for word class (content word, function word). This was done for 50 male adult speakers of Standard Dutch in face-to-face speech (N = 3128 tokens) and another 50 male adult speakers in telephone speech (N = 3136 tokens). First, the effect of word class on various acoustic variables in spontaneous speech was tested. Results showed that [aː] tokens were shorter and more centralized in function words than in content words. Next, the tokens were used to assess their speaker-dependent information as a function of word class, by using acoustic-phonetic variables to (a) build speaker-classification models and (b) compute the strength of evidence, a technique from forensic phonetics. Speaker-classification performance was somewhat better for content words than for function words, whereas forensic strength of evidence was comparable between the word classes. This seems explained by how these methods weigh between- and within-speaker variation. Because these two sources of variation co-varied in size with word class, acoustic word-class variation is not expected to affect the sampling of tokens in forensic speaker comparisons.
Affiliation(s)
- Willemijn F L Heeren, Leiden University Centre for Linguistics, Leiden University, Reuvensplaats 3-4, 2311 BE Leiden, the Netherlands

24
Young AW, Frühholz S, Schweinberger SR. Face and Voice Perception: Understanding Commonalities and Differences. Trends Cogn Sci 2020; 24:398-410. [DOI: 10.1016/j.tics.2020.02.001]