1. Nagels L, Gaudrain E, Vickers D, Hendriks P, Başkent D. Prelingually Deaf Children With Cochlear Implants Show Better Perception of Voice Cues and Speech in Competing Speech Than Postlingually Deaf Adults With Cochlear Implants. Ear Hear 2024; 45:952-968. PMID: 38616318. PMCID: PMC11175806. DOI: 10.1097/aud.0000000000001489.
Abstract
OBJECTIVES: Postlingually deaf adults with cochlear implants (CIs) have difficulty perceiving differences in speakers' voice characteristics and benefit little from voice differences when perceiving speech in competing speech. However, little is yet known about how prelingually deaf implanted children with CIs perceive and use voice characteristics. Unlike CI adults, most CI children became deaf during language acquisition. Extensive neuroplastic changes during childhood could make CI children better than CI adults at using the available acoustic cues, or the lack of exposure to a normal acoustic speech signal could make it harder for them to learn which acoustic cues to attend to. This study examined to what degree CI children can perceive voice cues and benefit from voice differences when perceiving speech in competing speech, comparing their abilities to those of normal-hearing (NH) children and CI adults.
DESIGN: Three experiments examined CI children's voice cue discrimination (experiment 1), voice gender categorization (experiment 2), and benefit from target-masker voice differences for perceiving speech in competing speech (experiment 3). The main focus was the perception of mean fundamental frequency (F0) and vocal-tract length (VTL), the primary acoustic cues related to speakers' anatomy and perceived voice characteristics such as voice gender.
RESULTS: CI children's F0 and VTL discrimination thresholds indicated lower sensitivity to differences than their NH age-equivalent peers, but their mean discrimination thresholds of 5.92 semitones (st) for F0 and 4.10 st for VTL indicated higher sensitivity than postlingually deaf CI adults, whose mean thresholds were 9.19 st for F0 and 7.19 st for VTL. Furthermore, CI children's perceptual weighting of F0 and VTL cues for voice gender categorization closely resembled that of their NH age-equivalent peers, in contrast with CI adults. Finally, CI children had more difficulty perceiving speech in competing speech than their NH age-equivalent peers, but they performed better than CI adults. Unlike CI adults, CI children showed a benefit from target-masker voice differences in F0 and VTL, similar to NH children.
CONCLUSION: Although CI children's F0 and VTL discrimination scores were overall lower than those of NH children, their weighting of F0 and VTL cues for voice gender categorization and their benefit from target-masker differences in F0 and VTL resembled those of NH children. Together, these results suggest that prelingually deaf implanted CI children can effectively use spectrotemporally degraded F0 and VTL cues for voice and speech perception, generally outperforming postlingually deaf CI adults on comparable tasks. These findings underscore that F0 and VTL cues are present in the CI signal to some degree, and suggest that other factors contribute to the perceptual challenges faced by CI adults.
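As a point of reference, the semitone thresholds reported above can be translated into frequency ratios using the standard 12-tone equal-temperament relation (the conversion is standard psychoacoustics, not code from the paper):

```python
def semitones_to_ratio(st: float) -> float:
    """Convert a difference in semitones to a frequency ratio: 12 st = one octave."""
    return 2.0 ** (st / 12.0)

# The CI children's mean F0 threshold of 5.92 st corresponds to a ~41% frequency
# difference; the CI adults' mean threshold of 9.19 st to a ~70% difference.
print(round(semitones_to_ratio(5.92), 2))  # 1.41
print(round(semitones_to_ratio(9.19), 2))  # 1.7
```

This puts the group difference in perspective: a CI adult needs nearly three-quarters of an octave of F0 separation to hear two voices as different in pitch, on average.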
Affiliation(s)
- Leanne Nagels
  - Center for Language and Cognition Groningen (CLCG), University of Groningen, Groningen, The Netherlands
  - Department of Otorhinolaryngology/Head and Neck Surgery, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
  - Research School of Behavioural and Cognitive Neurosciences, University of Groningen, Groningen, The Netherlands
- Etienne Gaudrain
  - Department of Otorhinolaryngology/Head and Neck Surgery, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
  - Research School of Behavioural and Cognitive Neurosciences, University of Groningen, Groningen, The Netherlands
  - CNRS UMR 5292, Lyon Neuroscience Research Center, Auditory Cognition and Psychoacoustics, Inserm UMRS 1028, Université Claude Bernard Lyon 1, Université de Lyon, Lyon, France
- Deborah Vickers
  - Cambridge Hearing Group, Sound Lab, Clinical Neurosciences Department, University of Cambridge, Cambridge, United Kingdom
- Petra Hendriks
  - Center for Language and Cognition Groningen (CLCG), University of Groningen, Groningen, The Netherlands
  - Research School of Behavioural and Cognitive Neurosciences, University of Groningen, Groningen, The Netherlands
- Deniz Başkent
  - Department of Otorhinolaryngology/Head and Neck Surgery, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
  - Research School of Behavioural and Cognitive Neurosciences, University of Groningen, Groningen, The Netherlands
  - W.J. Kolff Institute for Biomedical Engineering and Materials Science, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
2. Basu N, Weber P, Bali AS, Rosas-Aguilar C, Edmond G, Martire KA, Morrison GS. Speaker identification in courtroom contexts - Part II: Investigation of bias in individual listeners' responses. Forensic Sci Int 2023; 349:111768. PMID: 37392611. DOI: 10.1016/j.forsciint.2023.111768.
Abstract
In "Speaker identification in courtroom contexts - Part I", individual listeners made speaker-identification judgements on pairs of recordings that reflected the conditions of the questioned-speaker and known-speaker recordings in a real case. The recording conditions were poor, and there was a mismatch between the questioned-speaker and known-speaker conditions. No contextual information that could potentially bias listeners' responses was included in the experiment condition: it was decontextualized with respect to the case circumstances and to other evidence that could be presented in the context of a case. Listeners' responses exhibited a bias in favour of the different-speaker hypothesis, which was hypothesized to be due to the poor and mismatched recording conditions. The present research compares speaker-identification performance between (1) listeners under the original Part I experiment condition, (2) listeners who were informed ahead of time that the recording conditions would make the recordings sound more different from one another than if both had been high-quality recordings, and (3) listeners who were presented with high-quality versions of the recordings. Under all experiment conditions, there was a substantial bias in favour of the different-speaker hypothesis. This bias therefore appears not to be due to the poor and mismatched recording conditions.
Affiliation(s)
- Nabanita Basu
  - Forensic Data Science Laboratory, Aston University, Birmingham, UK
- Philip Weber
  - Forensic Data Science Laboratory, Aston University, Birmingham, UK
- Agnes S Bali
  - School of Psychology, University of New South Wales, Sydney, New South Wales, Australia
- Claudia Rosas-Aguilar
  - Instituto de Lingüística y Literatura, Universidad Austral de Chile, Valdivia, Chile
- Gary Edmond
  - School of Law, Society & Criminology, University of New South Wales, Sydney, New South Wales, Australia
- Kristy A Martire
  - School of Psychology, University of New South Wales, Sydney, New South Wales, Australia
- Geoffrey Stewart Morrison
  - Forensic Data Science Laboratory, Aston University, Birmingham, UK
  - Forensic Evaluation Ltd, Birmingham, UK
3. Donhauser PW, Klein D. Audio-Tokens: A toolbox for rating, sorting and comparing audio samples in the browser. Behav Res Methods 2023; 55:508-515. PMID: 35297013. PMCID: PMC10027774. DOI: 10.3758/s13428-022-01803-w.
Abstract
Here we describe a JavaScript toolbox for performing online rating studies with auditory material. The main feature of the toolbox is that audio samples are associated with visual tokens on the screen that control audio playback and can be manipulated depending on the type of rating. This allows the collection of single- and multidimensional feature ratings, as well as categorical and similarity ratings. The toolbox (github.com/pwdonh/audio_tokens) can be used via a plugin for the widely used jsPsych framework, or with plain JavaScript for custom applications. We expect the toolbox to be useful in psychological research on speech and music perception, as well as for the curation and annotation of datasets in machine learning.
Affiliation(s)
- Peter W Donhauser
  - Cognitive Neuroscience Unit, Montreal Neurological Institute, McGill University, Montreal, QC, H3A 2B4, Canada
  - Ernst Strüngmann Institute for Neuroscience in Cooperation with Max Planck Society, 60528, Frankfurt am Main, Germany
- Denise Klein
  - Cognitive Neuroscience Unit, Montreal Neurological Institute, McGill University, Montreal, QC, H3A 2B4, Canada
  - Centre for Research on Brain, Language and Music, McGill University, Montreal, QC, H3G 2A8, Canada
4. Njie S, Lavan N, McGettigan C. Talker and accent familiarity yield advantages for voice identity perception: A voice sorting study. Mem Cognit 2023; 51:175-187. PMID: 35274221. PMCID: PMC9943951. DOI: 10.3758/s13421-022-01296-0.
Abstract
In the current study, we examine and compare the effects of talker and accent familiarity in a voice identity sorting task, using naturally varying voice recordings from the TV show Derry Girls. All voice samples were thus spoken with a regional accent of UK/Irish English (from [London]derry). We tested four listener groups: listeners were either familiar or unfamiliar with the TV show (and therefore the talker identities), and were either highly familiar or relatively less familiar with Northern Irish accents. Both talker and accent familiarity significantly improved voice identity sorting accuracy. However, the talker familiarity benefits were overall larger and more consistent. We discuss the results in light of a possible hierarchy of familiarity effects and argue that our findings may provide additional evidence for interactions of speech and identity processing pathways in voice identity perception. We also identify some key limitations of the current work and provide suggestions for future studies to address them.
Affiliation(s)
- Sheriff Njie
  - Department of Speech, Hearing and Phonetic Sciences, University College London, Chandler House, 2 Wakefield Street, London, WC1N 1PF, UK
- Nadine Lavan
  - Department of Speech, Hearing and Phonetic Sciences, University College London, Chandler House, 2 Wakefield Street, London, WC1N 1PF, UK
  - Department of Psychology, School of Biological and Chemical Sciences, Queen Mary University of London, Mile End Road, London, E1 4NS, UK
- Carolyn McGettigan
  - Department of Speech, Hearing and Phonetic Sciences, University College London, Chandler House, 2 Wakefield Street, London, WC1N 1PF, UK
5. Unimodal and cross-modal identity judgements using an audio-visual sorting task: Evidence for independent processing of faces and voices. Mem Cognit 2021; 50:216-231. PMID: 34254274. PMCID: PMC8763756. DOI: 10.3758/s13421-021-01198-7.
Abstract
Unimodal and cross-modal information provided by faces and voices contribute to identity percepts. To examine how these sources of information interact, we devised a novel audio-visual sorting task in which participants were required to group video-only and audio-only clips into two identities. In a series of three experiments, we show that unimodal face and voice sorting were more accurate than cross-modal sorting: while face sorting was consistently most accurate, followed by voice sorting, cross-modal sorting was at chance level or below. In Experiment 1, we compared performance on our novel audio-visual sorting task to a traditional identity matching task, showing that unimodal and cross-modal identity perception were overall moderately more accurate than in the traditional identity matching task. In Experiment 2, separating unimodal from cross-modal sorting led to small improvements in accuracy for unimodal sorting but no change in cross-modal sorting performance. In Experiment 3, we explored the effect of minimal audio-visual training: participants were shown a clip of the two identities in conversation before completing the sorting task. This led to small, nonsignificant improvements in accuracy for unimodal and cross-modal sorting. Our results indicate that unfamiliar face and voice perception operate relatively independently, with no evidence of mutual benefit, suggesting that extracting reliable cross-modal identity information is challenging.
6. May I Speak Freely? The Difficulty in Vocal Identity Processing Across Free and Scripted Speech. J Nonverbal Behav 2020. DOI: 10.1007/s10919-020-00348-w.
Abstract
In the fields of face recognition and voice recognition, a growing literature suggests that recognizing an individual despite changes from one instance to the next is a considerable challenge. The present paper reports on one experiment in the voice domain designed to determine whether a mere change in speaking style results in a measurable difficulty when trying to discriminate between speakers. Participants completed a speaker discrimination task on pairs of speech clips, which contained either free speech or scripted speech segments. The results suggested that speaker discrimination was significantly better when the style of speech did not change than when it did, and was significantly better for scripted than for free speech segments. These results support the emerging body of evidence that within-identity variability is a challenge, and the forensic implications of such a mild change in speaking style are discussed.
7. Johnson J, McGettigan C, Lavan N. Comparing unfamiliar voice and face identity perception using identity sorting tasks. Q J Exp Psychol (Hove) 2020; 73:1537-1545. PMID: 32530364. PMCID: PMC7534197. DOI: 10.1177/1747021820938659.
Abstract
Identity sorting tasks, in which participants sort multiple naturally varying stimuli of (usually two) identities into perceived identities, have recently gained popularity in voice and face processing research. In both modalities, participants who are unfamiliar with the identities tend to perceive multiple stimuli of the same identity as different people and thus fail to "tell people together." These similarities across modalities suggest that modality-general mechanisms may underpin sorting behaviour. In this study, participants completed a voice sorting and a face sorting task. Taking an individual differences approach, we asked whether participants' performance on voice and face sorting of unfamiliar identities is correlated. Participants additionally completed a voice discrimination task (Bangor Voice Matching Test) and a face discrimination task (Glasgow Face Matching Test), which we used to test whether sorting performance related to explicit identity discrimination. Performance on the voice sorting and face sorting tasks was correlated, suggesting that common modality-general processes underpin them. However, no significant correlations were found between sorting and discrimination performance, with the exception of significant relationships between performance on "same identity" trials and "telling people together" for voices and faces. Overall, however, the reported relationships were relatively weak, suggesting the presence of additional modality-specific and task-specific processes.
Affiliation(s)
- Justine Johnson
  - Department of Speech, Hearing and Phonetic Sciences, University College London, London, UK
- Carolyn McGettigan
  - Department of Speech, Hearing and Phonetic Sciences, University College London, London, UK
- Nadine Lavan
  - Department of Speech, Hearing and Phonetic Sciences, University College London, London, UK
8. Lavan N, Mileva M, McGettigan C. How does familiarity with a voice affect trait judgements? Br J Psychol 2020; 112:282-300. PMID: 32445499. DOI: 10.1111/bjop.12454.
Abstract
From only a single spoken word, listeners can form a wealth of first impressions of a person's character traits and personality based on their voice. However, due to the substantial within-person variability in voices, these trait judgements are likely to be highly stimulus-dependent for unfamiliar voices: the same person may sound very trustworthy in one recording but less trustworthy in another. How trait judgements differ when listeners are familiar with a voice is unclear: are listeners who are familiar with the voices as susceptible to the effects of within-person variability? Does the semantic knowledge listeners have about a familiar person influence their judgements? In the current study, we tested the effect of familiarity on listeners' trait judgements from variable voices across three experiments. Using a between-subjects design, we contrasted trait judgements by listeners who were familiar with a set of voices - either through laboratory-based training or through watching a TV show - with those by listeners who were unfamiliar with the voices. We predicted that familiarity with the voices would reduce variability in trait judgements for variable voice recordings from the same identity (cf. Mileva, Kramer, & Burton, 2019, Perception, 48, 471, for faces). However, across the three studies and two types of measures of variability, we found no compelling evidence that trait impressions were systematically affected by familiarity.
Affiliation(s)
- Nadine Lavan
  - Department of Speech, Hearing and Phonetic Sciences, University College London, UK
- Mila Mileva
  - Department of Psychology, University of York, UK
  - School of Psychology, University of Plymouth, UK
- Carolyn McGettigan
  - Department of Speech, Hearing and Phonetic Sciences, University College London, UK
9. Stevenage SV, Symons AE, Fletcher A, Coen C. Sorting through the impact of familiarity when processing vocal identity: Results from a voice sorting task. Q J Exp Psychol (Hove) 2019; 73:519-536. PMID: 31658884. PMCID: PMC7074657. DOI: 10.1177/1747021819888064.
Abstract
The present article reports on one experiment designed to examine the importance of familiarity when processing vocal identity. A voice sorting task was used with participants who were either personally familiar or unfamiliar with three speakers. The results suggested that familiarity supported both the ability to tell different instances of the same voice together and the ability to tell similar instances of different voices apart. In addition, the results suggested differences between the three speakers in how confusable they were, underlining the importance of vocal characteristics and stimulus selection in behavioural tasks. The results are discussed with reference to existing debates regarding the nature of stored representations as familiarity develops, and the greater difficulty of processing voices compared with faces more generally.
Affiliation(s)
- Ashley E Symons
  - School of Psychology, University of Southampton, Southampton, UK
- Abi Fletcher
  - School of Psychology, University of Southampton, Southampton, UK
- Chantelle Coen
  - School of Psychology, University of Southampton, Southampton, UK
10. Perrachione TK, Furbeck KT, Thurston EJ. Acoustic and linguistic factors affecting perceptual dissimilarity judgments of voices. J Acoust Soc Am 2019; 146:3384. PMID: 31795676. PMCID: PMC7043842. DOI: 10.1121/1.5126697.
Abstract
The human voice is a complex acoustic signal that conveys talker identity via individual differences in numerous features, including vocal source acoustics, vocal tract resonances, and dynamic articulations during speech. It remains poorly understood how differences in these features contribute to the perceptual dissimilarity of voices and, moreover, whether linguistic differences between listeners and talkers interact during perceptual judgments of voices. Here, native English- and Mandarin-speaking listeners rated the perceptual dissimilarity of voices speaking English or Mandarin from either forward or time-reversed speech. The language spoken by talkers, but not by listeners, principally influenced perceptual judgments of voices. Perceptual dissimilarity judgments of voices were always highly correlated between listener groups and between forward and time-reversed speech. Representational similarity analyses explored how acoustic features (fundamental frequency mean and variation, jitter, harmonics-to-noise ratio, speech rate, and formant dispersion) contributed to listeners' perceptual dissimilarity judgments, and how talker and listener language affected these relationships; the largest effects related to voice pitch. Overall, these data suggest that, while linguistic factors may influence perceptual judgments of voices, the magnitude of such effects tends to be very small. Perceptual judgments of voices by listeners of different native-language backgrounds tend to be more alike than different.
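The representational similarity logic described here can be sketched as follows. This is a minimal toy illustration, not the paper's analysis pipeline: the talker F0 values, noise model, and matrix size are invented, and the paper relates several acoustic features to ratings, not just one. The core idea is to build a dissimilarity matrix from an acoustic feature and another from perceptual ratings, then correlate their off-diagonal entries.

```python
import numpy as np

def dissimilarity_matrix(values):
    """Pairwise absolute differences between per-talker feature values."""
    v = np.asarray(values, dtype=float)
    return np.abs(v[:, None] - v[None, :])

def rsa_correlation(dm_a, dm_b):
    """Pearson correlation between the upper triangles of two dissimilarity matrices."""
    iu = np.triu_indices_from(dm_a, k=1)
    return np.corrcoef(dm_a[iu], dm_b[iu])[0, 1]

# Hypothetical data: mean F0 (Hz) for five talkers, and perceptual
# dissimilarities that roughly track the pitch differences plus noise.
f0 = [110, 120, 200, 210, 150]
acoustic_dm = dissimilarity_matrix(f0)
noise = np.random.default_rng(0).normal(0, 5, acoustic_dm.shape)
perceptual_dm = acoustic_dm + (noise + noise.T) / 2  # keep the matrix symmetric

print(rsa_correlation(acoustic_dm, perceptual_dm))  # high: ratings track F0 here
```

In the actual analyses, a separate acoustic model matrix would be built for each feature (F0 mean and variation, jitter, harmonics-to-noise ratio, speech rate, formant dispersion) and each compared against the listeners' rating matrix.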
Affiliation(s)
- Tyler K Perrachione
  - Department of Speech, Language, and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
- Kristina T Furbeck
  - Department of Speech, Language, and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
- Emily J Thurston
  - Department of Speech, Language, and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA