1. Kreiman J, Lee Y. Biological, linguistic, and individual factors govern voice quality. J Acoust Soc Am 2025;157:482-492. [PMID: 39846773] [DOI: 10.1121/10.0034848]
Abstract
Voice quality serves as a rich source of information about speakers, providing listeners with impressions of identity, emotional state, age, sex, reproductive fitness, and other biologically and socially salient characteristics. Understanding how this information is transmitted, accessed, and exploited requires knowledge of the psychoacoustic dimensions along which voices vary, an area that remains largely unexplored. Recent studies of English speakers have shown that two factors related to speaker size and arousal consistently emerge as the most important determinants of quality, regardless of who is speaking. The present findings extend this picture by demonstrating that in four languages that vary fundamental frequency (fo) and/or phonation type contrastively (Korean, Thai, Gujarati, and White Hmong), additional acoustic variability is systematically related to the phonology of the language spoken, and the amount of variability along each dimension is consistent across speaker groups. This study concludes that acoustic voice spaces are structured in a remarkably consistent way: first by biologically driven, evolutionarily grounded factors, second by learned linguistic factors, and finally by variations within a talker over utterances, possibly due to personal style, emotional state, social setting, or other dynamic factors. Implications for models of speaker recognition are also discussed.
Affiliation(s)
- Jody Kreiman: Departments of Head and Neck Surgery and Linguistics, UCLA, Los Angeles, California 90095-1794, USA
- Yoonjeong Lee: USC Viterbi School of Engineering, University of Southern California, Los Angeles, California 90089-1455, USA
2. Lavan N. Left-handed voices? Examining the perceptual learning of novel person characteristics from the voice. Q J Exp Psychol (Hove) 2024;77:2325-2338. [PMID: 38229446] [DOI: 10.1177/17470218241228849]
Abstract
We regularly form impressions of who a person is from their voice, such that we can readily categorise people as being female or male, child or adult, trustworthy or not, and can furthermore recognise who specifically is speaking. How we establish mental representations for such categories of person characteristics has, however, only been explored in detail for voice identity learning. In a series of experiments, we therefore set out to examine whether and how listeners can learn to recognise a novel person characteristic. We specifically asked how diagnostic acoustic properties underpinning category distinctions inform perceptual judgements. We manipulated recordings of voices to create acoustic signatures for a person's handedness (left-handed vs. right-handed) in their voice. After training, we found that listeners were able to successfully learn to recognise handedness from voices with above-chance accuracy, although no significant differences in accuracy between the different types of manipulation emerged. Listeners were, furthermore, sensitive to the specific distributions of acoustic properties that underpinned the category distinctions. We, however, also find evidence for perceptual biases that may reflect long-term prior exposure to how voices vary in naturalistic settings. These biases shape how listeners use acoustic information in the voices when forming representations for distinguishing handedness from voices. This study is thus a first step to examine how representations for novel person characteristics are established, outside of voice identity perception. We discuss our findings in light of theoretical accounts of voice perception and speculate about potential mechanisms that may underpin our results.
Affiliation(s)
- Nadine Lavan: Department of Biological and Experimental Psychology, School of Biological and Behavioural Sciences, Queen Mary University of London, London, UK
3. Pinheiro AP, Aucouturier JJ, Kotz SA. Neural adaptation to changes in self-voice during puberty. Trends Neurosci 2024;47:777-787. [PMID: 39214825] [DOI: 10.1016/j.tins.2024.08.001]
Abstract
The human voice is a potent social signal and a distinctive marker of individual identity. As individuals go through puberty, their voices undergo acoustic changes, setting them apart from others. In this article, we propose that hormonal fluctuations in conjunction with morphological vocal tract changes during puberty establish a sensitive developmental phase that affects the monitoring of the adolescent voice and, specifically, self-other distinction. Furthermore, the protracted maturation of brain regions responsible for voice processing, coupled with the dynamically evolving social environment of adolescents, likely disrupts a clear differentiation of the self-voice from others' voices. This socioneuroendocrine framework offers a holistic understanding of voice monitoring during adolescence.
Affiliation(s)
- Ana P Pinheiro: Faculdade de Psicologia, Universidade de Lisboa, Alameda da Universidade, 1649-013 Lisboa, Portugal
- Sonja A Kotz: Maastricht University, Maastricht, The Netherlands; Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
4. Lloy L, Patil KN, Johnson KA, Babel M. Language-general versus language-specific processes in bilingual voice learning. Cognition 2024;250:105866. [PMID: 38971020] [DOI: 10.1016/j.cognition.2024.105866]
Abstract
Language experience confers a benefit to voice learning, a concept described in the literature as the language familiarity effect (LFE). What experiences are necessary for the LFE to be conferred is less clear. We contribute empirically and theoretically to this debate by examining within- and across-language voice learning with Cantonese-English bilingual voices in a talker-voice association paradigm. Listeners were trained in Cantonese or English and assessed on their abilities to generalize voice learning at test on Cantonese and English utterances. By testing listeners from four language backgrounds - English Monolingual, Cantonese-English Multilingual, Tone Multilingual, and Non-tone Multilingual groups - we assess whether the LFE and group-level differences in voice learning are due to varying abilities (1) in accessing the relative acoustic-phonetic features that distinguish a voice, (2) learning at a given rate, or (3) generalizing learning of talker-voice associations to novel same-language and different-language utterances. The specific four language background groups allow us to investigate the roles of language-specific familiarity, tone language experience, and generic multilingual experience in voice learning. Differences in performance across listener groups show evidence in support of the LFE and the role of two mechanisms for voice learning: the extraction and association of talker-specific, language-general information that is more robustly generalized across languages, and talker-specific, language-specific information that may be more readily accessible and learnable, but due to its language-specific nature, is less able to be extended to another language.
Affiliation(s)
- Line Lloy: Department of Linguistics, University of British Columbia, Canada
- Khia A Johnson: Department of Linguistics, University of British Columbia, Canada
- Molly Babel: Department of Linguistics, University of British Columbia, Canada
5. Lavan N, Rinke P, Scharinger M. The time course of person perception from voices in the brain. Proc Natl Acad Sci U S A 2024;121:e2318361121. [PMID: 38889147] [PMCID: PMC11214051] [DOI: 10.1073/pnas.2318361121]
Abstract
When listeners hear a voice, they rapidly form a complex first impression of who the person behind that voice might be. We characterize how these multivariate first impressions from voices emerge over time across different levels of abstraction using electroencephalography and representational similarity analysis. We find that for eight perceived physical (gender, age, and health), trait (attractiveness, dominance, and trustworthiness), and social characteristics (educatedness and professionalism), representations emerge early (~80 ms after stimulus onset), with voice acoustics contributing to those representations between ~100 ms and 400 ms. While impressions of person characteristics are highly correlated, we can find evidence for highly abstracted, independent representations of individual person characteristics. These abstracted representations emerge gradually over time. That is, representations of physical characteristics (age, gender) arise early (from ~120 ms), while representations of some trait and social characteristics emerge later (~360 ms onward). The findings align with recent theoretical models and shed light on the computations underpinning person perception from voices.
Affiliation(s)
- Nadine Lavan: Department of Biological and Experimental Psychology, School of Biological and Behavioural Sciences, Queen Mary University of London, London E1 4NS, United Kingdom
- Paula Rinke: Research Group Phonetics, Institute of German Linguistics, Philipps-University Marburg, Marburg 35037, Germany
- Mathias Scharinger: Research Group Phonetics, Institute of German Linguistics, Philipps-University Marburg, Marburg 35037, Germany; Research Center “Deutscher Sprachatlas”, Philipps-University Marburg, Marburg 35037, Germany; Center for Mind, Brain & Behavior, Universities of Marburg & Gießen, Marburg 35032, Germany
6. Ming L, Geng L, Zhao X, Wang Y, Hu N, Yang Y, Hu X. The mechanism of phonetic information in voice identity discrimination: a comparative study based on sighted and blind people. Front Psychol 2024;15:1352692. [PMID: 38845764] [PMCID: PMC11153856] [DOI: 10.3389/fpsyg.2024.1352692]
Abstract
Purpose: The purpose of this study was to examine whether phonetic information plays a role in, and how it affects, voice identity processing in blind people.
Method: To address the first question, 25 normally sighted participants and 30 blind participants discriminated voice identity while listening to forward speech and backward speech in their native language and in an unfamiliar language. To address the second question, using an articulatory suppression paradigm, 26 normally sighted participants and 26 blind participants discriminated voice identity while listening to forward speech in their native language and in an unfamiliar language.
Results: In Experiment 1, both the sighted and blind groups showed a native-language advantage not only in the voice identity discrimination task with forward speech but also in the task with backward speech. This finding supports the view that backward speech still retains some phonetic information and indicates that phonetic information can affect voice identity processing in both sighted and blind people. In addition, only the native-language advantage of sighted people was modulated by speech manner, which is related to articulatory rehearsal. In Experiment 2, only the native-language advantage of sighted people was modulated by articulatory suppression. This indicates that phonetic information may act in different ways on voice identity processing in sighted and blind people.
Conclusion: The heightened dependence on voice source information in blind people appears not to undermine the function of phonetic information, but it appears to change the functional mechanism of phonetic information. These findings suggest that the present phonetic familiarity model needs to be improved with respect to the mechanism of phonetic information.
Affiliation(s)
- Lili Ming: School of Linguistic Sciences and Arts, Jiangsu Normal University, Xuzhou, China; Key Laboratory of Language and Cognitive Neuroscience of Jiangsu Province, Collaborative Innovation Center for Language Ability, Xuzhou, China
- Libo Geng: School of Linguistic Sciences and Arts, Jiangsu Normal University, Xuzhou, China; Key Laboratory of Language and Cognitive Neuroscience of Jiangsu Province, Collaborative Innovation Center for Language Ability, Xuzhou, China
- Xinyu Zhao: School of Linguistic Sciences and Arts, Jiangsu Normal University, Xuzhou, China; Key Laboratory of Language and Cognitive Neuroscience of Jiangsu Province, Collaborative Innovation Center for Language Ability, Xuzhou, China
- Yichan Wang: School of Linguistic Sciences and Arts, Jiangsu Normal University, Xuzhou, China; Key Laboratory of Language and Cognitive Neuroscience of Jiangsu Province, Collaborative Innovation Center for Language Ability, Xuzhou, China
- Na Hu: School of Preschool and Special Education, Kunming University, Yunnan, China
- Yiming Yang: School of Linguistic Sciences and Arts, Jiangsu Normal University, Xuzhou, China; Key Laboratory of Language and Cognitive Neuroscience of Jiangsu Province, Collaborative Innovation Center for Language Ability, Xuzhou, China
- Xueping Hu: College of Education, Huaibei Normal University, Huaibei, China; Anhui Engineering Research Center for Intelligent Computing and Application on Cognitive Behavior (ICACB), Huaibei, China
7. Perepelytsia V, Dellwo V. Acoustic compression in Zoom audio does not compromise voice recognition performance. Sci Rep 2023;13:18742. [PMID: 37907749] [PMCID: PMC10618539] [DOI: 10.1038/s41598-023-45971-x]
Abstract
Human voice recognition over telephone channels typically yields lower accuracy than recognition from higher-quality studio recordings. Here, we investigated the extent to which audio in video conferencing, subject to various lossy compression mechanisms, affects human voice recognition performance. Voice recognition performance was tested in an old-new recognition task under three audio conditions (telephone, Zoom, studio) across all matched (familiarization and test with same audio condition) and mismatched combinations (familiarization and test with different audio conditions). Participants were familiarized with female voices presented in either studio-quality (N = 22), Zoom-quality (N = 21), or telephone-quality (N = 20) stimuli. Subsequently, all listeners performed an identical voice recognition test containing a balanced stimulus set from all three conditions. Results revealed that voice recognition performance (d') in Zoom audio was not significantly different from studio audio, but in both Zoom and studio audio listeners performed significantly better than in telephone audio. This suggests that signal processing of the speech codec used by Zoom provides equally relevant information in terms of voice recognition compared to studio audio. Interestingly, listeners familiarized with voices via Zoom audio showed a trend towards better recognition performance in the test (p = 0.056) compared to listeners familiarized with studio audio. We discuss future directions for investigating whether a possible advantage of Zoom audio for voice recognition might be related to some of the speech coding mechanisms used by Zoom.
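The recognition measure reported above is the sensitivity index d'. As an illustration only (not the authors' analysis code), the minimal Python sketch below shows how d' is conventionally computed from hit and false-alarm counts in an old-new recognition task; the log-linear correction and the example counts are assumptions made for the demonstration.

```python
from scipy.stats import norm

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    A log-linear correction (add 0.5 to each cell) keeps the z-transform
    finite when a rate would otherwise be exactly 0 or 1.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    return norm.ppf(hit_rate) - norm.ppf(fa_rate)

# Hypothetical listener: recognises 18 of 24 "old" voices and
# false-alarms to 6 of 24 "new" voices.
print(round(d_prime(hits=18, misses=6, false_alarms=6, correct_rejections=18), 2))
```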
Affiliation(s)
- Valeriia Perepelytsia: Department of Computational Linguistics, University of Zurich, Andreasstrasse 15, 8050 Zurich, Switzerland
- Volker Dellwo: Department of Computational Linguistics, University of Zurich, Andreasstrasse 15, 8050 Zurich, Switzerland
8. Basu N, Weber P, Bali AS, Rosas-Aguilar C, Edmond G, Martire KA, Morrison GS. Speaker identification in courtroom contexts - Part II: Investigation of bias in individual listeners' responses. Forensic Sci Int 2023;349:111768. [PMID: 37392611] [DOI: 10.1016/j.forsciint.2023.111768]
Abstract
In "Speaker identification in courtroom contexts - Part I" individual listeners made speaker-identification judgements on pairs of recordings which reflected the conditions of the questioned-speaker and known-speaker recordings in a real case. The recording conditions were poor, and there was a mismatch between the questioned-speaker condition and the known-speaker condition. No contextual information that could potentially bias listeners' responses was included in the experiment condition - it was decontextualized with respect to case circumstances and with respect to other evidence that could be presented in the context of a case. Listeners' responses exhibited a bias in favour of the different-speaker hypothesis. It was hypothesized that the bias was due to the poor and mismatched recording conditions. The present research compares speaker-identification performance between: (1) listeners under the original Part I experiment condition, (2) listeners who were informed ahead of time that the recording conditions would make the recordings sound more different from one another than had they both been high-quality recordings, and (3) listeners who were presented with high-quality versions of the recordings. Under all experiment conditions, there was a substantial bias in favour of the different-speaker hypothesis. The bias in favour of the different-speaker hypothesis therefore appears not to be due to the poor and mismatched recording conditions.
Affiliation(s)
- Nabanita Basu: Forensic Data Science Laboratory, Aston University, Birmingham, UK
- Philip Weber: Forensic Data Science Laboratory, Aston University, Birmingham, UK
- Agnes S Bali: School of Psychology, University of New South Wales, Sydney, New South Wales, Australia
- Claudia Rosas-Aguilar: Instituto de Lingüística y Literatura, Universidad Austral de Chile, Valdivia, Chile
- Gary Edmond: School of Law, Society & Criminology, University of New South Wales, Sydney, New South Wales, Australia
- Kristy A Martire: School of Psychology, University of New South Wales, Sydney, New South Wales, Australia
- Geoffrey Stewart Morrison: Forensic Data Science Laboratory, Aston University, Birmingham, UK; Forensic Evaluation Ltd, Birmingham, UK
9. Lavan N, McGettigan C. A model for person perception from familiar and unfamiliar voices. Commun Psychol 2023;1:1. [PMID: 38665246] [PMCID: PMC11041786] [DOI: 10.1038/s44271-023-00001-4]
Abstract
When hearing a voice, listeners can form a detailed impression of the person behind the voice. Existing models of voice processing focus primarily on one aspect of person perception - identity recognition from familiar voices - but do not account for the perception of other person characteristics (e.g., sex, age, personality traits). Here, we present a broader perspective, proposing that listeners have a common perceptual goal of perceiving who they are hearing, whether the voice is familiar or unfamiliar. We outline and discuss a model - the Person Perception from Voices (PPV) model - that achieves this goal via a common mechanism of recognising a familiar person, persona, or set of speaker characteristics. Our PPV model aims to provide a more comprehensive account of how listeners perceive the person they are listening to, using an approach that incorporates and builds on aspects of the hierarchical frameworks and prototype-based mechanisms proposed within existing models of voice identity recognition.
Affiliation(s)
- Nadine Lavan: Department of Experimental and Biological Psychology, Queen Mary University of London, London, UK
- Carolyn McGettigan: Department of Speech, Hearing, and Phonetic Sciences, University College London, London, UK
10. Clapp W, Vaughn C, Todd S, Sumner M. Talker-specificity and token-specificity in recognition memory. Cognition 2023;237:105450. [PMID: 37043968] [DOI: 10.1016/j.cognition.2023.105450]
Abstract
Given any feasible amount of time, a talker would never be able to produce the same word twice in an identical manner. Yet recognition memory experiments have consistently used identical tokens to demonstrate that listeners recognize a word more quickly and accurately when it is repeated by the same talker than by a different talker. These talker-specificity effects have served as the foundation of decades of research in speech perception, but the use of identical tokens introduces a confound: Is it the talker or the physical stimulus that drives these effects? And consequently, to what extent do listeners encode the high-level acoustic characteristics of a talker's voice? We investigate the roles of token and talker repetition in two continuous recognition memory experiments. In Exp. 1, listeners heard the voice of one talker, with either Identical or Novel repeated tokens. In Exp. 2, listeners heard two demographically matched talkers, with same-voice repetitions being either Identical or Novel. Classic talker-specificity effects were replicated in both Identical and Novel tokens, but recognition of Identical tokens was in some cases stronger than recognition of Novel tokens. In addition, recognition memory varied across demographically matched talkers, suggesting stronger episodic encoding for one talker than for the other. We argue that novel tokens should serve as the default design for similar studies and that consideration of talker variation can advance our understanding of encoding and memory differences more broadly.
Affiliation(s)
- William Clapp: Department of Linguistics, Stanford University, Margaret Jacks Hall, Bldg. 460, Stanford, CA 94301, United States of America
- Charlotte Vaughn: Language Science Center, University of Maryland, 2130 H.J. Patterson Hall, College Park, MD 20742, United States of America
- Simon Todd: Department of Linguistics, University of California Santa Barbara, South Hall 3432, Santa Barbara, CA 93106, United States of America
- Meghan Sumner: Department of Linguistics, Stanford University, Margaret Jacks Hall, Bldg. 460, Stanford, CA 94301, United States of America
11. Holmes E, Johnsrude IS. Intelligibility benefit for familiar voices is not accompanied by better discrimination of fundamental frequency or vocal tract length. Hear Res 2023;429:108704. [PMID: 36701896] [DOI: 10.1016/j.heares.2023.108704]
Abstract
Speech is more intelligible when it is spoken by familiar than unfamiliar people. If this benefit arises because key voice characteristics like perceptual correlates of fundamental frequency or vocal tract length (VTL) are more accurately represented for familiar voices, listeners may be able to discriminate smaller manipulations to such characteristics for familiar than unfamiliar voices. We measured participants' (N = 17) thresholds for discriminating pitch (correlate of fundamental frequency, or glottal pulse rate) and formant spacing (correlate of VTL; 'VTL-timbre') for voices that were familiar (participants' friends) and unfamiliar (other participants' friends). As expected, familiar voices were more intelligible. However, discrimination thresholds were no smaller for the same familiar voices. The size of the intelligibility benefit for a familiar over an unfamiliar voice did not relate to the difference in discrimination thresholds for the same voices. Also, the familiar-voice intelligibility benefit was just as large following perceptible manipulations to pitch and VTL-timbre. These results are more consistent with cognitive accounts of speech perception than traditional accounts that predict better discrimination.
Affiliation(s)
- Emma Holmes: Department of Speech Hearing and Phonetic Sciences, UCL, London WC1N 1PF, UK; Brain and Mind Institute, University of Western Ontario, London, Ontario N6A 3K7, Canada
- Ingrid S Johnsrude: Brain and Mind Institute, University of Western Ontario, London, Ontario N6A 3K7, Canada; School of Communication Sciences and Disorders, University of Western Ontario, London, Ontario N6G 1H1, Canada
12. Stevenage SV, Singh L, Dixey P. The Curious Case of Impersonators and Singers: Telling Voices Apart and Telling Voices Together under Naturally Challenging Listening Conditions. Brain Sci 2023;13:358. [PMID: 36831901] [PMCID: PMC9954053] [DOI: 10.3390/brainsci13020358]
Abstract
Vocal identity processing depends on the ability to tell apart two instances of different speakers whilst also being able to tell together two instances of the same speaker. Whilst previous research has examined these voice processing capabilities under relatively common listening conditions, it has not yet tested the limits of these capabilities. Here, two studies are presented that employ challenging listening tasks to determine just how good we are at these voice processing tasks. In Experiment 1, 54 university students were asked to distinguish between very similar sounding, yet different speakers (celebrity targets and their impersonators). Participants completed a 'Same/Different' task and a 'Which is the Celebrity?' task to pairs of speakers, and a 'Real or Not?' task to individual speakers. In Experiment 2, a separate group of 40 university students was asked to pair very different sounding instances of the same speakers (speaking and singing). Participants were presented with an array of voice clips and completed a 'Pairs Task' as a variant of the more traditional voice sorting task. The results of Experiment 1 suggested that significantly more mistakes were made when distinguishing celebrity targets from their impersonators than when distinguishing the same targets from control voices. Nevertheless, listeners were significantly better than chance in all three tasks despite the challenge. Similarly, the results of Experiment 2 suggested that it was significantly more difficult to pair singing and speaking clips than to pair two speaking clips, particularly when the speakers were unfamiliar. Again, however, the performance was significantly above zero, and was again better than chance in a cautious comparison. Taken together, the results suggest that vocal identity processing is a highly adaptable task, assisted by familiarity with the speaker. However, the fact that performance remained above chance in all tasks suggests that we had not reached the limit of our listeners' capability, despite the considerable listening challenges introduced. We conclude that voice processing is far better than previous research might have presumed.
Affiliation(s)
- Sarah V. Stevenage: School of Psychology, University of Southampton, Southampton SO17 1BJ, UK
13. Njie S, Lavan N, McGettigan C. Talker and accent familiarity yield advantages for voice identity perception: A voice sorting study. Mem Cognit 2023;51:175-187. [PMID: 35274221] [PMCID: PMC9943951] [DOI: 10.3758/s13421-022-01296-0]
Abstract
In the current study, we examine and compare the effects of talker and accent familiarity in the context of a voice identity sorting task, using naturally varying voice recording samples from the TV show Derry Girls. Voice samples were thus all spoken with a regional accent of UK/Irish English (from [London]derry). We tested four listener groups: Listeners were either familiar or unfamiliar with the TV show (and therefore the talker identities) and were either highly familiar or relatively less familiar with Northern Irish accents. Both talker and accent familiarity significantly improved accuracy of voice identity sorting performance. However, the talker familiarity benefits were overall larger, and more consistent. We discuss the results in light of a possible hierarchy of familiarity effects and argue that our findings may provide additional evidence for interactions of speech and identity processing pathways in voice identity perception. We also identify some key limitations in the current work and provide suggestions for future studies to address these.
Affiliation(s)
- Sheriff Njie: Department of Speech, Hearing and Phonetic Sciences, University College London, Chandler House, 2 Wakefield Street, London WC1N 1PF, UK
- Nadine Lavan: Department of Speech, Hearing and Phonetic Sciences, University College London, Chandler House, 2 Wakefield Street, London WC1N 1PF, UK; Department of Psychology, School of Biological and Chemical Sciences, Queen Mary University of London, Mile End Road, London E1 4NS, UK
- Carolyn McGettigan: Department of Speech, Hearing and Phonetic Sciences, University College London, Chandler House, 2 Wakefield Street, London WC1N 1PF, UK
14. Kapadia AM, Tin JAA, Perrachione TK. Multiple sources of acoustic variation affect speech processing efficiency. J Acoust Soc Am 2023;153:209. [PMID: 36732274] [PMCID: PMC9836727] [DOI: 10.1121/10.0016611]
Abstract
Phonetic variability across talkers imposes additional processing costs during speech perception, evident in performance decrements when listening to speech from multiple talkers. However, within-talker phonetic variation is a less well-understood source of variability in speech, and it is unknown how processing costs from within-talker variation compare to those from between-talker variation. Here, listeners performed a speeded word identification task in which three dimensions of variability were factorially manipulated: between-talker variability (single vs multiple talkers), within-talker variability (single vs multiple acoustically distinct recordings per word), and word-choice variability (two- vs six-word choices). All three sources of variability led to reduced speech processing efficiency. Between-talker variability affected both word-identification accuracy and response time, but within-talker variability affected only response time. Furthermore, between-talker variability, but not within-talker variability, had a greater impact when the target phonological contrasts were more similar. Together, these results suggest that natural between- and within-talker variability reflect two distinct magnitudes of common acoustic-phonetic variability: Both affect speech processing efficiency, but they appear to have qualitatively and quantitatively unique effects due to differences in their potential to obscure acoustic-phonemic correspondences across utterances.
Affiliation(s)
- Alexandra M Kapadia: Department of Speech, Language, and Hearing Sciences, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA
- Jessica A A Tin: Department of Speech, Language, and Hearing Sciences, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA
- Tyler K Perrachione: Department of Speech, Language, and Hearing Sciences, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA
15. Lee JJ, Perrachione TK. Implicit and explicit learning in talker identification. Atten Percept Psychophys 2022;84:2002-2015. [PMID: 35534783] [PMCID: PMC10081569] [DOI: 10.3758/s13414-022-02500-8]
Abstract
In the real world, listeners seem to implicitly learn talkers' vocal identities during interactions that prioritize attending to the content of talkers' speech. In contrast, most laboratory experiments of talker identification employ training paradigms that require listeners to explicitly practice identifying voices. Here, we investigated whether listeners become familiar with talkers' vocal identities during initial exposures that do not involve explicit talker identification. Participants were assigned to one of three exposure tasks, in which they heard identical stimuli but were differentially required to attend to the talkers' vocal identity or to the verbal content of their speech: (1) matching the talker to a concurrent visual cue (talker-matching); (2) discriminating whether the talker was the same as the prior trial (talker 1-back); or (3) discriminating whether speech content matched the previous trial (verbal 1-back). All participants were then tested on their ability to learn to identify talkers from novel speech content. Critically, we manipulated whether the talkers during this post-test differed from those heard during training. Compared to learning to identify novel talkers, listeners were significantly more accurate learning to identify the talkers they had previously been exposed to in the talker-matching and verbal 1-back tasks, but not the talker 1-back task. The correlation between talker identification test performance and exposure task performance was also greater when the talkers were the same in both tasks. These results suggest that listeners learn talkers' vocal identity implicitly during speech perception, even if they are not explicitly attending to the talkers' identity.
Affiliation(s)
- Jayden J Lee: Department of Speech, Language, & Hearing Sciences, Boston University, 635 Commonwealth Ave, Boston, MA 02215, USA
- Tyler K Perrachione: Department of Speech, Language, & Hearing Sciences, Boston University, 635 Commonwealth Ave, Boston, MA 02215, USA
16. Van Quaquebeke N, Salem M, van Dijke M, Wenzel R. Conducting organizational survey and experimental research online: From convenient to ambitious in study designs, recruiting, and data quality. Organ Psychol Rev 2022. [DOI: 10.1177/20413866221097571]
Abstract
Conducting organizational research via online surveys and experiments offers a host of advantages over traditional forms of data collection when it comes to sampling for more advanced study designs, while also ensuring data quality. To draw attention to these advantages and encourage researchers to fully leverage them, the present paper is structured into two parts. First, along a structure of commonly used research designs, we showcase select organizational psychology (OP) and organizational behavior (OB) research and explain how the Internet makes it feasible to conduct research not only with larger and more representative samples, but also with more complex research designs than circumstances usually allow in offline settings. Subsequently, because online data collections often also come with some data quality concerns, in the second section, we synthesize the methodological literature to outline three improvement areas and several accompanying strategies for bolstering data quality. Plain Language Summary: These days, many theories from the fields of organizational psychology and organizational behavior are tested online simply because it is easier. The point of this paper is to illustrate the unique advantages of the Internet beyond mere convenience—specifically, how the related technologies offer more than simply the ability to mirror offline studies. Accordingly, our paper first guides readers through examples of more ambitious online survey and experimental research designs within the organizational domain. Second, we address the potential data quality drawbacks of these approaches by outlining three concrete areas of improvement. Each comes with specific recommendations that can ensure higher data quality when conducting organizational survey or experimental research online.
17. Lee Y, Kreiman J. Acoustic voice variation in spontaneous speech. J Acoust Soc Am 2022;151:3462. [PMID: 35649890] [PMCID: PMC9135459] [DOI: 10.1121/10.0011471]
Abstract
This study replicates and extends the recent findings of Lee, Keating, and Kreiman [J. Acoust. Soc. Am. 146(3), 1568-1579 (2019)] on acoustic voice variation in read speech, which showed remarkably similar acoustic voice spaces for groups of female and male talkers and the individual talkers within these groups. Principal component analysis was applied to acoustic indices of voice quality measured from phone conversations for 99/100 of the same talkers studied previously. The acoustic voice spaces derived from spontaneous speech are highly similar to those based on read speech, except that unlike read speech, variability in fundamental frequency accounted for significant acoustic variability. Implications of these findings for prototype models of speaker recognition and discrimination are considered.
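The abstract refers to principal component analysis applied to acoustic indices of voice quality. As a hedged illustration of that general approach (not the authors' analysis pipeline), the sketch below assumes a matrix of voice-quality measurements per utterance and reports how much acoustic variance each component captures; the measure names and the random placeholder data are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Rows = utterances, columns = acoustic voice-quality indices
# (e.g., fo, formant frequencies, spectral slopes, harmonic amplitudes, noise measures).
rng = np.random.default_rng(0)
acoustic_measures = rng.normal(size=(500, 12))  # placeholder data only

X = StandardScaler().fit_transform(acoustic_measures)  # z-score each measure
pca = PCA().fit(X)

# Proportion of total acoustic variance accounted for by each principal component
print(pca.explained_variance_ratio_.round(3))
```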
Affiliation(s)
- Yoonjeong Lee: Department of Head and Neck Surgery, David Geffen School of Medicine at UCLA, Los Angeles, California 90095-1794, USA
- Jody Kreiman: Department of Head and Neck Surgery, David Geffen School of Medicine at UCLA, Los Angeles, California 90095-1794, USA
18. Afshan A, Kreiman J, Alwan A. Speaker discrimination performance for "easy" versus "hard" voices in style-matched and -mismatched speech. J Acoust Soc Am 2022;151:1393. [PMID: 35232083] [PMCID: PMC8888001] [DOI: 10.1121/10.0009585]
Abstract
This study compares human speaker discrimination performance for read speech versus casual conversations and explores differences between unfamiliar voices that are "easy" versus "hard" to "tell together" versus "tell apart." Thirty listeners were asked whether pairs of short style-matched or -mismatched, text-independent utterances represented the same or different speakers. Listeners performed better when stimuli were style-matched, particularly in read speech-read speech trials (equal error rate, EER, of 6.96% versus 15.12% in conversation-conversation trials). In contrast, the EER was 20.68% for the style-mismatched condition. When styles were matched, listeners' confidence was higher when speakers were the same versus different; however, style variation caused decreases in listeners' confidence for the "same speaker" trials, suggesting a higher dependency of this task on within-speaker variability. The speakers who were "easy" or "hard" to "tell together" were not the same as those who were "easy" or "hard" to "tell apart." Analysis of speaker acoustic spaces suggested that the difference observed in human approaches to "same speaker" and "different speaker" tasks depends primarily on listeners' different perceptual strategies when dealing with within- versus between-speaker acoustic variability.
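Performance above is summarised as an equal error rate (EER). Purely as an illustrative sketch (not the study's scoring code), the function below estimates the EER from graded same/different responses by sweeping a decision threshold; the threshold sweep and the example scores are assumptions.

```python
import numpy as np

def equal_error_rate(scores, labels):
    """EER: the operating point where the false-acceptance rate (different-speaker
    pairs judged 'same') equals the false-rejection rate (same-speaker pairs
    judged 'different'). Higher scores mean 'more likely the same speaker'."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    same = scores[labels == 1]
    diff = scores[labels == 0]
    best_gap, eer = np.inf, None
    for t in np.unique(scores):
        frr = np.mean(same < t)    # same-speaker pairs rejected at threshold t
        far = np.mean(diff >= t)   # different-speaker pairs accepted at threshold t
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer

# Hypothetical confidence ratings (1-5) for same-speaker (1) and different-speaker (0) trials
print(equal_error_rate([5, 4, 4, 2, 3, 1, 2, 1], [1, 1, 1, 1, 0, 0, 0, 0]))
```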
Affiliation(s)
- Amber Afshan: Department of Electrical and Computer Engineering, University of California, Los Angeles, California 90095-1594, USA
- Jody Kreiman: Departments of Head and Neck Surgery and Linguistics, University of California, Los Angeles, California 90095-1794, USA
- Abeer Alwan: Department of Electrical and Computer Engineering, University of California, Los Angeles, California 90095-1594, USA
19. Volfart A, Yan X, Maillard L, Colnat-Coulbois S, Hossu G, Rossion B, Jonas J. Intracerebral electrical stimulation of the right anterior fusiform gyrus impairs human face identity recognition. Neuroimage 2022;250:118932. [PMID: 35085763] [DOI: 10.1016/j.neuroimage.2022.118932]
Abstract
Brain regions located between the right fusiform face area (FFA) in the middle fusiform gyrus and the temporal pole may play a critical role in human face identity recognition but their investigation is limited by a large signal drop-out in functional magnetic resonance imaging (fMRI). Here we report an original case who is suddenly unable to recognize the identity of faces when electrically stimulated on a focal location inside this intermediate region of the right anterior fusiform gyrus. The reliable transient identity recognition deficit occurs without any change of percept, even during nonverbal face tasks (i.e., pointing out the famous face picture among three options; matching pictures of unfamiliar or familiar faces for their identities), and without difficulty at recognizing visual objects or famous written names. The effective contact is associated with the largest frequency-tagged electrophysiological signals of face-selectivity and of familiar and unfamiliar face identity recognition. This extensive multimodal investigation points to the right anterior fusiform gyrus as a critical hub of the human cortical face network, between posterior ventral occipito-temporal face-selective regions directly connected to low-level visual cortex, the medial temporal lobe involved in generic memory encoding, and ventral anterior temporal lobe regions holding semantic associations to people's identity.
Affiliation(s)
- Angélique Volfart: Université de Lorraine, CNRS, CRAN, F-54000 Nancy, France; University of Louvain, Psychological Sciences Research Institute, B-1348 Louvain-La-Neuve, Belgium
- Xiaoqian Yan: Université de Lorraine, CNRS, CRAN, F-54000 Nancy, France; University of Louvain, Psychological Sciences Research Institute, B-1348 Louvain-La-Neuve, Belgium; Stanford University, Department of Psychology, CA 94305 Stanford, USA
- Louis Maillard: Université de Lorraine, CNRS, CRAN, F-54000 Nancy, France; Université de Lorraine, CHRU-Nancy, Service de Neurologie, F-54000 Nancy, France
- Sophie Colnat-Coulbois: Université de Lorraine, CNRS, CRAN, F-54000 Nancy, France; Université de Lorraine, CHRU-Nancy, Service de Neurochirurgie, F-54000 Nancy, France
- Gabriela Hossu: Université de Lorraine, CHRU-Nancy, CIC-IT, F-54000 Nancy, France; Université de Lorraine, Inserm, IADI, F-54000 Nancy, France
- Bruno Rossion: Université de Lorraine, CNRS, CRAN, F-54000 Nancy, France; University of Louvain, Psychological Sciences Research Institute, B-1348 Louvain-La-Neuve, Belgium; Université de Lorraine, CHRU-Nancy, Service de Neurologie, F-54000 Nancy, France
- Jacques Jonas: Université de Lorraine, CNRS, CRAN, F-54000 Nancy, France; Université de Lorraine, CHRU-Nancy, Service de Neurologie, F-54000 Nancy, France
20. Pinheiro AP, Anikin A, Conde T, Sarzedas J, Chen S, Scott SK, Lima CF. Emotional authenticity modulates affective and social trait inferences from voices. Philos Trans R Soc Lond B Biol Sci 2021;376:20200402. [PMID: 34719249] [PMCID: PMC8558771] [DOI: 10.1098/rstb.2020.0402]
Abstract
The human voice is a primary tool for verbal and nonverbal communication. Studies on laughter emphasize a distinction between spontaneous laughter, which reflects a genuinely felt emotion, and volitional laughter, associated with more intentional communicative acts. Listeners can reliably differentiate the two. It remains unclear, however, if they can detect authenticity in other vocalizations, and whether authenticity determines the affective and social impressions that we form about others. Here, 137 participants listened to laughs and cries that could be spontaneous or volitional and rated them on authenticity, valence, arousal, trustworthiness and dominance. Bayesian mixed models indicated that listeners detect authenticity similarly well in laughter and crying. Speakers were also perceived to be more trustworthy, and in a higher arousal state, when their laughs and cries were spontaneous. Moreover, spontaneous laughs were evaluated as more positive than volitional ones, and we found that the same acoustic features predicted perceived authenticity and trustworthiness in laughter: high pitch, spectral variability and less voicing. For crying, associations between acoustic features and ratings were less reliable. These findings indicate that emotional authenticity shapes affective and social trait inferences from voices, and that the ability to detect authenticity in vocalizations is not limited to laughter. This article is part of the theme issue 'Voice modulation: from origin and mechanism to social impact (Part I)'.
Affiliation(s)
- Ana P. Pinheiro: CICPSI, Faculdade de Psicologia, Universidade de Lisboa, Alameda da Universidade, 1649-013 Lisboa, Portugal
- Andrey Anikin: Equipe de Neuro-Ethologie Sensorielle (ENES)/Centre de Recherche en Neurosciences de Lyon (CRNL), University of Lyon/Saint-Etienne, CNRS UMR5292, INSERM UMR_S 1028, 42023 Saint-Etienne, France; Division of Cognitive Science, Lund University, 221 00 Lund, Sweden
- Tatiana Conde: CICPSI, Faculdade de Psicologia, Universidade de Lisboa, Alameda da Universidade, 1649-013 Lisboa, Portugal
- João Sarzedas: CICPSI, Faculdade de Psicologia, Universidade de Lisboa, Alameda da Universidade, 1649-013 Lisboa, Portugal
- Sinead Chen: National Taiwan University, Taipei City, 10617 Taiwan
- Sophie K. Scott: Institute of Cognitive Neuroscience, University College London, London WC1N 3AZ, UK
- César F. Lima: Institute of Cognitive Neuroscience, University College London, London WC1N 3AZ, UK; Instituto Universitário de Lisboa (ISCTE-IUL), Avenida das Forças Armadas, 1649-026 Lisboa, Portugal
21. Yamamoto HW, Kawahara M, Tanaka A. A Web-Based Auditory and Visual Emotion Perception Task Experiment With Children and a Comparison of Lab Data and Web Data. Front Psychol 2021;12:702106. [PMID: 34484051] [PMCID: PMC8416272] [DOI: 10.3389/fpsyg.2021.702106]
Abstract
Due to the COVID-19 pandemic, the significance of online research has been rising in the field of psychology. However, online experiments with child participants are rare compared to those with adults. In this study, we investigated the validity of web-based experiments with child participants 4–12 years old and adult participants. They performed simple emotional perception tasks in an experiment designed and conducted on the Gorilla Experiment Builder platform. After short communication with each participant via Zoom videoconferencing software, participants performed the auditory task (judging emotion from vocal expression) and the visual task (judging emotion from facial expression). The data collected were compared with data collected in our previous similar laboratory experiment, and similar tendencies were found. For the auditory task in particular, we replicated differences in accuracy perceiving vocal expressions between age groups and also found the same native language advantage. Furthermore, we discuss the possibility of using online cognitive studies for future developmental studies.
Affiliation(s)
- Hisako W Yamamoto: Tokyo Woman's Christian University, Tokyo, Japan; Japan Society for the Promotion of Science, Tokyo, Japan
- Misako Kawahara: Tokyo Woman's Christian University, Tokyo, Japan; Japan Society for the Promotion of Science, Tokyo, Japan
22. Lavan N, Collins MRN, Miah JFM. Audiovisual identity perception from naturally-varying stimuli is driven by visual information. Br J Psychol 2021;113:248-263. [PMID: 34490897] [DOI: 10.1111/bjop.12531]
Abstract
Identity perception often takes place in multimodal settings, where perceivers have access to both visual (face) and auditory (voice) information. Despite this, identity perception is usually studied in unimodal contexts, where face and voice identity perception are modelled independently from one another. In this study, we asked whether and how much auditory and visual information contribute to audiovisual identity perception from naturally-varying stimuli. In a between-subjects design, participants completed an identity sorting task with either dynamic video-only, audio-only or dynamic audiovisual stimuli. In this task, participants were asked to sort multiple, naturally-varying stimuli from three different people by perceived identity. We found that identity perception was more accurate for video-only and audiovisual stimuli compared with audio-only stimuli. Interestingly, there was no difference in accuracy between video-only and audiovisual stimuli. Auditory information nonetheless played a role alongside visual information as audiovisual identity judgements per stimulus could be predicted from both auditory and visual identity judgements, respectively. While the relationship was stronger for visual information and audiovisual information, auditory information still uniquely explained a significant portion of the variance in audiovisual identity judgements. Our findings thus align with previous theoretical and empirical work that proposes that, compared with faces, voices are an important but relatively less salient and a weaker cue to identity perception. We expand on this work to show that, at least in the context of this study, having access to voices in addition to faces does not result in better identity perception accuracy.
Affiliation(s)
- Nadine Lavan: Department of Biological and Experimental Psychology, School of Biological and Chemical Sciences, Queen Mary University of London, UK
- Madeleine Rose Niamh Collins: Department of Biological and Experimental Psychology, School of Biological and Chemical Sciences, Queen Mary University of London, UK
- Jannatul Firdaus Monisha Miah: Department of Biological and Experimental Psychology, School of Biological and Chemical Sciences, Queen Mary University of London, UK
23. Lavan N. The effect of familiarity on within-person age judgements from voices. Br J Psychol 2021;113:287-299. [PMID: 34415575] [DOI: 10.1111/bjop.12526]
Abstract
Listeners can perceive a person's age from their voice with above chance accuracy. Studies have usually established this by asking listeners to directly estimate the age of unfamiliar voices. The recordings used mostly include cross-sectional samples of voices, including people of different ages to cover the age range of interest. Such cross-sectional samples likely include not only cues to age in the sound of the voice but also socio-phonetic cues, encoded in how a person speaks. How age perception accuracy is affected when minimizing socio-phonetic cues by sampling the same voice at different time points remains largely unknown. Similarly, with the voices in age perception studies being usually unfamiliar to listeners, it is unclear how familiarity with a voice affects age perception. We asked listeners who were either familiar or unfamiliar with a set of four voices to complete an age discrimination task: listeners heard two recordings of the same person's voice, recorded 15 years apart, and were asked to indicate in which recording the person was younger. Accuracy for both familiar and unfamiliar listeners was above chance. While familiarity advantages were apparent, accuracy was not particularly high: familiar and unfamiliar listeners were correct for 68.2% and 62.7% of trials, respectively (chance = 50%). Familiarity furthermore interacted with the voices included. Overall, our findings indicate that age perception from voices is not a trivial task at all times - even when listeners are familiar with a voice. We discuss our findings in the light of how reliable voice may be as a signal for age.
Affiliation(s)
- Nadine Lavan: Department of Experimental and Biological Psychology, School of Biological and Chemical Sciences, Queen Mary University of London, UK; Department of Speech, Hearing and Phonetic Sciences, University College London, UK
24. Familiarity and task context shape the use of acoustic information in voice identity perception. Cognition 2021;215:104780. [PMID: 34298232] [PMCID: PMC8381763] [DOI: 10.1016/j.cognition.2021.104780]
Abstract
Familiar and unfamiliar voice perception are often understood as being distinct from each other. For identity perception, theoretical work has proposed that listeners use acoustic information in different ways to perceive identity from familiar and unfamiliar voices: Unfamiliar voices are thought to be processed based on close comparisons of acoustic properties, while familiar voices are processed based on diagnostic acoustic features that activate a stored person-specific representation of that voice. To date no empirical study has directly examined whether and how familiar and unfamiliar listeners differ in their use of acoustic information for identity perception. Here, we tested this theoretical claim by linking listeners' judgements in voice identity tasks to a complex acoustic representation - the spectral similarity of the heard voice recordings. Participants (N = 177) who were either familiar or unfamiliar with a set of voices completed an identity discrimination task (Experiment 1) or an identity sorting task (Experiment 2). In both experiments, identity judgements for familiar and unfamiliar voices were guided by spectral similarity: Pairs of recordings with greater acoustic similarity were more likely to be perceived as belonging to the same voice identity. However, while there were no differences in how familiar and unfamiliar listeners used acoustic information for identity discrimination, differences were apparent for identity sorting. Our study therefore challenges proposals that view familiar and unfamiliar voice perception as being distinct at all times. Instead, our data suggest a critical role of the listening situation in which familiar and unfamiliar voices are evaluated, thus characterising voice identity perception as a highly dynamic process in which listeners opportunistically make use of any kind of information they can access.
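One simple way to operationalise pairwise spectral similarity is the cosine similarity of two recordings' long-term average spectra. The sketch below, assuming WAV files as input, uses that measure only as an illustrative stand-in; the paper's actual acoustic representation and pipeline may differ.

```python
# Simplified sketch: relate pairwise spectral similarity to "same identity"
# responses. The similarity measure (cosine similarity of long-term average
# spectra) is an illustrative stand-in, not necessarily the authors' method.
import numpy as np
from scipy.io import wavfile
from scipy.signal import welch

def long_term_average_spectrum(path, nperseg=1024):
    """Return a normalised long-term average power spectrum of a WAV file."""
    sr, signal = wavfile.read(path)
    signal = signal.astype(float)
    if signal.ndim > 1:                 # mix stereo recordings down to mono
        signal = signal.mean(axis=1)
    _, spectrum = welch(signal, fs=sr, nperseg=nperseg)
    return spectrum / np.linalg.norm(spectrum)

def spectral_similarity(path_a, path_b):
    """Cosine similarity between two recordings' long-term average spectra."""
    return float(np.dot(long_term_average_spectrum(path_a),
                        long_term_average_spectrum(path_b)))

# Hypothetical usage: higher similarity should predict more "same voice"
# responses for that pair of recordings.
# sim = spectral_similarity("voice_pair_01_a.wav", "voice_pair_01_b.wav")
```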
Collapse
|
25
|
Unimodal and cross-modal identity judgements using an audio-visual sorting task: Evidence for independent processing of faces and voices. Mem Cognit 2021; 50:216-231. [PMID: 34254274 PMCID: PMC8763756 DOI: 10.3758/s13421-021-01198-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 06/02/2021] [Indexed: 11/18/2022]
Abstract
Unimodal and cross-modal information provided by faces and voices contribute to identity percepts. To examine how these sources of information interact, we devised a novel audio-visual sorting task in which participants were required to group video-only and audio-only clips into two identities. In a series of three experiments, we show that unimodal face and voice sorting were more accurate than cross-modal sorting: While face sorting was consistently most accurate, followed by voice sorting, cross-modal sorting was at chance level or below. In Experiment 1, we compared performance in our novel audio-visual sorting task to a traditional identity matching task, showing that unimodal and cross-modal identity perception in the sorting task were overall moderately more accurate than in the traditional identity matching task. In Experiment 2, separating unimodal from cross-modal sorting led to small improvements in accuracy for unimodal sorting, but no change in cross-modal sorting performance. In Experiment 3, we explored the effect of minimal audio-visual training: Participants were shown a clip of the two identities in conversation prior to completing the sorting task. This led to small, nonsignificant improvements in accuracy for unimodal and cross-modal sorting. Our results indicate that unfamiliar face and voice perception operate relatively independently, with no evidence of mutual benefit, suggesting that extracting reliable cross-modal identity information is challenging.
Collapse
|
26
|
Jenkins RE, Tsermentseli S, Monks CP, Robertson DJ, Stevenage SV, Symons AE, Davis JP. Are super‐face‐recognisers also super‐voice‐recognisers? Evidence from cross‐modal identification tasks. APPLIED COGNITIVE PSYCHOLOGY 2021. [DOI: 10.1002/acp.3813] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/21/2022]
Affiliation(s)
- Ryan E. Jenkins
- School of Human Sciences, Institute for Lifecourse Development University of Greenwich London UK
| | - Stella Tsermentseli
- School of Human Sciences, Institute for Lifecourse Development University of Greenwich London UK
| | - Claire P. Monks
- School of Human Sciences, Institute for Lifecourse Development University of Greenwich London UK
| | - David J. Robertson
- School of Psychological Sciences and Health University of Strathclyde Glasgow UK
| | | | - Ashley E. Symons
- Department of Psychology University of Southampton Southampton UK
| | - Josh P. Davis
- School of Human Sciences, Institute for Lifecourse Development University of Greenwich London UK
| |
Collapse
|
27
|
Bestelmeyer PEG, Mühl C. Individual differences in voice adaptability are specifically linked to voice perception skill. Cognition 2021; 210:104582. [PMID: 33450447 DOI: 10.1016/j.cognition.2021.104582] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/03/2020] [Revised: 01/01/2021] [Accepted: 01/03/2021] [Indexed: 10/22/2022]
Abstract
There are remarkable individual differences in the ability to recognise individuals by the sound of their voice. Theoretically, this ability is thought to depend on the coding accuracy of voices in a low-dimensional "voice-space". Here we were interested in how adaptive coding of voice identity relates to this variability in skill. In two adaptation experiments we explored, first, whether the size of the aftereffect for two familiar vocal identities can predict voice perception ability and, second, whether this effect stems from general auditory skill (e.g., discrimination ability for tuning and tempo). Experiment 1 demonstrated that contrastive aftereffect sizes for voice identity predicted voice perception ability. In Experiment 2, we replicated this finding and further established that this effect is unrelated to general auditory abilities or the general adaptability of listeners. Our results highlight the important functional role of adaptive coding in voice expertise and suggest that human voice perception is a highly specialised and distinct auditory ability.
Collapse
Affiliation(s)
| | - Constanze Mühl
- School of Psychology, Bangor University, Bangor, Gwynedd, UK
| |
Collapse
|
28
|
Kreiman J, Lee Y, Garellek M, Samlan R, Gerratt BR. Validating a psychoacoustic model of voice quality. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2021; 149:457. [PMID: 33514179 PMCID: PMC7822631 DOI: 10.1121/10.0003331] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/23/2020] [Revised: 12/07/2020] [Accepted: 12/16/2020] [Indexed: 05/19/2023]
Abstract
No agreed-upon method currently exists for objective measurement of perceived voice quality. This paper describes validation of a psychoacoustic model designed to fill this gap. This model includes parameters to characterize the harmonic and inharmonic voice sources, vocal tract transfer function, fundamental frequency, and amplitude of the voice, which together serve to completely quantify the integral sound of a target voice sample. In experiment 1, 200 voices with and without diagnosed vocal pathology were fit with the model using analysis-by-synthesis. The resulting synthetic voice samples were not distinguishable from the original voice tokens, suggesting that the model has all the parameters it needs to fully quantify voice quality. In experiment 2, parameters that model the harmonic voice source were removed one by one, and the voice tokens were re-synthesized with the reduced model. In every case the lower-dimensional models provided worse perceptual matches to the quality of the natural tokens than did the original set, indicating that the psychoacoustic model cannot be reduced in dimensionality without loss of fit to the data. Results confirm that this model can be validly applied to quantify voice quality in clinical and research applications.
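Analysis-by-synthesis means adjusting model parameters until the synthetic output matches the target voice. The toy sketch below fits only two illustrative source parameters (harmonic roll-off and noise level) to a target spectrum, treating fundamental frequency as already measured; the full model described above has many more parameters, so this is a schematic illustration of the fitting loop rather than the authors' procedure.

```python
# Toy analysis-by-synthesis loop: adjust source parameters so the synthetic
# spectrum matches a target spectrum. Everything here is a simplified stand-in.
import numpy as np
from scipy.optimize import minimize
from scipy.signal import welch

SR = 16000                          # sampling rate in Hz
T = np.arange(int(SR * 0.5)) / SR   # 500 ms of sample times

def synthesize(f0, rolloff_db, noise_db, n_harmonics=20):
    """Sum of harmonics with a per-harmonic roll-off plus additive noise."""
    voiced = sum(10 ** (-rolloff_db * k / 20) * np.sin(2 * np.pi * f0 * (k + 1) * T)
                 for k in range(n_harmonics))
    noise = 10 ** (noise_db / 20) * np.random.default_rng(0).normal(size=T.size)
    return voiced + noise

def spectrum_db(signal):
    """Long-term average power spectrum in dB."""
    _, power = welch(signal, fs=SR, nperseg=1024)
    return 10 * np.log10(power + 1e-12)

# "Target" with known settings; in practice this would be a recorded voice,
# and f0 (here 210 Hz) would be measured rather than fitted.
target = spectrum_db(synthesize(f0=210.0, rolloff_db=6.0, noise_db=-25.0))

def spectral_error(params):
    rolloff_db, noise_db = params
    return np.mean((spectrum_db(synthesize(210.0, rolloff_db, noise_db)) - target) ** 2)

fit = minimize(spectral_error, x0=[3.0, -15.0], method="Nelder-Mead")
print("Recovered roll-off (dB/harmonic) and noise level (dB):", np.round(fit.x, 2))
```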
Collapse
Affiliation(s)
- Jody Kreiman
- Departments of Head and Neck Surgery and Linguistics, University of California-Los Angeles, Los Angeles, California 90095-1794, USA
| | - Yoonjeong Lee
- Departments of Head and Neck Surgery and Linguistics, University of California-Los Angeles, Los Angeles, California 90095-1794, USA
| | - Marc Garellek
- Department of Linguistics, University of California-San Diego, San Diego, California 92093-0108, USA
| | - Robin Samlan
- Department of Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona 85721, USA
| | - Bruce R Gerratt
- Department of Head and Neck Surgery, University of California-Los Angeles School of Medicine, Los Angeles, California 90095-1794, USA
| |
Collapse
|
29
|
May I Speak Freely? The Difficulty in Vocal Identity Processing Across Free and Scripted Speech. JOURNAL OF NONVERBAL BEHAVIOR 2020. [DOI: 10.1007/s10919-020-00348-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
Abstract
In the fields of face recognition and voice recognition, a growing literature now suggests that the ability to recognize an individual despite changes from one instance to the next is a considerable challenge. The present paper reports on one experiment in the voice domain designed to determine whether a change in the mere style of speech may result in a measurable difficulty when trying to discriminate between speakers. Participants completed a speaker discrimination task with pairs of speech clips, which represented either free speech or scripted speech segments. The results suggested that speaker discrimination was significantly better when the style of speech did not change compared to when it did change, and was significantly better from scripted than from free speech segments. These results support the emergent body of evidence suggesting that within-identity variability is a challenge, and the forensic implications of such a mild change in speech style are discussed.
Collapse
|
30
|
Individual differences in face and voice matching abilities: The relationship between accuracy and consistency. APPLIED COGNITIVE PSYCHOLOGY 2020. [DOI: 10.1002/acp.3754] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/29/2022]
|
31
|
Johnson J, McGettigan C, Lavan N. Comparing unfamiliar voice and face identity perception using identity sorting tasks. Q J Exp Psychol (Hove) 2020; 73:1537-1545. [PMID: 32530364 PMCID: PMC7534197 DOI: 10.1177/1747021820938659] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/15/2019] [Revised: 02/11/2020] [Accepted: 03/03/2020] [Indexed: 11/16/2022]
Abstract
Identity sorting tasks, in which participants sort multiple naturally varying stimuli of usually two identities into perceived identities, have recently gained popularity in voice and face processing research. In both modalities, participants who are unfamiliar with the identities tend to perceive multiple stimuli of the same identity as different people and thus fail to "tell people together." These similarities across modalities suggest that modality-general mechanisms may underpin sorting behaviour. In this study, participants completed a voice sorting and a face sorting task. Taking an individual differences approach, we asked whether participants' performance on voice and face sorting of unfamiliar identities is correlated. Participants additionally completed a voice discrimination (Bangor Voice Matching Test) and a face discrimination task (Glasgow Face Matching Test). Using these tasks, we tested whether performance on sorting related to explicit identity discrimination. Performance on voice sorting and face sorting tasks was correlated, suggesting that common modality-general processes underpin these tasks. However, no significant correlations were found between sorting and discrimination performance, with the exception of significant relationships between performance on "same identity" trials and "telling people together" for voices and faces. Overall, any reported relationships were, however, relatively weak, suggesting the presence of additional modality-specific and task-specific processes.
Collapse
Affiliation(s)
- Justine Johnson
- Department of Speech, Hearing and Phonetic Sciences, University College London, London, UK
| | - Carolyn McGettigan
- Department of Speech, Hearing and Phonetic Sciences, University College London, London, UK
| | - Nadine Lavan
- Department of Speech, Hearing and Phonetic Sciences, University College London, London, UK
| |
Collapse
|
32
|
Heeren WFL. The effect of word class on speaker-dependent information in the Standard Dutch vowel /aː/. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2020; 148:2028. [PMID: 33138546 DOI: 10.1121/10.0002173] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/20/2019] [Accepted: 09/24/2020] [Indexed: 06/11/2023]
Abstract
Linguistic structure co-determines how a speech sound is produced. This study therefore investigated whether the speaker-dependent information in the vowel [aː] varies when the vowel is uttered in different word classes. From two spontaneous speech corpora, [aː] tokens were sampled and annotated for word class (content, function word). This was done for 50 male adult speakers of Standard Dutch in face-to-face speech (N = 3128 tokens), and another 50 male adult speakers in telephone speech (N = 3136 tokens). First, the effect of word class on various acoustic variables in spontaneous speech was tested. Results showed that [aː] tokens were shorter and more centralized in function words than in content words. Next, tokens were used to assess their speaker-dependent information as a function of word class, by using acoustic-phonetic variables to (a) build speaker classification models and (b) compute the strength-of-evidence, a technique from forensic phonetics. Speaker-classification performance was somewhat better for content than function words, whereas forensic strength-of-evidence was comparable between the word classes. This seems to be explained by how these methods weight between- and within-speaker variation. Because these two sources of variation co-varied in size with word class, acoustic word-class variation is not expected to affect the sampling of tokens in forensic speaker comparisons.
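Forensic strength-of-evidence is commonly expressed as a likelihood ratio: how probable an observed acoustic comparison score is under the same-speaker hypothesis versus the different-speaker hypothesis. The sketch below uses simulated placeholder scores and a simple kernel-density formulation to show that logic; it is not the scoring method used in the study.

```python
# Illustrative score-based strength-of-evidence computation: compare how likely
# an observed acoustic distance is under the within-speaker ("same speaker")
# versus the between-speaker ("different speakers") score distribution.
# The scores below are random placeholders, not data from the study.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)
within_speaker_scores = rng.normal(loc=1.0, scale=0.4, size=500)   # small distances
between_speaker_scores = rng.normal(loc=2.5, scale=0.7, size=500)  # larger distances

same_speaker_density = gaussian_kde(within_speaker_scores)
diff_speaker_density = gaussian_kde(between_speaker_scores)

def likelihood_ratio(observed_score):
    """LR > 1 supports the same-speaker hypothesis; LR < 1 supports different speakers."""
    return (same_speaker_density(observed_score)
            / diff_speaker_density(observed_score)).item()

print(f"LR for a distance of 1.2: {likelihood_ratio(1.2):.2f}")
print(f"LR for a distance of 2.8: {likelihood_ratio(2.8):.2f}")
```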
Collapse
Affiliation(s)
- Willemijn F L Heeren
- Leiden University Centre for Linguistics, Leiden University, Reuvensplaats 3-4, 2311 BE Leiden, the Netherlands
| |
Collapse
|
33
|
Lavan N, Mileva M, McGettigan C. How does familiarity with a voice affect trait judgements? Br J Psychol 2020; 112:282-300. [PMID: 32445499 DOI: 10.1111/bjop.12454] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/18/2019] [Revised: 05/05/2020] [Indexed: 11/27/2022]
Abstract
From only a single spoken word, listeners can form a wealth of first impressions of a person's character traits and personality based on their voice. However, due to the substantial within-person variability in voices, these trait judgements are likely to be highly stimulus-dependent for unfamiliar voices: The same person may sound very trustworthy in one recording but less trustworthy in another. How trait judgements differ when listeners are familiar with a voice is unclear: Are listeners who are familiar with the voices as susceptible to the effects of within-person variability? Does the semantic knowledge listeners have about a familiar person influence their judgements? In the current study, we tested the effect of familiarity on listeners' trait judgements from variable voices across 3 experiments. Using a between-subjects design, we contrasted trait judgements by listeners who were familiar with a set of voices - either through laboratory-based training or through watching a TV show - with listeners who were unfamiliar with the voices. We predicted that familiarity with the voices would reduce variability in trait judgements for variable voice recordings from the same identity (cf. Mileva, Kramer, & Burton, 2019, Perception, 48, 471, for faces). However, across the 3 studies and two types of measures to assess variability, we found no compelling evidence to suggest that trait impressions were systematically affected by familiarity.
Collapse
Affiliation(s)
- Nadine Lavan
- Department of Speech, Hearing and Phonetic Sciences, University College London, UK
| | - Mila Mileva
- Department of Psychology, University of York, UK.,School of Psychology, University of Plymouth, UK
| | - Carolyn McGettigan
- Department of Speech, Hearing and Phonetic Sciences, University College London, UK
| |
Collapse
|
34
|
Stevenage SV, Symons AE, Fletcher A, Coen C. Sorting through the impact of familiarity when processing vocal identity: Results from a voice sorting task. Q J Exp Psychol (Hove) 2019; 73:519-536. [PMID: 31658884 PMCID: PMC7074657 DOI: 10.1177/1747021819888064] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Abstract
The present article reports on one experiment designed to examine the importance of familiarity when processing vocal identity. A voice sorting task was used with participants who were either personally familiar or unfamiliar with three speakers. The results suggested that familiarity supported both an ability to tell different instances of the same voice together, and to tell similar instances of different voices apart. In addition, the results suggested differences between the three speakers in terms of the extent to which they were confusable, underlining the importance of vocal characteristics and stimulus selection within behavioural tasks. The results are discussed with reference to existing debates regarding the nature of stored representations as familiarity develops, and the relative difficulty of processing voices compared with faces more generally.
Collapse
Affiliation(s)
| | - Ashley E Symons
- School of Psychology, University of Southampton, Southampton, UK
| | - Abi Fletcher
- School of Psychology, University of Southampton, Southampton, UK
| | - Chantelle Coen
- School of Psychology, University of Southampton, Southampton, UK
| |
Collapse
|
35
|
Perrachione TK, Furbeck KT, Thurston EJ. Acoustic and linguistic factors affecting perceptual dissimilarity judgments of voices. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2019; 146:3384. [PMID: 31795676 PMCID: PMC7043842 DOI: 10.1121/1.5126697] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/25/2018] [Revised: 08/18/2019] [Accepted: 09/03/2019] [Indexed: 05/20/2023]
Abstract
The human voice is a complex acoustic signal that conveys talker identity via individual differences in numerous features, including vocal source acoustics, vocal tract resonances, and dynamic articulations during speech. It remains poorly understood how differences in these features contribute to perceptual dissimilarity of voices and, moreover, whether linguistic differences between listeners and talkers interact during perceptual judgments of voices. Here, native English- and Mandarin-speaking listeners rated the perceptual dissimilarity of voices speaking English or Mandarin from either forward or time-reversed speech. The language spoken by talkers, but not listeners, principally influenced perceptual judgments of voices. Perceptual dissimilarity judgments of voices were always highly correlated between listener groups and forward/time-reversed speech. Representational similarity analyses that explored how acoustic features (fundamental frequency mean and variation, jitter, harmonics-to-noise ratio, speech rate, and formant dispersion) contributed to listeners' perceptual dissimilarity judgments, including how talker- and listener-language affected these relationships, found the largest effects relating to voice pitch. Overall, these data suggest that, while linguistic factors may influence perceptual judgments of voices, the magnitude of such effects tends to be very small. Perceptual judgments of voices by listeners of different native language backgrounds tend to be more alike than different.
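Representational similarity analysis compares a model dissimilarity matrix built from a candidate feature with the matrix of perceptual dissimilarity ratings. The sketch below builds an F0-based model matrix from simulated placeholder values and correlates it with a simulated perceptual matrix; it illustrates the analysis logic only, not the study's data or full feature set.

```python
# Minimal representational-similarity sketch: correlate a model dissimilarity
# matrix built from one acoustic feature (mean F0) with perceptual
# dissimilarity ratings. All values are hypothetical placeholders.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.stats import spearmanr

rng = np.random.default_rng(2)
n_talkers = 16

mean_f0 = rng.uniform(90, 250, size=n_talkers)          # Hz, one value per talker
model_rdm = squareform(pdist(mean_f0.reshape(-1, 1)))    # |F0_i - F0_j| for each pair

# Hypothetical perceptual dissimilarity ratings (symmetric, zero diagonal).
perceptual_rdm = model_rdm + rng.normal(0, 15, (n_talkers, n_talkers))
perceptual_rdm = np.abs((perceptual_rdm + perceptual_rdm.T) / 2)
np.fill_diagonal(perceptual_rdm, 0)

# Compare only the unique talker pairs (upper triangle, excluding the diagonal).
upper = np.triu_indices(n_talkers, k=1)
rho, p = spearmanr(model_rdm[upper], perceptual_rdm[upper])
print(f"Spearman rho between F0-based and perceptual RDM: {rho:.2f} (p = {p:.3g})")
```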
Collapse
Affiliation(s)
- Tyler K Perrachione
- Department of Speech, Language, and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
| | - Kristina T Furbeck
- Department of Speech, Language, and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
| | - Emily J Thurston
- Department of Speech, Language, and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
| |
Collapse
|
36
|
Working-memory disruption by task-irrelevant talkers depends on degree of talker familiarity. Atten Percept Psychophys 2019; 81:1108-1118. [PMID: 30993655 DOI: 10.3758/s13414-019-01727-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
When one is listening, familiarity with an attended talker's voice improves speech comprehension. Here, we instead investigated the effect of familiarity with a distracting talker. In an irrelevant-speech task, we assessed listeners' working memory for the serial order of spoken digits when a task-irrelevant, distracting sentence was produced by either a familiar or an unfamiliar talker (with rare omissions of the task-irrelevant sentence). We tested two groups of listeners using the same experimental procedure. The first group were undergraduate psychology students (N = 66) who had attended an introductory statistics course. Critically, each student had been taught by one of two course instructors, whose voices served as the familiar and unfamiliar task-irrelevant talkers. The second group of listeners were family members and friends (N = 20) who had known either one of the two talkers for more than 10 years. Students, but not family members and friends, made more errors when the task-irrelevant talker was familiar versus unfamiliar. Interestingly, the effect of talker familiarity was not modulated by the presence of task-irrelevant speech: Students experienced stronger working memory disruption by a familiar talker, irrespective of whether they heard a task-irrelevant sentence during memory retention or merely expected it. While previous work has shown that familiarity with an attended talker benefits speech comprehension, our findings indicate that familiarity with an ignored talker disrupts working memory for target speech. The absence of this effect in family members and friends suggests that the degree of familiarity modulates the memory disruption.
Collapse
|
37
|
Lee Y, Keating P, Kreiman J. Acoustic voice variation within and between speakers. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2019; 146:1568. [PMID: 31590565 PMCID: PMC6909978 DOI: 10.1121/1.5125134] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/11/2019] [Revised: 07/18/2019] [Accepted: 08/19/2019] [Indexed: 05/19/2023]
Abstract
Little is known about the nature or extent of everyday variability in voice quality. This paper describes a series of principal component analyses to explore within- and between-talker acoustic variation and the extent to which they conform to expectations derived from current models of voice perception. Based on studies of faces and cognitive models of speaker recognition, the authors hypothesized that a few measures would be important across speakers, but that much of within-speaker variability would be idiosyncratic. Analyses used multiple sentence productions from 50 female and 50 male speakers of English, recorded over three days. Twenty-six acoustic variables from a psychoacoustic model of voice quality were measured every 5 ms on vowels and approximants. Across speakers the balance between higher harmonic amplitudes and inharmonic energy in the voice accounted for the most variance (females = 20%, males = 22%). Formant frequencies and their variability accounted for an additional 12% of variance across speakers. Remaining variance appeared largely idiosyncratic, suggesting that the speaker-specific voice space is different for different people. Results further showed that voice spaces for individuals and for the population of talkers have very similar acoustic structures. Implications for prototype models of voice perception and recognition are discussed.
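The variance-partitioning logic described above can be sketched with an ordinary principal component analysis over speaker-level acoustic measures. The code below uses random placeholder data standing in for the 26 acoustic variables, and simply reports how much between-speaker variance the leading components capture under those assumptions.

```python
# Sketch of a principal component analysis over speaker-mean acoustic measures.
# The data are random placeholders standing in for the 26 measured variables.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
n_speakers, n_measures = 100, 26

# One row per speaker: e.g., per-speaker means of F0, formants, spectral slopes, ...
speaker_means = rng.normal(size=(n_speakers, n_measures))

pca = PCA()
pca.fit(StandardScaler().fit_transform(speaker_means))

for i, ratio in enumerate(pca.explained_variance_ratio_[:3], start=1):
    print(f"PC{i}: {ratio:.1%} of between-speaker variance")
```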
Collapse
Affiliation(s)
- Yoonjeong Lee
- Department of Head and Neck Surgery, UCLA School of Medicine, 1000 Veteran Avenue, Los Angeles, California 90095-1794, USA
| | - Patricia Keating
- Department of Linguistics, University of California, Los Angeles, 3125 Campbell Hall, Box 951543, Los Angeles, California 90095-1543, USA
| | - Jody Kreiman
- Department of Head and Neck Surgery, UCLA School of Medicine, 1000 Veteran Avenue, Los Angeles, California 90095-1794, USA
| |
Collapse
|
38
|
Lavan N, Knight S, McGettigan C. Listeners form average-based representations of individual voice identities. Nat Commun 2019; 10:2404. [PMID: 31160558 PMCID: PMC6546765 DOI: 10.1038/s41467-019-10295-w] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/06/2018] [Accepted: 05/03/2019] [Indexed: 11/17/2022] Open
Abstract
Models of voice perception propose that identities are encoded relative to an abstracted average or prototype. While there is some evidence for norm-based coding when learning to discriminate different voices, little is known about how the representation of an individual's voice identity is formed through variable exposure to that voice. In two experiments, we show evidence that participants form abstracted representations of individual voice identities based on averages, despite having never been exposed to these averages during learning. We created 3 perceptually distinct voice identities, fully controlling their within-person variability. Listeners first learned to recognise these identities based on ring-shaped distributions located around the perimeter of within-person voice spaces - crucially, these distributions were missing their centres. At test, listeners' accuracy for old/new judgements was higher for stimuli located on an untrained distribution nested around the centre of each ring-shaped distribution compared to stimuli on the trained ring-shaped distribution.
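The training and test distributions described above can be pictured as rings in a two-dimensional within-person voice space. The sketch below generates such stimuli with arbitrary placeholder dimensions and radii, purely to illustrate the design of trained (perimeter) versus untrained (centre-nested) items.

```python
# Illustrative stimulus-design sketch: training items on an outer ring of a 2D
# within-person voice space (centre left empty), untrained test items on a
# smaller ring nested around that centre. Dimensions and radii are placeholders.
import numpy as np

rng = np.random.default_rng(4)

def ring(n_items, radius, jitter=0.05):
    """Sample points on a circle of a given radius with slight radial jitter."""
    angles = rng.uniform(0, 2 * np.pi, n_items)
    radii = radius + rng.normal(0, jitter, n_items)
    return np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])

training_items = ring(n_items=40, radius=1.0)   # heard during learning
test_items = ring(n_items=20, radius=0.3)       # untrained, near the centre

print("Mean distance from centre, training:", np.linalg.norm(training_items, axis=1).mean())
print("Mean distance from centre, test:    ", np.linalg.norm(test_items, axis=1).mean())
```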
Collapse
Affiliation(s)
- Nadine Lavan
- Department of Speech, Hearing and Phonetic Sciences, University College London, London, WC1N 1PF, UK.
- Department of Psychology, Royal Holloway, University of London, Egham, TW20 0EX, UK.
| | - Sarah Knight
- Department of Speech, Hearing and Phonetic Sciences, University College London, London, WC1N 1PF, UK
- Department of Psychology, Royal Holloway, University of London, Egham, TW20 0EX, UK
| | - Carolyn McGettigan
- Department of Speech, Hearing and Phonetic Sciences, University College London, London, WC1N 1PF, UK.
- Department of Psychology, Royal Holloway, University of London, Egham, TW20 0EX, UK.
| |
Collapse
|
39
|
Abstract
Human voices are extremely variable: The same person can sound very different depending on whether they are speaking, laughing, shouting or whispering. In order to successfully recognise someone from their voice, a listener needs to be able to generalize across these different vocal signals (‘telling people together’). However, in most studies of voice-identity processing to date, the substantial within-person variability has been eliminated through the use of highly controlled stimuli, thus focussing on how we tell people apart. We argue that this obscures our understanding of voice-identity processing by controlling away an essential feature of vocal stimuli that may include diagnostic information. In this paper, we propose that we need to extend the focus of voice-identity research to account for “telling people together” as well as “telling people apart.” That is, we must account for whether, and to what extent, listeners can overcome within-person variability to obtain a stable percept of person identity from vocal cues. To do this, our theoretical and methodological frameworks need to be adjusted to explicitly include the study of within-person variability.
Collapse
|
40
|
Lavan N, Domone A, Fisher B, Kenigzstein N, Scott SK, McGettigan C. Speaker Sex Perception from Spontaneous and Volitional Nonverbal Vocalizations. JOURNAL OF NONVERBAL BEHAVIOR 2018; 43:1-22. [PMID: 31148883 PMCID: PMC6514200 DOI: 10.1007/s10919-018-0289-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
Abstract
In two experiments, we explore how speaker sex recognition is affected by vocal flexibility, introduced by volitional and spontaneous vocalizations. In Experiment 1, participants judged speaker sex from two spontaneous vocalizations, laughter and crying, and volitionally produced vowels. Striking effects of speaker sex emerged: For male vocalizations, listeners' performance was significantly impaired for spontaneous vocalizations (laughter and crying) compared to a volitional baseline (repeated vowels), a pattern that was also reflected in longer reaction times for spontaneous vocalizations. Further, performance was less accurate for laughter than crying. For female vocalizations, a different pattern emerged. In Experiment 2, we largely replicated the findings of Experiment 1 using spontaneous laughter, volitional laughter and (volitional) vowels: here, performance for male vocalizations was impaired for spontaneous laughter compared to both volitional laughter and vowels, providing further evidence that differences in volitional control over vocal production may modulate our ability to accurately perceive speaker sex from vocal signals. For both experiments, acoustic analyses showed relationships between stimulus fundamental frequency (F0) and the participants' responses. The higher the F0 of a vocal signal, the more likely listeners were to perceive a vocalization as being produced by a female speaker, an effect that was more pronounced for vocalizations produced by males. We discuss the results in terms of the availability of salient acoustic cues across different vocalizations.
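The reported relationship between F0 and listeners' responses can be illustrated with a logistic regression predicting the probability of a "female" response from a vocalization's fundamental frequency. The data below are simulated placeholders chosen only to show the shape of such an analysis, not the study's results.

```python
# Sketch of an F0-to-response analysis: logistic regression predicting the
# probability of a "female" response from fundamental frequency (simulated data).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
f0 = rng.uniform(80, 350, size=400)              # fundamental frequency in Hz
p_female = 1 / (1 + np.exp(-(f0 - 180) / 25))    # higher F0 -> more "female" responses
responded_female = rng.binomial(1, p_female)     # simulated listener responses (0/1)

model = LogisticRegression(max_iter=1000).fit(f0.reshape(-1, 1), responded_female)
for test_f0 in (120, 180, 260):
    prob = model.predict_proba([[test_f0]])[0, 1]
    print(f"F0 = {test_f0:3d} Hz -> P('female' response) = {prob:.2f}")
```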
Collapse
Affiliation(s)
- Nadine Lavan
- Department of Psychology, Royal Holloway, University of London, Egham Hill, Egham, TW20 0EX UK
- Department of Speech, Hearing and Phonetic Sciences, University College London, London, UK
| | - Abigail Domone
- Department of Psychology, Royal Holloway, University of London, Egham Hill, Egham, TW20 0EX UK
| | - Betty Fisher
- Department of Psychology, Royal Holloway, University of London, Egham Hill, Egham, TW20 0EX UK
| | - Noa Kenigzstein
- Department of Psychology, Royal Holloway, University of London, Egham Hill, Egham, TW20 0EX UK
| | | | - Carolyn McGettigan
- Department of Psychology, Royal Holloway, University of London, Egham Hill, Egham, TW20 0EX UK
- Department of Speech, Hearing and Phonetic Sciences, University College London, London, UK
| |
Collapse
|