1. Gainotti G. Human Recognition: The Utilization of Face, Voice, Name and Interactions – An Extended Editorial. Brain Sci 2024; 14:345. PMID: 38671996; PMCID: PMC11048321; DOI: 10.3390/brainsci14040345.
Abstract
The many stimulating contributions to this Special Issue of Brain Sciences focused on some basic issues of particular interest in current research, with emphasis on human recognition using faces, voices, and names [...].
Affiliation(s)
- Guido Gainotti
- Institute of Neurology, Università Cattolica del Sacro Cuore, Fondazione Policlinico A. Gemelli, Istituto di Ricovero e Cura a Carattere Scientifico, 00168 Rome, Italy

2. Tompkinson J, Mileva M, Watt D, Burton AM. Perception of threat and intent to harm from vocal and facial cues. Q J Exp Psychol (Hove) 2024; 77:326-342. PMID: 37020335; PMCID: PMC10798027; DOI: 10.1177/17470218231169952.
Abstract
What constitutes a "threatening tone of voice"? There is currently little research exploring how listeners infer threat, or the intention to cause harm, from speakers' voices. Here, we investigated the influence of key linguistic variables on these evaluations (Study 1). Results showed a trend for voices perceived to be lower in pitch, particularly those of male speakers, to be evaluated as sounding more threatening and conveying greater intent to harm. We next investigated the evaluation of multimodal stimuli comprising voices and faces varying in perceived dominance (Study 2). Visual information about the speaker's face had a significant effect on threat and intent ratings. In both experiments, we observed a relatively low level of agreement among individual listeners' evaluations, emphasising idiosyncrasy in the ways in which threat and intent to harm are perceived. This research provides a basis for understanding the perceptual experience of a "threatening tone of voice", along with an exploration of how vocal and facial cues are integrated in social evaluation.
Affiliation(s)
- James Tompkinson
- Aston Institute for Forensic Linguistics, College of Business and Social Sciences, Aston University, Birmingham, UK
- Mila Mileva
- School of Psychology, University of Plymouth, Plymouth, UK
- Dominic Watt
- Department of Language and Linguistic Science, University of York, York, UK
- A. Mike Burton
- Department of Psychology, University of York, York, UK

3. Zadoorian S, Rosenblum LD. The Benefit of Bimodal Training in Voice Learning. Brain Sci 2023; 13:1260. PMID: 37759861; PMCID: PMC10526927; DOI: 10.3390/brainsci13091260.
Abstract
It is known that talkers can be recognized by listening to their specific vocal qualities, such as breathiness and fundamental frequency. However, talker identification can also occur by focusing on the talkers' unique articulatory style, which is available both auditorily and visually and can be shared across modalities. Evidence shows that voices heard while seeing talkers' faces are later recognized better on their own than voices heard alone. The present study investigated whether the facilitation of voice learning through facial cues relies on talker-specific articulatory or nonarticulatory facial information. Participants were initially trained to learn the voices of ten talkers presented either on their own or together with (a) an articulating face, (b) a static face, or (c) an isolated articulating mouth. Participants were then tested on recognizing the voices on their own regardless of their training modality. Consistent with previous research, voices learned with articulating faces were recognized better on their own than voices learned alone. However, isolated articulating mouths provided no advantage in learning the voices. The results demonstrated that learning voices while seeing whole faces resulted in better voice learning than learning the voices alone.

4. Zäske R, Kaufmann JM, Schweinberger SR. Neural Correlates of Voice Learning with Distinctive and Non-Distinctive Faces. Brain Sci 2023; 13:637. PMID: 37190602; PMCID: PMC10136676; DOI: 10.3390/brainsci13040637.
Abstract
Recognizing people from their voices may be facilitated by a voice's distinctiveness, in a manner similar to that which has been reported for faces. However, little is known about the neural time-course of voice learning and the role of facial information in voice learning. Based on evidence for audiovisual integration in the recognition of familiar people, we studied the behavioral and electrophysiological correlates of voice learning associated with distinctive or non-distinctive faces. We repeated twelve unfamiliar voices uttering short sentences, together with either distinctive or non-distinctive faces (depicted before and during voice presentation) in six learning-test cycles. During learning, distinctive faces increased early visually-evoked (N170, P200, N250) potentials relative to non-distinctive faces, and face distinctiveness modulated voice-elicited slow EEG activity at the occipito-temporal and fronto-central electrodes. At the test, unimodally-presented voices previously learned with distinctive faces were classified more quickly than were voices learned with non-distinctive faces, and also more quickly than novel voices. Moreover, voices previously learned with faces elicited an N250-like component that was similar in topography to that typically observed for facial stimuli. The preliminary source localization of this voice-induced N250 was compatible with a source in the fusiform gyrus. Taken together, our findings provide support for a theory of early interaction between voice and face processing areas during both learning and voice recognition.
Affiliation(s)
- Romi Zäske
- Department of Experimental Otorhinolaryngology, Jena University Hospital, Stoystraße 3, 07743 Jena, Germany
- Department for General Psychology and Cognitive Neuroscience, Institute of Psychology, Friedrich Schiller University of Jena, Am Steiger 3/1, 07743 Jena, Germany
- Voice Research Unit, Friedrich Schiller University of Jena, Leutragraben 1, 07743 Jena, Germany
- Jürgen M. Kaufmann
- Department for General Psychology and Cognitive Neuroscience, Institute of Psychology, Friedrich Schiller University of Jena, Am Steiger 3/1, 07743 Jena, Germany
- Stefan R. Schweinberger
- Department for General Psychology and Cognitive Neuroscience, Institute of Psychology, Friedrich Schiller University of Jena, Am Steiger 3/1, 07743 Jena, Germany
- Voice Research Unit, Friedrich Schiller University of Jena, Leutragraben 1, 07743 Jena, Germany

5. Karlsson T, Schaefer H, Barton JJS, Corrow SL. Effects of Voice and Biographic Data on Face Encoding. Brain Sci 2023; 13:148. PMID: 36672128; PMCID: PMC9857090; DOI: 10.3390/brainsci13010148.
Abstract
There are various perceptual and informational cues for recognizing people, and how these interact in the recognition process is of interest. Our goal was to determine whether the encoding of faces was enhanced by the concurrent presence of a voice, biographic data, or both. Using a between-subject design, four groups of 10 subjects learned the identities of 24 faces seen in video-clips. Half of the faces were seen only with their names, while the other half had additional information. For the first group, this was the person's voice; for the second, biographic data; and for the third, both voice and biographic data. In a fourth control group, the additional information was the voice of a generic narrator relating non-biographic information. In the retrieval phase, subjects performed a familiarity task and then a face-to-name identification task with dynamic faces alone. Our results consistently showed no benefit to face encoding from the additional information, for either the familiarity or the identification task. Equivalence tests indicated that any facilitative effect of a voice or biographic data on face encoding was unlikely to exceed 3% in accuracy. We conclude that face encoding is minimally influenced by cross-modal information from voices or biographic data.
Affiliation(s)
- Thilda Karlsson
- Human Vision and Eye Movement Laboratory, Departments of Medicine (Neurology), Ophthalmology and Visual Sciences, Psychology, University of British Columbia, Vancouver, BC V5Z 3N9, Canada
- Faculty of Medicine, Linköping University, 582 25 Linköping, Sweden
- Heidi Schaefer
- Human Vision and Eye Movement Laboratory, Departments of Medicine (Neurology), Ophthalmology and Visual Sciences, Psychology, University of British Columbia, Vancouver, BC V5Z 3N9, Canada
- Jason J. S. Barton
- Human Vision and Eye Movement Laboratory, Departments of Medicine (Neurology), Ophthalmology and Visual Sciences, Psychology, University of British Columbia, Vancouver, BC V5Z 3N9, Canada
- Correspondence: Tel.: +1-604-875-4339; Fax: +1-604-875-4302
- Sherryse L. Corrow
- Human Vision and Eye Movement Laboratory, Departments of Medicine (Neurology), Ophthalmology and Visual Sciences, Psychology, University of British Columbia, Vancouver, BC V5Z 3N9, Canada
- Department of Psychology, Bethel University, St. Paul, MN 55112, USA

6. Fransson S, Corrow S, Yeung S, Schaefer H, Barton JJS. Effects of Faces and Voices on the Encoding of Biographic Information. Brain Sci 2022; 12:1716. PMID: 36552175; PMCID: PMC9775626; DOI: 10.3390/brainsci12121716.
Abstract
There are multiple forms of knowledge about people. Whether diverse person-related data interact bears on the more general issue of how multi-source information about the world is integrated. Our goal was to examine whether perception of a person's face or voice enhanced the encoding of their biographic data. We performed three experiments. In the first experiment, subjects learned the biographic data of a character with or without a video clip of their face. In the second experiment, they learned the character's data with an audio clip of either a generic narrator's voice or the character's voice relating the same biographic information. In the third experiment, an audiovisual clip of both the face and voice of either a generic narrator or the character accompanied the learning of biographic data. After learning, a test phase presented biographic data alone, and subjects were tested first for familiarity and second for matching of biographic data to the name. The results showed equivalent learning of biographic data across all three experiments, and none showed evidence that a character's face or voice enhanced the learning of biographic information. We conclude that the simultaneous processing of perceptual representations of people may not modulate the encoding of biographic data.
Affiliation(s)
- Sarah Fransson
- Faculty of Medicine, Linköping University, 581 83 Linköping, Sweden
- Sherryse Corrow
- Human Vision and Eye Movement Laboratory, Departments of Medicine (Neurology), Ophthalmology and Visual Sciences, Psychology, University of British Columbia, Vancouver, BC V5Z 3N9, Canada
- Department of Psychology, Bethel University, St. Paul, MN 55112, USA
- Shanna Yeung
- Human Vision and Eye Movement Laboratory, Departments of Medicine (Neurology), Ophthalmology and Visual Sciences, Psychology, University of British Columbia, Vancouver, BC V5Z 3N9, Canada
- Heidi Schaefer
- Human Vision and Eye Movement Laboratory, Departments of Medicine (Neurology), Ophthalmology and Visual Sciences, Psychology, University of British Columbia, Vancouver, BC V5Z 3N9, Canada
- Jason J. S. Barton
- Human Vision and Eye Movement Laboratory, Departments of Medicine (Neurology), Ophthalmology and Visual Sciences, Psychology, University of British Columbia, Vancouver, BC V5Z 3N9, Canada
- Correspondence: Tel.: +1-604-875-4339; Fax: +1-604-875-4302

7. Lavan N, Collins MRN, Miah JFM. Audiovisual identity perception from naturally-varying stimuli is driven by visual information. Br J Psychol 2021; 113:248-263. PMID: 34490897; DOI: 10.1111/bjop.12531.
Abstract
Identity perception often takes place in multimodal settings, where perceivers have access to both visual (face) and auditory (voice) information. Despite this, identity perception is usually studied in unimodal contexts, where face and voice identity perception are modelled independently from one another. In this study, we asked whether and how much auditory and visual information contribute to audiovisual identity perception from naturally-varying stimuli. In a between-subjects design, participants completed an identity sorting task with either dynamic video-only, audio-only, or dynamic audiovisual stimuli. In this task, participants were asked to sort multiple, naturally-varying stimuli from three different people by perceived identity. We found that identity perception was more accurate for video-only and audiovisual stimuli compared with audio-only stimuli. Interestingly, there was no difference in accuracy between video-only and audiovisual stimuli. Auditory information nonetheless played a role alongside visual information, as audiovisual identity judgements for each stimulus could be predicted from both auditory and visual identity judgements. While the relationship between visual and audiovisual judgements was stronger, auditory information still uniquely explained a significant portion of the variance in audiovisual identity judgements. Our findings thus align with previous theoretical and empirical work proposing that, compared with faces, voices are an important but less salient and weaker cue to identity perception. We expand on this work to show that, at least in the context of this study, having access to voices in addition to faces does not result in better identity perception accuracy.
Affiliation(s)
- Nadine Lavan
- Department of Biological and Experimental Psychology, School of Biological and Chemical Sciences, Queen Mary University of London, UK
- Madeleine Rose Niamh Collins
- Department of Biological and Experimental Psychology, School of Biological and Chemical Sciences, Queen Mary University of London, UK
- Jannatul Firdaus Monisha Miah
- Department of Biological and Experimental Psychology, School of Biological and Chemical Sciences, Queen Mary University of London, UK

8. Unimodal and cross-modal identity judgements using an audio-visual sorting task: Evidence for independent processing of faces and voices. Mem Cognit 2021; 50:216-231. PMID: 34254274; PMCID: PMC8763756; DOI: 10.3758/s13421-021-01198-7.
Abstract
Unimodal and cross-modal information provided by faces and voices contribute to identity percepts. To examine how these sources of information interact, we devised a novel audio-visual sorting task in which participants were required to group video-only and audio-only clips into two identities. In a series of three experiments, we show that unimodal face and voice sorting were more accurate than cross-modal sorting: While face sorting was consistently most accurate, followed by voice sorting, cross-modal sorting was at chance level or below. In Experiment 1, we compared performance in our novel audio-visual sorting task to a traditional identity matching task, showing that unimodal and cross-modal identity perception were overall moderately more accurate than in the traditional identity matching task. In Experiment 2, separating unimodal from cross-modal sorting led to small improvements in accuracy for unimodal sorting, but no change in cross-modal sorting performance. In Experiment 3, we explored the effect of minimal audio-visual training: Participants were shown a clip of the two identities in conversation prior to completing the sorting task. This led to small, nonsignificant improvements in accuracy for unimodal and cross-modal sorting. Our results indicate that unfamiliar face and voice perception operate relatively independently, with no evidence of mutual benefit, suggesting that extracting reliable cross-modal identity information is challenging.

9. Huestegge SM, Raettig T. Crossing Gender Borders: Bidirectional Dynamic Interaction Between Face-Based and Voice-Based Gender Categorization. J Voice 2020; 34:487.e1-487.e9. DOI: 10.1016/j.jvoice.2018.09.020.

10. Stevenage SV, Symons AE, Fletcher A, Coen C. Sorting through the impact of familiarity when processing vocal identity: Results from a voice sorting task. Q J Exp Psychol (Hove) 2019; 73:519-536. PMID: 31658884; PMCID: PMC7074657; DOI: 10.1177/1747021819888064.
Abstract
The present article reports on one experiment designed to examine the importance of familiarity when processing vocal identity. A voice sorting task was used with participants who were either personally familiar or unfamiliar with three speakers. The results suggested that familiarity supported both the ability to "tell together" different instances of the same voice and the ability to "tell apart" similar instances of different voices. In addition, the results suggested differences between the three speakers in terms of the extent to which they were confusable, underlining the importance of vocal characteristics and stimulus selection within behavioural tasks. The results are discussed with reference to existing debates regarding the nature of stored representations as familiarity develops, and the greater difficulty of processing voices compared with faces.
Affiliation(s)
- Ashley E Symons
- School of Psychology, University of Southampton, Southampton, UK
- Abi Fletcher
- School of Psychology, University of Southampton, Southampton, UK
- Chantelle Coen
- School of Psychology, University of Southampton, Southampton, UK

11. Li Y, Wang F, Chen Y, Cichocki A, Sejnowski T. The Effects of Audiovisual Inputs on Solving the Cocktail Party Problem in the Human Brain: An fMRI Study. Cereb Cortex 2019; 28:3623-3637. PMID: 29029039; DOI: 10.1093/cercor/bhx235.
Abstract
At cocktail parties, our brains often simultaneously receive visual and auditory information. Although the cocktail party problem has been widely investigated under auditory-only settings, the effects of audiovisual inputs have not. This study explored the effects of audiovisual inputs in a simulated cocktail party. In our fMRI experiment, each congruent audiovisual stimulus was a synthesis of 2 facial movie clips, each of which could be classified into 1 of 2 emotion categories (crying and laughing). Visual-only (faces) and auditory-only stimuli (voices) were created by extracting the visual and auditory contents from the synthesized audiovisual stimuli. Subjects were instructed to selectively attend to 1 of the 2 objects contained in each stimulus and to judge its emotion category in the visual-only, auditory-only, and audiovisual conditions. The neural representations of the emotion features were assessed by calculating decoding accuracy and brain pattern-related reproducibility index based on the fMRI data. We compared the audiovisual condition with the visual-only and auditory-only conditions and found that audiovisual inputs enhanced the neural representations of emotion features of the attended objects instead of the unattended objects. This enhancement might partially explain the benefits of audiovisual inputs for the brain to solve the cocktail party problem.
Affiliation(s)
- Yuanqing Li
- Center for Brain Computer Interfaces and Brain Information Processing, South China University of Technology, Guangzhou, China
- Guangzhou Key Laboratory of Brain Computer Interaction and Applications, Guangzhou, China
- Fangyi Wang
- Center for Brain Computer Interfaces and Brain Information Processing, South China University of Technology, Guangzhou, China
- Guangzhou Key Laboratory of Brain Computer Interaction and Applications, Guangzhou, China
- Yongbin Chen
- Center for Brain Computer Interfaces and Brain Information Processing, South China University of Technology, Guangzhou, China
- Guangzhou Key Laboratory of Brain Computer Interaction and Applications, Guangzhou, China
- Andrzej Cichocki
- RIKEN Brain Science Institute, Wako-shi, Japan
- Skolkovo Institute of Science and Technology (Skoltech), Moscow, Russia
- Terrence Sejnowski
- Neurobiology Laboratory, The Salk Institute for Biological Studies, La Jolla, CA, USA

12. Mileva M, Tompkinson J, Watt D, Burton AM. The Role of Face and Voice Cues in Predicting the Outcome of Student Representative Elections. Pers Soc Psychol Bull 2019; 46:617-625. PMID: 31409219; DOI: 10.1177/0146167219867965.
Abstract
First impressions formed after seeing someone's face or hearing their voice can affect many social decisions, including voting in political elections. Despite the many studies investigating the independent contribution of face and voice cues to electoral success, their integration is still not well understood. Here, we examine a novel electoral context, student representative ballots, allowing us to test the generalizability of previous studies. We also examine the independent contributions of visual, auditory, and audiovisual information to social judgments of the candidates, and their relationship to election outcomes. Results showed that perceived trustworthiness was the only trait significantly related to election success. These findings contrast with previous reports on the importance of perceived competence using audio or visual cues only in the context of national political elections. The present study highlights the role of real-world context and emphasizes the importance of using ecologically valid stimulus presentation in understanding real-life social judgment.

13.
Abstract
Based on current integration theories of face-voice processing, the present study had participants process 1,152 videos of faces uttering digits. Half of the videos contained face-voice gender-incongruent stimuli (vs. congruent stimuli in the other half). Participants indicated digit magnitude or parity. Tasks were presented in pure blocks (only 1 task) and in task switching blocks (using colored cues to specify task). The results indicate significant congruency effects in pure blocks, but partially reversed congruency effects in task switching, probably due to enhanced assignment of capacity toward resolving difficult situational demands. Congruency effects did not dissipate over time, ruling out that initial surprise associated with incongruent stimuli drove the effects. The results show that interference between two task-irrelevant person-related dimensions (face/voice gender) can affect processing of a third, task-relevant dimension (digit identity), suggesting greater processing ease associated with more authentic voices (i.e., voices that do not violate face-based expectancies).
Affiliation(s)
- Sujata M Huestegge
- Department of Special Education and Speech-Language Pathology, University of Würzburg, Germany
- Institute of Voice and Performing Arts, University of Music and Performing Arts Munich, Germany
- Tim Raettig
- Institute of Psychology, University of Würzburg, Germany
- Lynn Huestegge
- Institute of Psychology, University of Würzburg, Germany

14. Mühl C, Sheil O, Jarutytė L, Bestelmeyer PEG. The Bangor Voice Matching Test: A standardized test for the assessment of voice perception ability. Behav Res Methods 2018; 50:2184-2192. PMID: 29124718; PMCID: PMC6267520; DOI: 10.3758/s13428-017-0985-4.
Abstract
Recognising the identity of conspecifics is an important yet highly variable skill. Approximately 2% of the population suffers from a socially debilitating deficit in face recognition. More recently, evidence has emerged of a similar deficit in voice perception (phonagnosia). Face perception tests have been readily available for years, advancing our understanding of the underlying mechanisms of face perception. In contrast, voice perception has received less attention, and the construction of standardized voice perception tests has been neglected. Here we report the construction of the first standardized test of voice perception ability. Participants make a same/different identity decision after hearing two voice samples. Item Response Theory guided item selection to ensure that the test discriminates across a range of abilities. The test provides a starting point for the systematic exploration of the cognitive and neural mechanisms underlying voice perception. With a high test-retest reliability (r = .86) and a short assessment duration (~10 min), the test examines individual abilities reliably and quickly, and therefore also has potential for use in developmental and neuropsychological populations.
Affiliation(s)
- Constanze Mühl
- School of Psychology, Bangor University, Brigantia Building, Penrallt Road, Bangor, Gwynedd, LL57 2AS, UK
- Orla Sheil
- School of Psychology, Bangor University, Brigantia Building, Penrallt Road, Bangor, Gwynedd, LL57 2AS, UK
- Lina Jarutytė
- School of Experimental Psychology, University of Bristol, Bristol, BS8 1TU, UK
- Patricia E G Bestelmeyer
- School of Psychology, Bangor University, Brigantia Building, Penrallt Road, Bangor, Gwynedd, LL57 2AS, UK

15. Spokespersons’ Nonverbal Behavior in Times of Crisis: The Relative Importance of Visual and Vocal Cues. J Nonverbal Behav 2018. DOI: 10.1007/s10919-018-0284-5.

16. Maguinness C, Roswandowitz C, von Kriegstein K. Understanding the mechanisms of familiar voice-identity recognition in the human brain. Neuropsychologia 2018; 116:179-193. DOI: 10.1016/j.neuropsychologia.2018.03.039.

17. Stevenage SV. Drawing a distinction between familiar and unfamiliar voice processing: A review of neuropsychological, clinical and empirical findings. Neuropsychologia 2017; 116:162-178. PMID: 28694095; DOI: 10.1016/j.neuropsychologia.2017.07.005.
Abstract
Thirty years on from the initial observation that familiar voice recognition is not the same as unfamiliar voice discrimination (van Lancker and Kreiman, 1987), the current paper reviews the available evidence in support of a distinction between familiar and unfamiliar voice processing. An extensive review of the literature is provided, drawing on evidence from four domains of interest: the neuropsychological study of healthy individuals, the neuropsychological investigation of brain-damaged individuals, the exploration of voice recognition deficits in less commonly studied clinical conditions, and empirical data from healthy individuals. All evidence is assessed in terms of its contribution to the question of interest: is familiar voice processing distinct from unfamiliar voice processing? In this regard, the evidence provides compelling support for van Lancker and Kreiman's early observation. Two considerations result: First, the limits of research based on one or the other type of voice stimulus are more clearly appreciated. Second, given the demonstration of a distinction between unfamiliar and familiar voice processing, a new wave of research is encouraged which examines the transition involved as a voice is learned.
Affiliation(s)
- Sarah V Stevenage
- Department of Psychology, University of Southampton, Highfield, Southampton, Hampshire SO17 1BJ, UK.

18. Bao JY, Corrow SL, Schaefer H, Barton JJS. Cross-modal interactions of faces, voices and names in person familiarity. Visual Cognition 2017. DOI: 10.1080/13506285.2017.1329763.
Affiliation(s)
- Jing Ye Bao
- Human Vision and Eye Movement Laboratory, Departments of Medicine (Neurology), Ophthalmology and Visual Sciences, Psychology, University of British Columbia, Vancouver, Canada
- Sherryse L. Corrow
- Human Vision and Eye Movement Laboratory, Departments of Medicine (Neurology), Ophthalmology and Visual Sciences, Psychology, University of British Columbia, Vancouver, Canada
| | - Heidi Schaefer
- Human Vision and Eye Movement Laboratory, Departments of Medicine (Neurology), Ophthalmology and Visual Sciences, Psychology, University of British Columbia, Vancouver, Canada
| | - Jason J. S. Barton
- Human Vision and Eye Movement Laboratory, Departments of Medicine (Neurology), Ophthalmology and Visual Sciences, Psychology, University of British Columbia, Vancouver, Canada
| |
Collapse
|
19
|
Bülthoff I, Newell FN. Crossmodal priming of unfamiliar faces supports early interactions between voices and faces in person perception. Visual Cognition 2017. [DOI: 10.1080/13506285.2017.1290729] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/20/2022]
Affiliation(s)
| | - Fiona N. Newell
- School of Psychology and Institute of Neuroscience, Trinity College Dublin, Dublin 2, Ireland
| |
Collapse
|
20
|
Affiliation(s)
- Stefan R. Schweinberger
- Department of General Psychology, Friedrich Schiller University and DFG Research Unit Person Perception, Jena, Germany
| | - David M.C. Robertson
- Department of General Psychology, Friedrich Schiller University and DFG Research Unit Person Perception, Jena, Germany
| |
Collapse
|
21
|
Hölig C, Föcker J, Best A, Röder B, Büchel C. Activation in the angular gyrus and in the pSTS is modulated by face primes during voice recognition. Hum Brain Mapp 2017; 38:2553-2565. [PMID: 28218433 DOI: 10.1002/hbm.23540] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/12/2016] [Revised: 12/23/2016] [Accepted: 02/06/2017] [Indexed: 11/08/2022] Open
Abstract
The aim of the present study was to better understand the interaction of face and voice processing when identifying people. In a S1-S2 crossmodal priming fMRI experiment, the target (S2) was a disyllabic voice stimulus, whereas the modality of the prime (S1) was manipulated blockwise and consisted of the silent video of a speaking face in the crossmodal condition or of a voice stimulus in the unimodal condition. Primes and targets were from the same speaker (person-congruent) or from two different speakers (person-incongruent). Participants had to classify the S2 as either an old or a young person. Response times were shorter after a congruent than after an incongruent face prime. The right posterior superior temporal sulcus (pSTS) and the right angular gyrus showed a significant person identity effect (person-incongruent > person-congruent) in the crossmodal condition but not in the unimodal condition. In the unimodal condition, a person identity effect was observed in the bilateral inferior frontal gyrus. Our data suggest that both the priming with a voice and with a face result in a preactivated voice representation of the respective person, which eventually facilitates (person-congruent trials) or hampers (person-incongruent trials) the processing of the identity of a subsequent voice. This process involves activation in the right pSTS and in the right angular gyrus for voices primed by faces, but not for voices primed by voices. Hum Brain Mapp 38:2553-2565, 2017. © 2017 Wiley Periodicals, Inc.
Collapse
Affiliation(s)
- Cordula Hölig
- Biological Psychology and Neuropsychology, University of Hamburg, Germany
- Department of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, Germany
| | - Julia Föcker
- Department of Psychology, Ludwig Maximilian University, Munich, Germany
| | - Anna Best
- Biological Psychology and Neuropsychology, University of Hamburg, Germany
| | - Brigitte Röder
- Biological Psychology and Neuropsychology, University of Hamburg, Germany
| | - Christian Büchel
- Department of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, Germany
| |
Collapse
|
22
|
Stevenage SV, Hamlin I, Ford B. Distinctiveness helps when matching static faces and voices. Journal of Cognitive Psychology 2016. [DOI: 10.1080/20445911.2016.1272605] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/20/2022]
Affiliation(s)
- Sarah V. Stevenage
- Department of Psychology, University of Southampton, Highfield, Southampton, UK
| | - Iain Hamlin
- Department of Psychology, University of Southampton, Highfield, Southampton, UK
| | - Becky Ford
- Department of Psychology, University of Southampton, Highfield, Southampton, UK
| |
Collapse
|
23
|
Peschard V, Philippot P, Gilboa-Schechtman E. Involuntary processing of social dominance cues from bimodal face-voice displays. Cogn Emot 2016; 32:13-23. [PMID: 28000541 DOI: 10.1080/02699931.2016.1266304] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/20/2022]
Abstract
Social-rank cues communicate social status or social power within and between groups. Information about social rank is fluently processed in both the visual and auditory modalities. So far, investigation of the processing of social-rank cues has been limited to studies in which information from a single modality was assessed or manipulated. Yet, in everyday communication, multiple information channels are used to express and understand social rank. We sought to examine the (in)voluntary nature of the processing of facial and vocal signals of social rank using a cross-modal Stroop task. In two experiments, participants were presented with face-voice pairs that were either congruent or incongruent in social rank (i.e., social dominance). Participants' task was to label the social dominance of the face while ignoring the voice, or to label the social dominance of the voice while ignoring the face. In both experiments, face-voice incongruent stimuli were processed more slowly and less accurately than congruent stimuli in both the face-attend and the voice-attend tasks, exhibiting classical Stroop-like effects. These findings are consistent with the functioning of a social-rank bio-behavioural system that consistently and automatically monitors one's social standing in relation to others and uses that information to guide behaviour.
Collapse
Affiliation(s)
- Virginie Peschard
- Psychology Department and the Gonda Brain Science Center, Bar Ilan University, Ramat Gan, Israel
- Laboratory for Experimental Psychopathology, Psychological Sciences Research Institute, Université Catholique de Louvain, Louvain-la-Neuve, Belgium
| | - Pierre Philippot
- Laboratory for Experimental Psychopathology, Psychological Sciences Research Institute, Université Catholique de Louvain, Louvain-la-Neuve, Belgium
| | - Eva Gilboa-Schechtman
- Psychology Department and the Gonda Brain Science Center, Bar Ilan University, Ramat Gan, Israel
| |
Collapse
|
24
|
Awwad Shiekh Hasan B, Valdes-Sosa M, Gross J, Belin P. "Hearing faces and seeing voices": Amodal coding of person identity in the human brain. Sci Rep 2016; 6:37494. [PMID: 27881866 PMCID: PMC5121604 DOI: 10.1038/srep37494] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/13/2016] [Accepted: 10/27/2016] [Indexed: 11/09/2022] Open
Abstract
Recognizing familiar individuals is achieved by the brain by combining cues from several sensory modalities, including the face of a person and her voice. Here we used functional magnetic resonance imaging (fMRI) and a whole-brain, searchlight multi-voxel pattern analysis (MVPA) to search for areas in which local fMRI patterns could support identity classification as a function of sensory modality. We found several areas supporting face or voice stimulus classification based on fMRI responses, consistent with previous reports; the classification maps overlapped across modalities in a single area of the right posterior superior temporal sulcus (pSTS). Remarkably, we also found several cortical areas, mostly located along the middle temporal gyrus, in which local fMRI patterns supported identity “cross-classification”: vocal identity could be classified based on fMRI responses to faces, or the reverse, or both. These findings are suggestive of a series of cortical identity representations increasingly abstracted from the input modality.
Collapse
Affiliation(s)
- Bashar Awwad Shiekh Hasan
- Centre for Cognitive Neuroimaging, Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
- Institute of Neuroscience, Newcastle University, Newcastle, United Kingdom
| | | | - Joachim Gross
- Centre for Cognitive Neuroimaging, Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
| | - Pascal Belin
- Centre for Cognitive Neuroimaging, Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
- Département de Psychologie, Université de Montréal, Montréal, Québec, Canada
- Institut de Neurosciences de la Timone, UMR 7289, CNRS and Aix-Marseille Université, Marseille, France
| |
Collapse
|
25
|
Tomlin RJ, Stevenage SV, Hammond S. Putting the pieces together: Revealing face–voice integration through the facial overshadowing effect. Visual Cognition 2016. [DOI: 10.1080/13506285.2016.1245230] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/20/2022]
Affiliation(s)
- Rebecca J. Tomlin
- Department of Psychology, University of Southampton, Southampton, UK
| | | | - Sarah Hammond
- Department of Psychology, University of Southampton, Southampton, UK
| |
Collapse
|
26
|
Smith HMJ, Dunn AK, Baguley T, Stacey PC. The effect of inserting an inter-stimulus interval in face-voice matching tasks. Q J Exp Psychol (Hove) 2016; 71:424-434. [PMID: 27784196 DOI: 10.1080/17470218.2016.1253758] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/20/2022]
Abstract
Voices and static faces can be matched for identity above chance level. No previous face-voice matching experiments have included an inter-stimulus interval (ISI) exceeding 1 s. We tested whether accurate identity decisions rely on high-quality perceptual representations temporarily stored in sensory memory, and therefore whether the ability to make accurate matching decisions diminishes as the ISI increases. In each trial, participants had to decide whether an unfamiliar face and voice belonged to the same person. The face and voice stimuli were presented simultaneously in Experiment 1, whereas there was a 5-s ISI in Experiment 2 and a 10-s ISI in Experiment 3. The results, analysed using multilevel modelling, revealed that static face-voice matching was significantly above chance level only when the stimuli were presented simultaneously (Experiment 1). The overall bias to respond "same identity" weakened as the interval increased, suggesting that this bias is explained by temporal contiguity. Taken together, the findings highlight that face-voice matching performance relies on comparing fast-decaying, high-quality perceptual representations. The results are discussed in terms of social functioning.
Collapse
Affiliation(s)
| | - Andrew K Dunn
- Psychology Division, Nottingham Trent University, Nottingham, UK
| | - Thom Baguley
- Psychology Division, Nottingham Trent University, Nottingham, UK
| | - Paula C Stacey
- Psychology Division, Nottingham Trent University, Nottingham, UK
| |
Collapse
|
27
|
Yovel G, O’Toole AJ. Recognizing People in Motion. Trends Cogn Sci 2016; 20:383-395. [DOI: 10.1016/j.tics.2016.02.005] [Citation(s) in RCA: 83] [Impact Index Per Article: 10.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/16/2016] [Revised: 02/18/2016] [Accepted: 02/18/2016] [Indexed: 11/15/2022]
|
28
|
Selective Audiovisual Semantic Integration Enabled by Feature-Selective Attention. Sci Rep 2016; 6:18914. [PMID: 26759193 PMCID: PMC4725371 DOI: 10.1038/srep18914] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/28/2015] [Accepted: 11/30/2015] [Indexed: 11/23/2022] Open
Abstract
An audiovisual object may contain multiple semantic features, such as the gender and emotional features of the speaker. Feature-selective attention and audiovisual semantic integration are two brain functions involved in the recognition of audiovisual objects. Humans often selectively attend to one or several features while ignoring the other features of an audiovisual object. Meanwhile, the human brain integrates semantic information from the visual and auditory modalities. However, how these two brain functions correlate with each other remains to be elucidated. In this functional magnetic resonance imaging (fMRI) study, we explored the neural mechanism by which feature-selective attention modulates audiovisual semantic integration. During the fMRI experiment, the subjects were presented with visual-only, auditory-only, or audiovisual dynamic facial stimuli and performed several feature-selective attention tasks. Our results revealed that a distributed set of areas, including heteromodal areas and brain areas encoding the attended features, may be involved in audiovisual semantic integration. Through feature-selective attention, the human brain may selectively integrate audiovisual semantic information from attended features by enhancing functional connectivity and thus regulating information flows from heteromodal areas to the brain areas encoding the attended features.
Collapse
|
29
|
Zäske R, Mühl C, Schweinberger SR. Benefits for Voice Learning Caused by Concurrent Faces Develop over Time. PLoS One 2015; 10:e0143151. [PMID: 26588847 PMCID: PMC4654504 DOI: 10.1371/journal.pone.0143151] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/07/2015] [Accepted: 10/30/2015] [Indexed: 11/19/2022] Open
Abstract
Recognition of personally familiar voices benefits from the concurrent presentation of the corresponding speakers’ faces. This effect of audiovisual integration is most pronounced for voices combined with dynamic articulating faces. However, it is unclear if learning unfamiliar voices also benefits from audiovisual face-voice integration or, alternatively, is hampered by attentional capture of faces, i.e., “face-overshadowing”. In six study-test cycles we compared the recognition of newly-learned voices following unimodal voice learning vs. bimodal face-voice learning with either static (Exp. 1) or dynamic articulating faces (Exp. 2). Voice recognition accuracies significantly increased for bimodal learning across study-test cycles while remaining stable for unimodal learning, as reflected in numerical costs of bimodal relative to unimodal voice learning in the first two study-test cycles and benefits in the last two cycles. This was independent of whether faces were static images (Exp. 1) or dynamic videos (Exp. 2). In both experiments, slower reaction times to voices previously studied with faces compared to voices only may result from visual search for faces during memory retrieval. A general decrease of reaction times across study-test cycles suggests facilitated recognition with more speaker repetitions. Overall, our data suggest two simultaneous and opposing mechanisms during bimodal face-voice learning: while attentional capture of faces may initially impede voice learning, audiovisual integration may facilitate it thereafter.
Collapse
Affiliation(s)
- Romi Zäske
- Department for General Psychology and Cognitive Neuroscience, Institute of Psychology, Friedrich Schiller University of Jena, Jena, Germany
| | - Constanze Mühl
- School of Psychology, Bangor University, Bangor, Gwynedd, Wales, United Kingdom
| | - Stefan R. Schweinberger
- Department for General Psychology and Cognitive Neuroscience, Institute of Psychology, Friedrich Schiller University of Jena, Jena, Germany
| |
Collapse
|
30
|
Gainotti G. Implications of recent findings for current cognitive models of familiar people recognition. Neuropsychologia 2015; 77:279-87. [DOI: 10.1016/j.neuropsychologia.2015.09.002] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/08/2015] [Revised: 08/17/2015] [Accepted: 09/02/2015] [Indexed: 11/30/2022]
|
31
|
Abstract
Voices provide a rich source of information that is important for identifying individuals and for social interaction. During search for a face in a crowd, voices often accompany visual information, and they facilitate localization of the sought-after individual. However, it is unclear whether this facilitation occurs primarily because the voice cues the location of the face or because it also increases the salience of the associated face. Here we demonstrate that a voice that provides no location information nonetheless facilitates visual search for an associated face. We trained novel face-voice associations and verified learning using a two-alternative forced choice task in which participants had to correctly match a presented voice to the associated face. Following training, participants searched for a previously learned target face among other faces while hearing one of the following sounds (localized at the center of the display): a congruent learned voice, an incongruent but familiar voice, an unlearned and unfamiliar voice, or a time-reversed voice. Only the congruent learned voice speeded visual search for the associated face. This result suggests that voices facilitate the visual detection of associated faces, potentially by increasing their visual salience, and that the underlying crossmodal associations can be established through brief training.
Collapse
Affiliation(s)
- L Jacob Zweig
- Department of Psychology, Northwestern University, Evanston, IL, USA
| | | | | |
Collapse
|
32
|
Bülthoff I, Newell FN. Distinctive voices enhance the visual recognition of unfamiliar faces. Cognition 2015; 137:9-21. [PMID: 25584464 DOI: 10.1016/j.cognition.2014.12.006] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/17/2013] [Revised: 12/16/2014] [Accepted: 12/18/2014] [Indexed: 11/16/2022]
Abstract
Several studies have provided evidence in favour of a norm-based representation of faces in memory. However, such models have hitherto failed to take account of how other person-relevant information affects face recognition performance. Here we investigated whether distinctive or typical auditory stimuli affect the subsequent recognition of previously unfamiliar faces, and whether the type of auditory stimulus matters. In this study, participants learned to associate either unfamiliar distinctive and typical voices or unfamiliar distinctive and typical sounds with unfamiliar faces. The results indicated that recognition performance was better for faces previously paired with distinctive than with typical voices, but we failed to find any benefit on face recognition when the faces had previously been associated with distinctive sounds. These findings possibly point to an expertise effect, as faces are usually associated with voices. More importantly, they suggest that the memory for visual faces can be modified by the perceptual quality of related vocal information and, more specifically, that facial distinctiveness can be of a multi-sensory nature. These results have important implications for our understanding of the structure of memory for person identification.
Collapse
Affiliation(s)
- I Bülthoff
- Max Planck Institute for Biological Cybernetics, Spemannstr. 38, D-72076 Tübingen, Germany.
| | - F N Newell
- School of Psychology and Institute of Neuroscience, Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
| |
Collapse
|
33
|
Evidence for a supra-modal representation of emotion from cross-modal adaptation. Cognition 2015; 134:245-51. [DOI: 10.1016/j.cognition.2014.11.001] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/31/2013] [Revised: 08/27/2014] [Accepted: 11/02/2014] [Indexed: 11/22/2022]
|
34
|
Abstract
Social perception, communication, and interaction require an efficient analysis and representation of person-related information. Faces and voices in particular convey a wealth of socially relevant information, for example about a person's identity, emotions, gender, age, attractiveness, ethnicity, or current focus of attention. Despite this knowledge, the perceptual mechanisms underlying the perception of complex social stimuli have only been investigated systematically in recent years. This development was enabled above all by (1) the availability of sophisticated stimulus-manipulation techniques (e.g., image, video, and voice morphing, caricaturing, and averaging techniques) and (2) the availability of measurement methods from the cognitive and social neurosciences. In this article, we summarize the current state of research on person perception, with particular reference to faces and voices. We discuss selected examples of current research and show how person perception has developed into a central topic of psychological research. New evidence indicates that socially relevant perceptual information in faces or voices not only generates first impressions of people, but that these impressions also show moderate validity, so that faces or voices can be regarded as a "window to the person". We argue that further progress in other fields of social cognition research that involve real or virtual agents (e.g., theory-of-mind research, social categorization, human decision-making) would benefit from taking facial and vocal information in person perception into account.
Collapse
|
35
|
Gao Y, Cao S, Qu T, Wu X, Li H, Zhang J, Li L. Voice-associated static face image releases speech from informational masking. Psych J 2014; 3:113-20. [PMID: 26271763 DOI: 10.1002/pchj.45] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/16/2013] [Accepted: 11/07/2013] [Indexed: 11/08/2022]
Abstract
In noisy, multi-talker environments such as a cocktail party, listeners can use various perceptual and/or cognitive cues to improve recognition of target speech against masking, particularly informational masking. Previous studies have shown that temporally pre-presented voice cues (voice primes) improve recognition of target speech against speech masking but not noise masking. This study investigated whether static face-image primes that have become target-voice associated (i.e., facial images linked through associative learning with voices reciting the target speech) can be used by listeners to unmask speech. The results showed that in 32 normal-hearing younger adults, temporally pre-presenting a voice-priming sentence with the same voice reciting the target sentence significantly improved the recognition of target speech that was masked by irrelevant two-talker speech. When a person's face photograph became associated with the voice reciting the target speech through learning, temporally pre-presenting the target-voice-associated face image significantly improved recognition of target speech against speech masking, particularly for the last two keywords in the target sentence. Moreover, speech-recognition performance under the voice-priming condition was significantly correlated with that under the face-priming condition. The results suggest that learned facial information on talker identity plays an important role in identifying the target talker's voice and facilitating selective attention to the target-speech stream against the masking-speech stream.
Collapse
Affiliation(s)
- Yayue Gao
- Department of Psychology, Speech and Hearing Research Center, Key Laboratory on Machine Perception (Ministry of Education), Peking University, Beijing, China
| | - Shuyang Cao
- Department of Psychology, Speech and Hearing Research Center, Key Laboratory on Machine Perception (Ministry of Education), Peking University, Beijing, China
- State Administration of Press, Publication, Radio, Film and Television of the People's Republic of China, Beijing, China
| | - Tianshu Qu
- Department of Psychology, Speech and Hearing Research Center, Key Laboratory on Machine Perception (Ministry of Education), Peking University, Beijing, China
| | - Xihong Wu
- Department of Psychology, Speech and Hearing Research Center, Key Laboratory on Machine Perception (Ministry of Education), Peking University, Beijing, China
| | - Haifeng Li
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Jinsheng Zhang
- Department of Otolaryngology-Head and Neck Surgery, Wayne State University School of Medicine, Detroit, Michigan, USA
| | - Liang Li
- Department of Psychology, Speech and Hearing Research Center, Key Laboratory on Machine Perception (Ministry of Education), Peking University, Beijing, China
| |
Collapse
|
36
|
Hearing Faces and Seeing Voices: The Integration and Interaction of Face and Voice Processing. Psychol Belg 2014. [DOI: 10.5334/pb.ar] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022] Open
|
37
|
|
38
|
Schweinberger SR, Kawahara H, Simpson AP, Skuk VG, Zäske R. Speaker perception. Wiley Interdisciplinary Reviews: Cognitive Science 2013; 5:15-25. [DOI: 10.1002/wcs.1261] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/07/2013] [Revised: 08/14/2013] [Accepted: 08/29/2013] [Indexed: 11/08/2022]
Affiliation(s)
- Stefan R. Schweinberger
- Department of General Psychology and Cognitive Neuroscience, Institute of Psychology, Friedrich Schiller University, Jena, Germany
- DFG Research Unit Person Perception, Friedrich Schiller University, Jena, Germany
| | - Hideki Kawahara
- Faculty of Systems Engineering, Wakayama University, Wakayama, Japan
| | - Adrian P. Simpson
- DFG Research Unit Person Perception, Friedrich Schiller University, Jena, Germany
- Department of Speech, Institute of German Linguistics, Friedrich Schiller University, Jena, Germany
| | - Verena G. Skuk
- Department of General Psychology and Cognitive Neuroscience, Institute of Psychology, Friedrich Schiller University, Jena, Germany
- DFG Research Unit Person Perception, Friedrich Schiller University, Jena, Germany
| | - Romi Zäske
- Department of General Psychology and Cognitive Neuroscience, Institute of Psychology, Friedrich Schiller University, Jena, Germany
- DFG Research Unit Person Perception, Friedrich Schiller University, Jena, Germany
| |
Collapse
|
39
|
Li Y, Long J, Huang B, Yu T, Wu W, Liu Y, Liang C, Sun P. Crossmodal integration enhances neural representation of task-relevant features in audiovisual face perception. Cereb Cortex 2013; 25:384-95. [PMID: 23978654 DOI: 10.1093/cercor/bht228] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Abstract
Previous studies have shown that audiovisual integration improves identification performance and enhances neural activity in heteromodal brain areas, for example, the posterior superior temporal sulcus/middle temporal gyrus (pSTS/MTG). Furthermore, it has also been demonstrated that attention plays an important role in crossmodal integration. In this study, we considered crossmodal integration in audiovisual facial perception and explored its effect on the neural representation of features. The audiovisual stimuli in the experiment consisted of facial movie clips that could be classified into 2 gender categories (male vs. female) or 2 emotion categories (crying vs. laughing). The visual/auditory-only stimuli were created from these movie clips by removing the auditory/visual contents. The subjects needed to make a judgment about the gender/emotion category for each movie clip in the audiovisual, visual-only, or auditory-only stimulus condition as functional magnetic resonance imaging (fMRI) signals were recorded. The neural representation of the gender/emotion feature was assessed using the decoding accuracy and the brain pattern-related reproducibility indices, obtained by a multivariate pattern analysis method from the fMRI data. In comparison to the visual-only and auditory-only stimulus conditions, we found that audiovisual integration enhanced the neural representation of task-relevant features and that feature-selective attention might play a role of modulation in the audiovisual integration.
Collapse
Affiliation(s)
- Yuanqing Li
- Center for Brain Computer Interfaces and Brain Information Processing, South China University of Technology, Guangzhou 510640, China
| | - Jinyi Long
- Center for Brain Computer Interfaces and Brain Information Processing, South China University of Technology, Guangzhou 510640, China
| | - Biao Huang
- Department of Radiology, Guangdong General Hospital, Guangzhou 510080, China
| | - Tianyou Yu
- Center for Brain Computer Interfaces and Brain Information Processing, South China University of Technology, Guangzhou 510640, China
| | - Wei Wu
- Center for Brain Computer Interfaces and Brain Information Processing, South China University of Technology, Guangzhou 510640, China
| | - Yongjian Liu
- Department of MR, Foshan Hospital of Traditional Chinese Medicine, Foshan 528000, China
| | - Changhong Liang
- Department of Radiology, Guangdong General Hospital, Guangzhou 510080, China
| | - Pei Sun
- Department of Psychology, Tsinghua University, Beijing 100084, China
| |
Collapse
|
40
|
Stevenage SV, Neil GJ, Hamlin I. When the face fits: Recognition of celebrities from matching and mismatching faces and voices. Memory 2013; 22:284-94. [DOI: 10.1080/09658211.2013.781654] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/27/2022]
|
41
|
Stevenage SV, Hale S, Morgan Y, Neil GJ. Recognition by association: Within- and cross-modality associative priming with faces and voices. Br J Psychol 2012; 105:1-16. [DOI: 10.1111/bjop.12011] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/30/2011] [Revised: 09/18/2012] [Indexed: 12/01/2022]
Affiliation(s)
| | - Sarah Hale
- School of Psychology, University of Southampton, Hampshire, UK
| | - Yasmin Morgan
- School of Psychology, University of Southampton, Hampshire, UK
| | - Greg J. Neil
- School of Psychology, University of Southampton, Hampshire, UK
| |
Collapse
|
42
|
Schweinberger SR, Kloth N, Robertson DM. Hearing facial identities: Brain correlates of face–voice integration in person identification. Cortex 2011; 47:1026-37. [DOI: 10.1016/j.cortex.2010.11.011] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/24/2010] [Revised: 11/16/2010] [Accepted: 11/22/2010] [Indexed: 11/24/2022]
|
43
|
von Kriegstein K. A Multisensory Perspective on Human Auditory Communication. Front Neurosci 2011. [DOI: 10.1201/b11092-43] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022] Open
|
44
|
|
45
|
Person identification through faces and voices: An ERP study. Brain Res 2011; 1407:13-26. [DOI: 10.1016/j.brainres.2011.03.029] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/10/2011] [Accepted: 03/11/2011] [Indexed: 11/17/2022]
|
46
|
Rakić T, Steffens MC, Mummendey A. When it matters how you pronounce it: The influence of regional accents on job interview outcome. Br J Psychol 2011; 102:868-83. [DOI: 10.1111/j.2044-8295.2011.02051.x] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
|
47
|
|
48
|
|
49
|
|
50
|
|