1
Lavan N, Sutherland CAM. Idiosyncratic and shared contributions shape impressions from voices and faces. Cognition 2024; 251:105881. [PMID: 39029363] [DOI: 10.1016/j.cognition.2024.105881]
Abstract
Voices elicit rich first impressions of what the person we are hearing might be like. Research stresses that these impressions from voices are shared across different listeners, such that people on average agree which voices sound trustworthy or old and which do not. However, can impressions from voices also be shaped by the 'ear of the beholder'? We investigated whether - and how - listeners' idiosyncratic, personal preferences contribute to first impressions from voices. In two studies (993 participants, 156 voices), we find evidence for substantial idiosyncratic contributions to voice impressions using a variance partitioning approach. Overall, idiosyncratic contributions were as important as shared contributions to impressions from voices for inferred person characteristics (e.g., trustworthiness, friendliness). Shared contributions were only more influential for impressions of more directly apparent person characteristics (e.g., gender, age). Both idiosyncratic and shared contributions were reduced when stimuli were limited in their (perceived) variability, suggesting that natural variation in voices is key to understanding this impression formation. When comparing voice impressions to face impressions, we found that idiosyncratic and shared contributions to impressions were similar across modalities when stimulus properties were closely matched - although voice impressions were overall less consistent than face impressions. We thus reconceptualise impressions from voices as being formed not only based on shared but also idiosyncratic contributions. We use this new framing to suggest future directions of research, including understanding the idiosyncratic mechanisms, development, and malleability of voice impression formation.
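The variance partitioning logic described in this abstract can be sketched as a simple two-way decomposition: the variance of stimulus means indexes shared (consensus) contributions, while the rater-by-stimulus residual indexes idiosyncratic contributions. This is an illustrative toy version under assumed simulated data (30 listeners, 50 voices, equally strong shared and private signals), not the paper's actual analysis pipeline.

```python
import numpy as np

def partition_variance(ratings):
    """Split a raters x stimuli ratings matrix into shared and
    idiosyncratic variance proportions (simple two-way decomposition)."""
    grand = ratings.mean()
    stim_means = ratings.mean(axis=0)        # consensus ("shared") component
    rater_means = ratings.mean(axis=1)       # overall rater bias (scale use)
    shared = np.var(stim_means)
    # Double-centred residual: the rater-by-stimulus interaction
    residual = ratings - stim_means[None, :] - rater_means[:, None] + grand
    idiosyncratic = np.var(residual)
    total = shared + idiosyncratic
    return shared / total, idiosyncratic / total

# Simulated data: a shared stimulus signal plus equally strong private taste
rng = np.random.default_rng(0)
stim_signal = rng.normal(0.0, 1.0, 50)        # 50 voices, shared signal
private = rng.normal(0.0, 1.0, (30, 50))      # 30 listeners' private noise
ratings = stim_signal[None, :] + private
shared_prop, idio_prop = partition_variance(ratings)
```

With equally strong shared and private signals, the two proportions come out roughly equal, mirroring the abstract's finding for inferred traits.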
Affiliation(s)
- Nadine Lavan
- Department of Biological and Experimental Psychology, School of Biological and Behavioural Sciences, Queen Mary University of London, United Kingdom.
- Clare A M Sutherland
- School of Psychology, King's College, University of Aberdeen, United Kingdom; School of Psychological Science, University of Western Australia, Australia
2
Neuenswander KL, Goodale BM, Bryant GA, Johnson KL. Sex ratios in vocal ensembles affect perceptions of threat and belonging. Sci Rep 2024; 14:14575. [PMID: 38914752] [PMCID: PMC11196271] [DOI: 10.1038/s41598-024-65535-x]
Abstract
People often interact with groups (i.e., ensembles) during social interactions. Given that group-level information is important in navigating social environments, we expect perceptual sensitivity to aspects of groups that are relevant for personal threat as well as social belonging. Most ensemble perception research has focused on visual ensembles, with little research looking at auditory or vocal ensembles. Across four studies, we present evidence that (i) perceivers accurately extract the sex composition of a group from voices alone, (ii) judgments of threat increase concomitantly with the number of men, and (iii) listeners' sense of belonging depends on the number of same-sex others in the group. This work advances our understanding of social cognition, interpersonal communication, and ensemble coding to include auditory information, and reveals people's ability to extract relevant social information from brief exposures to vocalizing groups.
Affiliation(s)
- Kelsey L Neuenswander
- Department of Communication, University of California, Los Angeles, 2225 Rolfe Hall, Los Angeles, CA, 90095, USA.
- Gregory A Bryant
- Department of Communication, University of California, Los Angeles, 2225 Rolfe Hall, Los Angeles, CA, 90095, USA
- Kerri L Johnson
- Department of Communication, University of California, Los Angeles, 2225 Rolfe Hall, Los Angeles, CA, 90095, USA
- Department of Psychology, University of California, Los Angeles, USA
3
Pautz N, McDougall K, Mueller-Johnson K, Nolan F, Paver A, Smith HMJ. Identifying unfamiliar voices: Examining the system variables of sample duration and parade size. Q J Exp Psychol (Hove) 2023; 76:2804-2822. [PMID: 36718784] [PMCID: PMC10655699] [DOI: 10.1177/17470218231155738]
Abstract
Voice identification parades can be unreliable due to the error-prone nature of earwitness responses. UK government guidelines recommend that voice parades should have nine voices, each played for 60 s. This makes parades resource-consuming to construct. In this article, we conducted two experiments to see if voice parade procedures could be simplified. In Experiment 1 (N = 271, 135 female), we investigated if reducing the duration of the voice samples on a nine-voice parade would negatively affect identification performance using both conventional logistic and signal detection approaches. In Experiment 2 (N = 270, 136 female), we first explored if the same sample duration conditions used in Experiment 1 would lead to different outcomes if we reduced the parade size to include only six voices. Following this, we pooled the data from both experiments to investigate the influence of target-position effects. The results show that 15-s sample durations result in statistically equivalent voice identification performance to the longer 60-s sample durations, but that the 30-s sample duration suffers in terms of overall signal sensitivity. This pattern of results was replicated using both a nine- and a six-voice parade. Performance on target-absent parades was at chance levels in both parade sizes, and response criteria were mostly liberal. In addition, unwanted position effects were present. The results provide initial evidence that the sample duration used in a voice parade may be reduced, but we argue that the guidelines recommending a parade with nine voices should be maintained to provide additional protection for a potentially innocent suspect given the low target-absent accuracy.
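The signal detection quantities this abstract refers to (overall sensitivity and a liberal response criterion) can be sketched from target-present and target-absent parade counts. The trial counts below are hypothetical, and the log-linear correction is one common convention for keeping extreme rates finite, not necessarily the authors' exact procedure.

```python
from statistics import NormalDist

def sdt_measures(hits, misses, false_alarms, correct_rejections):
    """Sensitivity (d') and criterion (c) from target-present
    (hits/misses) and target-absent (false alarms/correct rejections)
    counts, with a log-linear correction so rates of 0 or 1 stay finite."""
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf          # inverse standard-normal CDF
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))  # negative c = liberal
    return d_prime, criterion

# Hypothetical counts: good target-present performance, but many false
# identifications on target-absent parades (a liberal criterion)
d, c = sdt_measures(70, 30, 55, 45)
```

A negative criterion here corresponds to the "mostly liberal" responding the abstract reports: witnesses tend to pick someone even when the target is absent.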
4
Zadoorian S, Rosenblum LD. The Benefit of Bimodal Training in Voice Learning. Brain Sci 2023; 13:1260. [PMID: 37759861] [PMCID: PMC10526927] [DOI: 10.3390/brainsci13091260]
Abstract
It is known that talkers can be recognized by listening to their specific vocal qualities, such as breathiness and fundamental frequency. However, talker identification can also occur by focusing on the talkers' unique articulatory style, which is known to be available auditorily and visually and can be shared across modalities. Evidence shows that voices heard while seeing talkers' faces are later recognized better on their own compared to voices heard alone. The present study investigated whether the facilitation of voice learning through facial cues relies on talker-specific articulatory or nonarticulatory facial information. Participants were initially trained to learn the voices of ten talkers presented either on their own or together with (a) an articulating face, (b) a static face, or (c) an isolated articulating mouth. Participants were then tested on recognizing the voices on their own regardless of their training modality. Consistent with previous research, voices learned with articulating faces were recognized better on their own compared to voices learned alone. However, isolated articulating mouths did not provide an advantage in learning the voices. The results demonstrate that learning voices while seeing whole faces, but not isolated mouths, facilitates later voice recognition.
5
Stevenage SV, Singh L, Dixey P. The Curious Case of Impersonators and Singers: Telling Voices Apart and Telling Voices Together under Naturally Challenging Listening Conditions. Brain Sci 2023; 13:358. [PMID: 36831901] [PMCID: PMC9954053] [DOI: 10.3390/brainsci13020358]
Abstract
Vocal identity processing depends on the ability to tell apart two instances of different speakers whilst also being able to tell together two instances of the same speaker. Whilst previous research has examined these voice processing capabilities under relatively common listening conditions, it has not yet tested the limits of these capabilities. Here, two studies are presented that employ challenging listening tasks to determine just how good we are at these voice processing tasks. In Experiment 1, 54 university students were asked to distinguish between very similar sounding, yet different speakers (celebrity targets and their impersonators). Participants completed a 'Same/Different' task and a 'Which is the Celebrity?' task to pairs of speakers, and a 'Real or Not?' task to individual speakers. In Experiment 2, a separate group of 40 university students was asked to pair very different sounding instances of the same speakers (speaking and singing). Participants were presented with an array of voice clips and completed a 'Pairs Task' as a variant of the more traditional voice sorting task. The results of Experiment 1 suggested that significantly more mistakes were made when distinguishing celebrity targets from their impersonators than when distinguishing the same targets from control voices. Nevertheless, listeners were significantly better than chance in all three tasks despite the challenge. Similarly, the results of Experiment 2 suggested that it was significantly more difficult to pair singing and speaking clips than to pair two speaking clips, particularly when the speakers were unfamiliar. Again, however, performance remained better than chance even under a cautious comparison. Taken together, the results suggest that vocal identity processing is a highly adaptable skill, assisted by familiarity with the speaker. Moreover, the fact that performance remained above chance in all tasks suggests that we had not reached the limit of our listeners' capability, despite the considerable listening challenges introduced. We conclude that voice processing is far better than previous research might have presumed.
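A better-than-chance comparison of the kind this abstract describes can be run as an exact one-sided binomial test on trial counts. The 68-correct-out-of-100 figure below is hypothetical, chosen only to illustrate the computation, not taken from the study.

```python
from math import comb

def binom_p_above_chance(successes, trials, chance=0.5):
    """Exact one-sided binomial p-value: P(X >= successes) when each
    trial is correct with probability `chance` (e.g. 0.5 for a
    two-alternative 'Same/Different' task)."""
    return sum(comb(trials, k) * chance ** k * (1.0 - chance) ** (trials - k)
               for k in range(successes, trials + 1))

# Hypothetical: 68 correct out of 100 same/different trials at 50% chance
p = binom_p_above_chance(68, 100)
```

Because the test is exact rather than based on a normal approximation, it remains valid even for the small trial counts typical of voice pairing tasks.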
6
Lavan N. How do we describe other people from voices and faces? Cognition 2023; 230:105253. [PMID: 36215763] [DOI: 10.1016/j.cognition.2022.105253]
Abstract
When seeing someone's face or hearing their voice, perceivers routinely infer information about a person's age, sex and social traits. While many experiments have explored how individual person characteristics are perceived in isolation, less is known about which person characteristics are described spontaneously from voices and faces and how descriptions may differ across modalities. In Experiment 1, participants provided free descriptions for voices and faces. These free descriptions followed similar patterns for voices and faces - and for individual identities: Participants spontaneously referred to a wide range of descriptors. Psychological descriptors, such as character traits, were used most frequently; physical characteristics, such as age and sex, were notable as they were mentioned earlier than other types of descriptors. After finding primarily similarities between modalities when analysing person descriptions across identities, Experiment 2 asked whether free descriptions encode how individual identities differ. For this purpose, the measures derived from the free descriptions were linked to voice/face discrimination judgements that are known to describe differences in perceptual properties between identity pairs. Significant relationships emerged within and across modalities, showing that free descriptions indeed encode differences between identities - information that is shared with discrimination judgements. This suggests that the two tasks tap into similar, high-level person representations. These findings show that free description data can offer valuable insights into person perception and underline that person perception is a multivariate process during which perceivers rapidly and spontaneously infer many different person characteristics to form a holistic impression of a person.
Affiliation(s)
- Nadine Lavan
- Department of Biological and Experimental Psychology, School of Biological and Behavioural Sciences, Queen Mary University of London, Mile End Road, London E1 4NS, United Kingdom.
7
Lavan N, Collins MRN, Miah JFM. Audiovisual identity perception from naturally-varying stimuli is driven by visual information. Br J Psychol 2021; 113:248-263. [PMID: 34490897] [DOI: 10.1111/bjop.12531]
Abstract
Identity perception often takes place in multimodal settings, where perceivers have access to both visual (face) and auditory (voice) information. Despite this, identity perception is usually studied in unimodal contexts, where face and voice identity perception are modelled independently from one another. In this study, we asked whether and how much auditory and visual information contribute to audiovisual identity perception from naturally-varying stimuli. In a between-subjects design, participants completed an identity sorting task with either dynamic video-only, audio-only or dynamic audiovisual stimuli. In this task, participants were asked to sort multiple, naturally-varying stimuli from three different people by perceived identity. We found that identity perception was more accurate for video-only and audiovisual stimuli compared with audio-only stimuli. Interestingly, there was no difference in accuracy between video-only and audiovisual stimuli. Auditory information nonetheless played a role alongside visual information, as audiovisual identity judgements per stimulus could be predicted from both auditory and visual identity judgements. While the relationship between visual and audiovisual judgements was the stronger one, auditory information still uniquely explained a significant portion of the variance in audiovisual identity judgements. Our findings thus align with previous theoretical and empirical work proposing that, compared with faces, voices are an important but relatively less salient and weaker cue to identity perception. We expand on this work to show that, at least in the context of this study, having access to voices in addition to faces does not result in better identity perception accuracy.
Affiliation(s)
- Nadine Lavan
- Department of Biological and Experimental Psychology, School of Biological and Chemical Sciences, Queen Mary University of London, UK
- Madeleine Rose Niamh Collins
- Department of Biological and Experimental Psychology, School of Biological and Chemical Sciences, Queen Mary University of London, UK
- Jannatul Firdaus Monisha Miah
- Department of Biological and Experimental Psychology, School of Biological and Chemical Sciences, Queen Mary University of London, UK
8
Unimodal and cross-modal identity judgements using an audio-visual sorting task: Evidence for independent processing of faces and voices. Mem Cognit 2021; 50:216-231. [PMID: 34254274] [PMCID: PMC8763756] [DOI: 10.3758/s13421-021-01198-7]
Abstract
Unimodal and cross-modal information provided by faces and voices contribute to identity percepts. To examine how these sources of information interact, we devised a novel audio-visual sorting task in which participants were required to group video-only and audio-only clips into two identities. In a series of three experiments, we show that unimodal face and voice sorting were more accurate than cross-modal sorting: While face sorting was consistently most accurate followed by voice sorting, cross-modal sorting was at chance level or below. In Experiment 1, we compared performance in our novel audio-visual sorting task to a traditional identity matching task, showing that unimodal and cross-modal identity perception were overall moderately more accurate in the sorting task than in the traditional matching task. In Experiment 2, separating unimodal from cross-modal sorting led to small improvements in accuracy for unimodal sorting, but no change in cross-modal sorting performance. In Experiment 3, we explored the effect of minimal audio-visual training: Participants were shown a clip of the two identities in conversation prior to completing the sorting task. This led to small, nonsignificant improvements in accuracy for unimodal and cross-modal sorting. Our results indicate that unfamiliar face and voice perception operate relatively independently with no evidence of mutual benefit, suggesting that extracting reliable cross-modal identity information is challenging.
9
The processing of intimately familiar and unfamiliar voices: Specific neural responses of speaker recognition and identification. PLoS One 2021; 16:e0250214. [PMID: 33861789] [PMCID: PMC8051806] [DOI: 10.1371/journal.pone.0250214]
Abstract
Research has repeatedly shown that familiar and unfamiliar voices elicit different neural responses. But it has also been suggested that different neural correlates associate with the feeling of having heard a voice and knowing who the voice represents. The terminology used to designate these varying responses remains vague, creating a degree of confusion in the literature. Additionally, terms serving to designate tasks of voice discrimination, voice recognition, and speaker identification are often inconsistent, creating further ambiguities. The present study used event-related potentials (ERPs) to clarify the difference between responses to 1) unknown voices, 2) trained-to-familiar voices as speech stimuli are repeatedly presented, and 3) intimately familiar voices. In an experiment, 13 participants listened to repeated utterances recorded from 12 speakers. Only one of the 12 voices was intimately familiar to a participant, whereas the remaining 11 voices were unfamiliar. The frequency of presentation of these 11 unfamiliar voices varied, with only one being frequently presented (the trained-to-familiar voice). ERP analyses revealed different responses for intimately familiar and unfamiliar voices in two distinct time windows (P2 between 200-250 ms and a late positive component, LPC, between 450-850 ms post-onset), with the late responses occurring only for intimately familiar voices. The LPC presented sustained shifts, whereas the short-time ERP components appear to reflect an early recognition stage. The trained voice also elicited distinct responses compared with rarely heard voices, but these occurred in a third time window (N250 between 300-350 ms post-onset). Overall, the timing of responses suggests that the processing of intimately familiar voices operates in two distinct steps: voice recognition, marked by a P2 on right centro-frontal sites, and speaker identification, marked by an LPC component. The recognition of frequently heard voices entails an independent recognition process marked by a differential N250. Based on the present results and previous observations, it is proposed that there is a need to distinguish between processes of voice "recognition" and "identification". The present study also specifies test conditions serving to reveal this distinction in neural responses, one of which bears on the length of speech stimuli, given the late responses associated with voice identification.
10
Explaining face-voice matching decisions: The contribution of mouth movements, stimulus effects and response biases. Atten Percept Psychophys 2021; 83:2205-2216. [PMID: 33797024] [PMCID: PMC8213568] [DOI: 10.3758/s13414-021-02290-5]
Abstract
Previous studies have shown that face-voice matching accuracy is more consistently above chance for dynamic (i.e. speaking) faces than for static faces. This suggests that dynamic information can play an important role in informing matching decisions. We initially asked whether this advantage for dynamic stimuli is due to shared information across modalities that is encoded in articulatory mouth movements. Participants completed a sequential face-voice matching task with (1) static images of faces, (2) dynamic videos of faces, (3) dynamic videos where only the mouth was visible, and (4) dynamic videos where the mouth was occluded, in a well-controlled stimulus set. Surprisingly, after accounting for random variation in the data due to design choices, accuracy for all four conditions was at chance. Crucially, however, exploratory analyses revealed that participants were not responding randomly, with different patterns of response biases being apparent for different conditions. Our findings suggest that face-voice identity matching may not be possible with above-chance accuracy but that analyses of response biases can shed light upon how people attempt face-voice matching. We discuss these findings with reference to the differential functional roles for faces and voices recently proposed for multimodal person perception.
11
Tsantani M, Cook R. Normal recognition of famous voices in developmental prosopagnosia. Sci Rep 2020; 10:19757. [PMID: 33184411] [PMCID: PMC7661722] [DOI: 10.1038/s41598-020-76819-3]
Abstract
Developmental prosopagnosia (DP) is a condition characterised by lifelong face recognition difficulties. Recent neuroimaging findings suggest that DP may be associated with aberrant structure and function in multimodal regions of cortex implicated in the processing of both facial and vocal identity. These findings suggest that both facial and vocal recognition may be impaired in DP. To test this possibility, we compared the performance of 22 DPs and a group of typical controls, on closely matched tasks that assessed famous face and famous voice recognition ability. As expected, the DPs showed severe impairment on the face recognition task, relative to typical controls. In contrast, however, the DPs and controls identified a similar number of voices. Despite evidence of interactions between facial and vocal processing, these findings suggest some degree of dissociation between the two processing pathways, whereby one can be impaired while the other develops typically. A possible explanation for this dissociation in DP could be that the deficit originates in the early perceptual encoding of face structure, rather than at later, post-perceptual stages of face identity processing, which may be more likely to involve interactions with other modalities.
Affiliation(s)
- Maria Tsantani
- Department of Psychological Sciences, Birkbeck, University of London, Malet Street, London, WC1E 7HX, UK
- Richard Cook
- Department of Psychological Sciences, Birkbeck, University of London, Malet Street, London, WC1E 7HX, UK.
12
May I Speak Freely? The Difficulty in Vocal Identity Processing Across Free and Scripted Speech. J Nonverbal Behav 2020. [DOI: 10.1007/s10919-020-00348-w]
Abstract
In the fields of face recognition and voice recognition, a growing literature now suggests that the ability to recognize an individual despite changes from one instance to the next is a considerable challenge. The present paper reports on one experiment in the voice domain designed to determine whether a change in the mere style of speech may result in a measurable difficulty when trying to discriminate between speakers. Participants completed a speaker discrimination task to pairs of speech clips, which represented either free speech or scripted speech segments. The results suggested that speaker discrimination was significantly better when the style of speech did not change compared to when it did change, and was significantly better from scripted than from free speech segments. These results support the emergent body of evidence suggesting that within-identity variability is a challenge, and the forensic implications of such a mild change in speech style are discussed.
13
Huestegge SM, Raettig T. Crossing Gender Borders: Bidirectional Dynamic Interaction Between Face-Based and Voice-Based Gender Categorization. J Voice 2020; 34:487.e1-487.e9. [DOI: 10.1016/j.jvoice.2018.09.020]
14
Lavan N, Knight S, Hazan V, McGettigan C. The effects of high variability training on voice identity learning. Cognition 2019; 193:104026. [PMID: 31323377] [DOI: 10.1016/j.cognition.2019.104026]
Abstract
High variability training has been shown to benefit the learning of new face identities. In three experiments, we investigated whether this is also the case for voice identity learning. In Experiment 1a, we contrasted high variability training sets - which included stimuli extracted from a number of different recording sessions, speaking environments and speaking styles - with low variability stimulus sets that only included a single speaking style (read speech) extracted from one recording session (see Ritchie & Burton, 2017 for faces). Listeners were tested on an old/new recognition task using read sentences (i.e. test materials fully overlapped with the low variability training stimuli) and we found a high variability disadvantage. In Experiment 1b, listeners were trained in a similar way, however, now there was no overlap in speaking style or recording session between training sets and test stimuli. Here, we found a high variability advantage. In Experiment 2, variability was manipulated in terms of the number of unique items as opposed to number of unique speaking styles. Here, we contrasted the high variability training sets used in Experiment 1a with low variability training sets that included the same breadth of styles, but fewer unique items; instead, individual items were repeated (see Murphy, Ipser, Gaigg, & Cook, 2015 for faces). We found only weak evidence for a high variability advantage, which could be explained by stimulus-specific effects. We propose that high variability advantages may be particularly pronounced when listeners are required to generalise from trained stimuli to different-sounding, previously unheard stimuli. We discuss these findings in the context of mechanisms thought to underpin advantages for high variability training.
Affiliation(s)
- Nadine Lavan
- Department of Speech, Hearing and Phonetic Sciences, University College London, United Kingdom; Department of Psychology, Royal Holloway, University of London, United Kingdom.
- Sarah Knight
- Department of Speech, Hearing and Phonetic Sciences, University College London, United Kingdom
- Valerie Hazan
- Department of Speech, Hearing and Phonetic Sciences, University College London, United Kingdom
- Carolyn McGettigan
- Department of Speech, Hearing and Phonetic Sciences, University College London, United Kingdom; Department of Psychology, Royal Holloway, University of London, United Kingdom.
15
Hou J, Ye Z. Sex Differences in Facial and Vocal Attractiveness Among College Students in China. Front Psychol 2019; 10:1166. [PMID: 31178792] [PMCID: PMC6538682] [DOI: 10.3389/fpsyg.2019.01166]
Abstract
This study aims to investigate sex differences in ratings of facial attractiveness (FA) and vocal attractiveness (VA). Participants (60 undergraduates in Study 1 and 111 undergraduates in Study 2) rated the attractiveness of computerized face images and voice recordings of men and women. In Study 1, face images and voice recordings were presented separately. Results indicated that men generally rated voice recordings of women as more attractive than those of men, whereas women did not show different attractiveness ratings for voices of men vs. women. In Study 2, face images and voice recordings were paired as multimodal stimuli and presented simultaneously. Results indicated that men rated multimodal stimuli of women as more attractive than those of men, whereas women did not differentiate multimodal stimuli of men vs. women. We found that, compared to VA, FA had a stronger influence on participants' overall evaluations. Finally, we tested the difference between "original multimodal stimuli" (OMS) and "non-original multimodal stimuli" (non-OMS) and found an "OMS-facilitating effect." Taken together, the findings indicated some sex differences in FA and VA, which could be used to interpret behaviors of sexual selection, human mate preferences, and designs and popularization of sex robots.
Affiliation(s)
- Zi Ye
- Department of Philosophy, Anhui University, Hefei, China
16
Lavan N, Burston LF, Ladwa P, Merriman SE, Knight S, McGettigan C. Breaking voice identity perception: Expressive voices are more confusable for listeners. Q J Exp Psychol (Hove) 2019; 72:2240-2248. [PMID: 30808271] [DOI: 10.1177/1747021819836890]
Abstract
The human voice is a highly flexible instrument for self-expression, yet voice identity perception is largely studied using controlled speech recordings. Using two voice-sorting tasks with naturally varying stimuli, we compared the performance of listeners who were familiar and unfamiliar with the TV show Breaking Bad. Listeners organised audio clips of speech with (1) low-expressiveness and (2) high-expressiveness into perceived identities. We predicted that increased expressiveness (e.g., shouting, strained voice) would significantly impair performance. Overall, while unfamiliar listeners were less able to generalise identity across exemplars, the two groups performed equivalently well at telling voices apart for low-expressiveness stimuli. However, high vocal expressiveness significantly impaired telling apart in both groups: this led to increased misidentifications, where sounds from one character were assigned to the other. These misidentifications were highly consistent for familiar listeners but less consistent for unfamiliar listeners. Our data suggest that vocal flexibility has powerful effects on identity perception, where changes in the acoustic properties of vocal signals introduced by expressiveness lead to effects apparent in familiar and unfamiliar listeners alike. At the same time, expressiveness appears to have affected other aspects of voice identity processing selectively in one listener group but not the other, thus revealing complex interactions of stimulus properties and listener characteristics (i.e., familiarity) in identity processing.
Affiliation(s)
- Nadine Lavan
- Department of Speech, Hearing and Phonetic Sciences, University College London, London, UK; Department of Psychology, Royal Holloway, University of London, London, UK
- Luke Fk Burston
- Department of Psychology, Royal Holloway, University of London, London, UK
- Paayal Ladwa
- Department of Psychology, Royal Holloway, University of London, London, UK
- Siobhan E Merriman
- Department of Psychology, Royal Holloway, University of London, London, UK
- Sarah Knight
- Department of Speech, Hearing and Phonetic Sciences, University College London, London, UK
- Carolyn McGettigan
- Department of Speech, Hearing and Phonetic Sciences, University College London, London, UK; Department of Psychology, Royal Holloway, University of London, London, UK
17
Smith HM, Baguley TS, Robson J, Dunn AK, Stacey PC. Forensic voice discrimination by lay listeners: The effect of speech type and background noise on performance. Appl Cogn Psychol 2018. [DOI: 10.1002/acp.3478]
Affiliation(s)
- Thom S. Baguley
- Department of Psychology, Nottingham Trent University, Nottingham, UK
- Jeremy Robson
- Nottingham Law School, Nottingham Trent University, Nottingham, UK
- Andrew K. Dunn
- Department of Psychology, Nottingham Trent University, Nottingham, UK
- Paula C. Stacey
- Department of Psychology, Nottingham Trent University, Nottingham, UK
18
Maguinness C, Roswandowitz C, von Kriegstein K. Understanding the mechanisms of familiar voice-identity recognition in the human brain. Neuropsychologia 2018; 116:179-193. [DOI: 10.1016/j.neuropsychologia.2018.03.039]
19
Lavan N, Short B, Wilding A, McGettigan C. Impoverished encoding of speaker identity in spontaneous laughter. Evol Hum Behav 2018. [DOI: 10.1016/j.evolhumbehav.2017.11.002]
20
Stevenage SV. Drawing a distinction between familiar and unfamiliar voice processing: A review of neuropsychological, clinical and empirical findings. Neuropsychologia 2017; 116:162-178. [PMID: 28694095] [DOI: 10.1016/j.neuropsychologia.2017.07.005]
Abstract
Thirty years on from their initial observation that familiar voice recognition is not the same as unfamiliar voice discrimination (van Lancker and Kreiman, 1987), the current paper reviews available evidence in support of a distinction between familiar and unfamiliar voice processing. Here, an extensive review of the literature is provided, drawing on evidence from four domains of interest: the neuropsychological study of healthy individuals, neuropsychological investigation of brain-damaged individuals, the exploration of voice recognition deficits in less commonly studied clinical conditions, and finally empirical data from healthy individuals. All evidence is assessed in terms of its contribution to the question of interest: is familiar voice processing distinct from unfamiliar voice processing? In this regard, the evidence provides compelling support for van Lancker and Kreiman's early observation. Two considerations result: First, the limits of research based on one or other type of voice stimulus are more clearly appreciated. Second, given the demonstration of a distinction between unfamiliar and familiar voice processing, a new wave of research is encouraged which examines the transition involved as a voice is learned.
Affiliation(s)
- Sarah V Stevenage
- Department of Psychology, University of Southampton, Highfield, Southampton, Hampshire SO17 1BJ, UK.
21
Awwad Shiekh Hasan B, Valdes-Sosa M, Gross J, Belin P. "Hearing faces and seeing voices": Amodal coding of person identity in the human brain. Sci Rep 2016; 6:37494. [PMID: 27881866] [PMCID: PMC5121604] [DOI: 10.1038/srep37494]
Abstract
The brain recognizes familiar individuals by combining cues from several sensory modalities, including a person's face and voice. Here we used functional magnetic resonance imaging (fMRI) and a whole-brain, searchlight multi-voxel pattern analysis (MVPA) to search for areas in which local fMRI patterns could support identity classification as a function of sensory modality. We found several areas supporting face or voice stimulus classification based on fMRI responses, consistent with previous reports; the classification maps overlapped across modalities in a single area of right posterior superior temporal sulcus (pSTS). Remarkably, we also found several cortical areas, mostly located along the middle temporal gyrus, in which local fMRI patterns supported identity "cross-classification": vocal identity could be classified based on fMRI responses to faces, or the reverse, or both. These findings are suggestive of a series of cortical identity representations increasingly abstracted from the input modality.
Affiliation(s)
- Bashar Awwad Shiekh Hasan
- Centre for Cognitive Neuroimaging, Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom; Institute of Neuroscience, Newcastle University, Newcastle, United Kingdom
- Joachim Gross
- Centre for Cognitive Neuroimaging, Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
- Pascal Belin
- Centre for Cognitive Neuroimaging, Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom; Département de Psychologie, Université de Montréal, Montréal, Québec, Canada; Institut de Neurosciences de la Timone, UMR 7289, CNRS and Aix-Marseille Université, Marseille, France
22
Tomlin RJ, Stevenage SV, Hammond S. Putting the pieces together: Revealing face–voice integration through the facial overshadowing effect. Vis Cogn 2016. [DOI: 10.1080/13506285.2016.1245230]
Affiliation(s)
- Rebecca J. Tomlin
- Department of Psychology, University of Southampton, Southampton, UK
- Sarah Hammond
- Department of Psychology, University of Southampton, Southampton, UK
23
Smith HMJ, Dunn AK, Baguley T, Stacey PC. The effect of inserting an inter-stimulus interval in face-voice matching tasks. Q J Exp Psychol (Hove) 2016; 71:424-434. [PMID: 27784196] [DOI: 10.1080/17470218.2016.1253758]
Abstract
Voices and static faces can be matched for identity above chance level. No previous face-voice matching experiments have included an inter-stimulus interval (ISI) exceeding 1 s. We tested whether accurate identity decisions rely on high-quality perceptual representations temporarily stored in sensory memory, and therefore whether the ability to make accurate matching decisions diminishes as the ISI increases. In each trial, participants had to decide whether an unfamiliar face and voice belonged to the same person. The face and voice stimuli were presented simultaneously in Experiment 1, and there was a 5-s ISI in Experiment 2 and a 10-s ISI in Experiment 3. The results, analysed using multilevel modelling, revealed that static face-voice matching was significantly above chance level only when the stimuli were presented simultaneously (Experiment 1). The overall bias to respond "same identity" weakened as the interval increased, suggesting that this bias is explained by temporal contiguity. Taken together, the findings highlight that face-voice matching performance is reliant on comparing fast-decaying, high-quality perceptual representations. The results are discussed in terms of social functioning.
Affiliation(s)
- Andrew K Dunn
- Psychology Division, Nottingham Trent University, Nottingham, UK
- Thom Baguley
- Psychology Division, Nottingham Trent University, Nottingham, UK
- Paula C Stacey
- Psychology Division, Nottingham Trent University, Nottingham, UK
24
Smith HMJ, Dunn AK, Baguley T, Stacey PC. Concordant Cues in Faces and Voices. Evol Psychol 2016; 14:1474704916630317. [PMCID: PMC10481076] [DOI: 10.1177/1474704916630317]
Abstract
Information from faces and voices combines to provide multimodal signals about a person. Faces and voices may offer redundant, overlapping (backup signals), or complementary information (multiple messages). This article reports two experiments which investigated the extent to which faces and voices deliver concordant information about dimensions of fitness and quality. In Experiment 1, participants rated faces and voices on scales for masculinity/femininity, age, health, height, and weight. The results showed that people make similar judgments from faces and voices, with particularly strong correlations for masculinity/femininity, health, and height. If, as these results suggest, faces and voices constitute backup signals for various dimensions, it is hypothetically possible that people would be able to accurately match novel faces and voices for identity. However, previous investigations into novel face–voice matching offer contradictory results. In Experiment 2, participants saw a face and heard a voice and were required to decide whether the face and voice belonged to the same person. Matching accuracy was significantly above chance level, suggesting that judgments made independently from faces and voices are sufficiently similar that people can match the two. Both sets of results were analyzed using multilevel modeling and are interpreted as being consistent with the backup signal hypothesis.
Affiliation(s)
- Andrew K. Dunn
- Psychology Division, Nottingham Trent University, Nottingham, UK
- Thom Baguley
- Psychology Division, Nottingham Trent University, Nottingham, UK
- Paula C. Stacey
- Psychology Division, Nottingham Trent University, Nottingham, UK