1
Pinheiro AP, Aucouturier JJ, Kotz SA. Neural adaptation to changes in self-voice during puberty. Trends Neurosci 2024; 47:777-787. [PMID: 39214825] [DOI: 10.1016/j.tins.2024.08.001]
Abstract
The human voice is a potent social signal and a distinctive marker of individual identity. As individuals go through puberty, their voices undergo acoustic changes, setting them apart from others. In this article, we propose that hormonal fluctuations in conjunction with morphological vocal tract changes during puberty establish a sensitive developmental phase that affects the monitoring of the adolescent voice and, specifically, self-other distinction. Furthermore, the protracted maturation of brain regions responsible for voice processing, coupled with the dynamically evolving social environment of adolescents, likely disrupts a clear differentiation of the self-voice from others' voices. This socioneuroendocrine framework offers a holistic understanding of voice monitoring during adolescence.
Affiliation(s)
- Ana P Pinheiro
- Faculdade de Psicologia, Universidade de Lisboa, Alameda da Universidade, 1649-013 Lisboa, Portugal.
- Sonja A Kotz
- Maastricht University, Maastricht, The Netherlands; Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
2
Harford EE, Holt LL, Abel TJ. Unveiling the development of human voice perception: Neurobiological mechanisms and pathophysiology. Curr Res Neurobiol 2024; 6:100127. [PMID: 38511174] [PMCID: PMC10950757] [DOI: 10.1016/j.crneur.2024.100127]
Abstract
The human voice is a critical stimulus for the auditory system that promotes social connection, informs the listener about identity and emotion, and acts as the carrier for spoken language. Research on voice processing in adults has informed our understanding of the unique status of the human voice in the mature auditory cortex and provided potential explanations for mechanisms that underlie voice selectivity and identity processing. There is evidence that voice perception undergoes developmental change starting in infancy and extending through early adolescence. While even young infants recognize the voice of their mother, there is an apparently protracted course of development before children reach adult-like selectivity for the human voice over other sound categories and adult-like recognition of other talkers by voice. Gaps in the literature do not allow for an exact mapping of this trajectory or an adequate description of how voice processing abilities and their neural underpinnings evolve. This review provides a comprehensive account of developmental voice processing research published to date and discusses how this evidence fits with and contributes to current theoretical models proposed in the adult literature. We discuss how factors such as cognitive development, neural plasticity, perceptual narrowing, and language acquisition may contribute to the development of voice processing and its investigation in children. We also review evidence of voice processing abilities in premature birth, autism spectrum disorder, and phonagnosia to examine where and how deviations from the typical trajectory of development may manifest.
Affiliation(s)
- Emily E. Harford
- Department of Neurological Surgery, University of Pittsburgh, USA
- Lori L. Holt
- Department of Psychology, The University of Texas at Austin, USA
- Taylor J. Abel
- Department of Neurological Surgery, University of Pittsburgh, USA
- Department of Bioengineering, University of Pittsburgh, USA
3
Kreiman J. Information conveyed by voice quality. J Acoust Soc Am 2024; 155:1264-1271. [PMID: 38345424] [DOI: 10.1121/10.0024609]
Abstract
The problem of characterizing voice quality has long caused debate and frustration. The richness of the available descriptive vocabulary is overwhelming, but the density and complexity of the information voices convey lead some to conclude that language can never adequately specify what we hear. Others argue that terminology lacks an empirical basis, so that language-based scales are inadequate a priori. Efforts to provide meaningful instrumental characterizations have also had limited success. Such measures may capture sound patterns but cannot at present explain what characteristics, intentions, or identity listeners attribute to the speaker based on those patterns. However, some terms continually reappear across studies. These terms align with acoustic dimensions accounting for variance across speakers and languages and correlate with size and arousal across species. This suggests that labels for quality rest on a bedrock of biology: We have evolved to perceive voices in terms of size/arousal, and these factors structure both voice acoustics and descriptive language. Such linkages could help integrate studies of signals and their meaning, producing a truly interdisciplinary approach to the study of voice.
Affiliation(s)
- Jody Kreiman
- Departments of Head and Neck Surgery and Linguistics, University of California, Los Angeles, Los Angeles, California 90095-1794, USA
4
Har-Shai Yahav P, Sharaabi A, Zion Golumbic E. The effect of voice familiarity on attention to speech in a cocktail party scenario. Cereb Cortex 2024; 34:bhad475. [PMID: 38142293] [DOI: 10.1093/cercor/bhad475]
Abstract
Selective attention to one speaker in multi-talker environments can be affected by the acoustic and semantic properties of speech. One highly ecological feature of speech that has the potential to assist selective attention is voice familiarity. Here, we tested how voice familiarity interacts with selective attention by measuring the neural speech-tracking response to both target and non-target speech in a dichotic listening "Cocktail Party" paradigm. We recorded magnetoencephalography (MEG) data from n = 33 participants, who were presented with concurrent narratives in two different voices and instructed to pay attention to one ear ("target") and ignore the other ("non-target"). Participants were familiarized with one of the voices during the week prior to the experiment, rendering this voice familiar to them. Using multivariate speech-tracking analysis, we estimated the neural responses to both stimuli and replicated their well-established modulation by selective attention. Importantly, speech-tracking was also affected by voice familiarity, showing an enhanced response for target speech and a reduced response for non-target speech in the contralateral hemisphere when these were spoken in a familiar vs. an unfamiliar voice. These findings offer valuable insight into how voice familiarity, and by extension auditory semantics, interacts with goal-driven attention and facilitates perceptual organization and speech processing in noisy environments.
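For readers unfamiliar with neural speech-tracking analyses of this kind, the sketch below illustrates one common variant: a lagged ridge regression (a temporal response function) relating the speech envelope to a single MEG channel. It is a minimal illustration under assumed variable names and parameters, not the authors' actual pipeline.

```python
import numpy as np

def estimate_trf(envelope, meg, sfreq, tmin=-0.1, tmax=0.4, alpha=1.0):
    """Estimate a temporal response function mapping a speech envelope
    to one MEG channel via lagged ridge regression. Inputs are 1-D
    arrays of equal length; sfreq is the sampling rate in Hz."""
    lags = np.arange(int(tmin * sfreq), int(tmax * sfreq) + 1)
    # Design matrix: one column per time-lagged copy of the envelope.
    # (np.roll wraps around; edge samples are a minor artifact here.)
    X = np.column_stack([np.roll(envelope, lag) for lag in lags])
    # Ridge solution: w = (X'X + alpha*I)^-1 X'y
    w = np.linalg.solve(X.T @ X + alpha * np.eye(len(lags)), X.T @ meg)
    return lags / sfreq, w

# Toy usage: a response delayed by 20 samples should yield a TRF peak there.
rng = np.random.default_rng(0)
env = rng.standard_normal(5000)
meg = 0.5 * np.roll(env, 20) + rng.standard_normal(5000)
times, weights = estimate_trf(env, meg, sfreq=100)
```

Tracking strength is then typically quantified as the correlation between the predicted and observed neural signal on held-out data, computed separately for the target and non-target streams.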
Affiliation(s)
- Paz Har-Shai Yahav
- The Gonda Center for Multidisciplinary Brain Research, Bar Ilan University, Ramat Gan 5290002, Israel
- Aviya Sharaabi
- The Gonda Center for Multidisciplinary Brain Research, Bar Ilan University, Ramat Gan 5290002, Israel
- Elana Zion Golumbic
- The Gonda Center for Multidisciplinary Brain Research, Bar Ilan University, Ramat Gan 5290002, Israel
5
Perepelytsia V, Dellwo V. Acoustic compression in Zoom audio does not compromise voice recognition performance. Sci Rep 2023; 13:18742. [PMID: 37907749] [PMCID: PMC10618539] [DOI: 10.1038/s41598-023-45971-x]
Abstract
Human voice recognition over telephone channels typically yields lower accuracy than recognition from higher-quality studio recordings. Here, we investigated the extent to which audio in video conferencing, which is subject to various lossy compression mechanisms, affects human voice recognition performance. Voice recognition performance was tested in an old-new recognition task under three audio conditions (telephone, Zoom, studio) across all matched (familiarization and test with the same audio condition) and mismatched (familiarization and test with different audio conditions) combinations. Participants were familiarized with female voices presented in either studio-quality (N = 22), Zoom-quality (N = 21), or telephone-quality (N = 20) stimuli. Subsequently, all listeners performed an identical voice recognition test containing a balanced stimulus set from all three conditions. Results revealed that voice recognition performance (d') in Zoom audio was not significantly different from that in studio audio, and listeners performed significantly better with both Zoom and studio audio than with telephone audio. This suggests that the signal processing of the speech codec used by Zoom preserves information relevant to voice recognition as well as studio audio does. Interestingly, listeners familiarized with voices via Zoom audio showed a trend towards better recognition performance in the test (p = 0.056) compared to listeners familiarized with studio audio. We discuss future directions according to which a possible advantage of Zoom audio for voice recognition might be related to some of the speech coding mechanisms used by Zoom.
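The d' values reported here come from standard signal detection theory: the z-transformed hit rate minus the z-transformed false-alarm rate in the old-new task. A minimal sketch (using an illustrative log-linear correction for extreme rates; the authors' exact correction is not stated in the abstract):

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Old-new recognition sensitivity. The +0.5/+1 (log-linear)
    correction keeps perfect rates from producing infinite z-scores."""
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)

# Example: 40 "old" trials (32 hits) and 40 "new" trials (8 false alarms).
print(round(d_prime(32, 8, 8, 32), 2))  # ≈ 1.63
```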
Affiliation(s)
- Valeriia Perepelytsia
- Department of Computational Linguistics, University of Zurich, Andreasstrasse 15, 8050, Zurich, Switzerland.
- Volker Dellwo
- Department of Computational Linguistics, University of Zurich, Andreasstrasse 15, 8050, Zurich, Switzerland
6
Lavan N, McGettigan C. A model for person perception from familiar and unfamiliar voices. Commun Psychol 2023; 1:1. [PMID: 38665246] [PMCID: PMC11041786] [DOI: 10.1038/s44271-023-00001-4]
Abstract
When hearing a voice, listeners can form a detailed impression of the person behind the voice. Existing models of voice processing focus primarily on one aspect of person perception - identity recognition from familiar voices - but do not account for the perception of other person characteristics (e.g., sex, age, personality traits). Here, we present a broader perspective, proposing that listeners have a common perceptual goal of perceiving who they are hearing, whether the voice is familiar or unfamiliar. We outline and discuss a model - the Person Perception from Voices (PPV) model - that achieves this goal via a common mechanism of recognising a familiar person, persona, or set of speaker characteristics. Our PPV model aims to provide a more comprehensive account of how listeners perceive the person they are listening to, using an approach that incorporates and builds on aspects of the hierarchical frameworks and prototype-based mechanisms proposed within existing models of voice identity recognition.
Affiliation(s)
- Nadine Lavan
- Department of Experimental and Biological Psychology, Queen Mary University of London, London, UK
- Carolyn McGettigan
- Department of Speech, Hearing, and Phonetic Sciences, University College London, London, UK
7
Orepic P, Kannape OA, Faivre N, Blanke O. Bone conduction facilitates self-other voice discrimination. R Soc Open Sci 2023; 10:221561. [PMID: 36816848] [PMCID: PMC9929504] [DOI: 10.1098/rsos.221561]
Abstract
One's own voice is one of the most important and most frequently heard voices. Although it is the sound we associate most with ourselves, it is perceived as strange when played back in a recording. One of the main reasons is that recordings lack the bone conduction that is inevitably present when hearing one's own voice while speaking. The resulting discrepancy between experimental and natural self-voice stimuli has significantly impeded self-voice research, rendering it one of the least investigated aspects of self-consciousness. Accordingly, factors that contribute to self-voice perception remain largely unknown. In a series of three studies, we rectified this ecological discrepancy by augmenting experimental self-voice stimuli with the bone-conducted vibrotactile stimulation that is present during natural self-voice perception. Combining voice morphing with psychophysics, we demonstrate that self-other, but not familiar-other, voice discrimination improved for stimuli presented using bone as compared with air conduction. Furthermore, our data outline independent contributions of familiarity and acoustic processing to separating one's own voice from another's: although vocal differences increased general voice discrimination, self-voices were more often confused with familiar than with unfamiliar voices, regardless of their acoustic similarity. Collectively, our findings show that concomitant vibrotactile stimulation improves auditory self-identification, thereby portraying self-voice perception as a fundamentally multi-modal construct.
Affiliation(s)
- Pavo Orepic
- Laboratory of Cognitive Neuroscience, Neuro-X Institute and Brain Mind Institute, Faculty of Life Sciences, École polytechnique fédérale de Lausanne (EPFL), 1202 Geneva, Switzerland
- Oliver Alan Kannape
- Laboratory of Cognitive Neuroscience, Neuro-X Institute and Brain Mind Institute, Faculty of Life Sciences, École polytechnique fédérale de Lausanne (EPFL), 1202 Geneva, Switzerland
- Virtual Medicine Centre, NeuroCentre, University Hospital of Geneva, 1205 Geneva, Switzerland
- Nathan Faivre
- University Grenoble Alpes, University Savoie Mont Blanc, CNRS, LPNC, 38000 Grenoble, France
- Olaf Blanke
- Laboratory of Cognitive Neuroscience, Neuro-X Institute and Brain Mind Institute, Faculty of Life Sciences, École polytechnique fédérale de Lausanne (EPFL), 1202 Geneva, Switzerland
- Department of Clinical Neurosciences, University Hospital of Geneva, 1205 Geneva, Switzerland
8
Hayes NA, Davidson LS, Uchanski RM. Considerations in pediatric device candidacy: An emphasis on spoken language. Cochlear Implants Int 2022; 23:300-308. [PMID: 35637623] [PMCID: PMC9339525] [DOI: 10.1080/14670100.2022.2079189]
Abstract
As cochlear implant (CI) candidacy expands to consider children with more residual hearing, the use of a CI together with a hearing aid (HA) at the non-implanted ear (bimodal devices) is increasing. This case study examines the contributions of acoustic and electric input to speech perception performance for a pediatric bimodal device user (S1) who is a borderline bilateral cochlear implant candidate. S1 completed a battery of perceptual tests in CI-only, HA-only, and bimodal conditions. Since CIs and HAs differ in their ability to transmit cues related to segmental and suprasegmental perception, both types of perception were tested. Performance in all three device conditions was generally similar across tests, showing no clear device-condition benefit. Further, S1's spoken language performance was compared to that of a large group of children with prelingual severe-profound hearing loss who used two devices from a young age, at least one of which was a CI. S1's speech perception and language scores were average or above average compared to these other pediatric CI recipients. Both segmental and suprasegmental speech perception, as well as spoken language skills, should be examined to determine the broad-scale performance level of bimodal recipients, especially when deciding whether to move from bimodal devices to bilateral CIs.
Affiliation(s)
- Natalie A Hayes
- Program in Audiology and Communication Science, Department of Otolaryngology, Washington University School of Medicine, St. Louis, MO, USA
- Lisa S Davidson
- Program in Audiology and Communication Science, Department of Otolaryngology, Washington University School of Medicine, St. Louis, MO, USA
- Rosalie M Uchanski
- Program in Audiology and Communication Science, Department of Otolaryngology, Washington University School of Medicine, St. Louis, MO, USA
9
Colby S, Orena AJ. Recognizing Voices Through a Cochlear Implant: A Systematic Review of Voice Perception, Talker Discrimination, and Talker Identification. J Speech Lang Hear Res 2022; 65:3165-3194. [PMID: 35926089] [PMCID: PMC9911123] [DOI: 10.1044/2022_jslhr-21-00209]
Abstract
OBJECTIVE Some cochlear implant (CI) users report having difficulty accessing indexical information in the speech signal, presumably due to limitations in the transmission of fine spectrotemporal cues. The purpose of this review article was to systematically review and evaluate the existing research on talker processing in CI users. Specifically, we reviewed the performance of CI users in three types of talker- and voice-related tasks. We also examined the different factors (such as participant, hearing, and device characteristics) that might influence performance in these specific tasks. DESIGN We completed a systematic search of the literature with select key words using citation aggregation software to search Google Scholar. We included primary reports that tested (a) talker discrimination, (b) voice perception, and (c) talker identification. Each report had to include at least one group of participants with CIs. Each included study was also evaluated for quality of evidence. RESULTS The searches resulted in 1,561 references, which were first screened for inclusion and then evaluated in full. Forty-three studies examining talker discrimination, voice perception, and talker identification were included in the final review. Most studies focused on postlingually deafened and implanted adult CI users, with fewer studies focused on prelingual implant users. In general, CI users performed above chance in these tasks. When there was a difference between groups, CI users performed less accurately than their normal-hearing (NH) peers. A subset of CI users reached the same level of performance as NH participants exposed to noise-vocoded stimuli. Some studies found that CI users and NH participants relied on different cues for talker perception. Within groups of CI users, there is moderate evidence for a bimodal benefit for talker processing, and there are mixed findings about the effects of hearing experience. CONCLUSIONS The current review highlights the challenges faced by CI users in tracking and recognizing voices and how they adapt to them. Although large variability exists, there is evidence that CI users can process indexical information from speech, though with less accuracy than their NH peers. Recent work has described some of the factors that might ease the challenges of talker processing in CI users. We conclude by suggesting some future avenues of research to optimize real-world speech outcomes.
Affiliation(s)
- Sarah Colby
- Department of Psychological and Brain Sciences, University of Iowa, Iowa City
- Adriel John Orena
- Department of Psychology, University of British Columbia, Vancouver, Canada
10
Lee JJ, Perrachione TK. Implicit and explicit learning in talker identification. Atten Percept Psychophys 2022; 84:2002-2015. [PMID: 35534783] [PMCID: PMC10081569] [DOI: 10.3758/s13414-022-02500-8]
Abstract
In the real world, listeners seem to implicitly learn talkers' vocal identities during interactions that prioritize attending to the content of talkers' speech. In contrast, most laboratory experiments of talker identification employ training paradigms that require listeners to explicitly practice identifying voices. Here, we investigated whether listeners become familiar with talkers' vocal identities during initial exposures that do not involve explicit talker identification. Participants were assigned to one of three exposure tasks, in which they heard identical stimuli but were differentially required to attend to the talkers' vocal identity or to the verbal content of their speech: (1) matching the talker to a concurrent visual cue (talker-matching); (2) discriminating whether the talker was the same as the prior trial (talker 1-back); or (3) discriminating whether speech content matched the previous trial (verbal 1-back). All participants were then tested on their ability to learn to identify talkers from novel speech content. Critically, we manipulated whether the talkers during this post-test differed from those heard during training. Compared to learning to identify novel talkers, listeners were significantly more accurate learning to identify the talkers they had previously been exposed to in the talker-matching and verbal 1-back tasks, but not the talker 1-back task. The correlation between talker identification test performance and exposure task performance was also greater when the talkers were the same in both tasks. These results suggest that listeners learn talkers' vocal identity implicitly during speech perception, even if they are not explicitly attending to the talkers' identity.
Affiliation(s)
- Jayden J Lee
- Department of Speech, Language, & Hearing Sciences, Boston University, 635 Commonwealth Ave, Boston, MA, 02215, USA
- Tyler K Perrachione
- Department of Speech, Language, & Hearing Sciences, Boston University, 635 Commonwealth Ave, Boston, MA, 02215, USA.
11
The Time Course of Emotional Authenticity Detection in Nonverbal Vocalizations. Cortex 2022; 151:116-132. [DOI: 10.1016/j.cortex.2022.02.016]
12
Costello J, Smith M. The BCH message banking process™, voice banking, and double-dipping™. Augment Altern Commun 2022; 37:241-250. [PMID: 35000518] [DOI: 10.1080/07434618.2021.2021554]
Abstract
Significant advances have been made in interventions to maintain communication and personhood for individuals with neurodegenerative conditions. One innovation is Message Banking, a clinical approach first developed at Boston Children's Hospital (BCH). This paper outlines the Message Banking process as implemented at BCH, which includes the option of "Double Dipping," where banked messages are mined to develop personalized synthesized voices. More than a decade of experience has led to the evolution of six core principles underpinning the BCH process, resulting in a structured introduction of the associated concepts and practices with people with amyotrophic lateral sclerosis (ALS) and their families. These principles highlight the importance of assigning ownership and control of the process to individuals with ALS and their families, ensuring that as a tool it is empowering and offers hope. Changes have been driven by feedback from individuals who have participated in the BCH process over many years. The success of the process has recently been extended through partnerships that allow the recorded messages to be used to develop individual personalized synthetic voices to complement banked messages. While the process of banking messages is technically relatively simple, the full value of the process should be underpinned by the values and principles outlined in this tutorial.
Affiliation(s)
- John Costello
- Augmentative Communication Program and Jay S. Fishman ALS Augmentative Communication Program, Boston Children's Hospital, Adjunct Faculty Boston University, Boston, MA, USA
- Martine Smith
- Department of Clinical Speech and Language Studies, Trinity College Dublin, Dublin, Ireland
13
Abstract
The ability to identify individuals by voice is fundamental for communication. However, little is known about the expectations that infants hold when learning unfamiliar voices. Here, the voice-learning skills of 4- and 8-month-olds (N = 53; 29 girls, 14 boys of various ethnicities) were tested using a preferential-looking task that involved audiovisual stimuli of their mothers and other unfamiliar women. Findings reveal that the expectation that novel voices map on to novel faces emerges between 4 and 8 months of age, and that infants can retain learning of face-voice pairings via nonostensive cues by 8 months of age. This study provides new insights about infants' use of disambiguation and fast mapping in early voice learning.
14
Maguinness C, von Kriegstein K. Visual mechanisms for voice-identity recognition flexibly adjust to auditory noise level. Hum Brain Mapp 2021; 42:3963-3982. [PMID: 34043249] [PMCID: PMC8288083] [DOI: 10.1002/hbm.25532]
Abstract
Recognising the identity of voices is a key ingredient of communication. Visual mechanisms support this ability: recognition is better for voices previously learned with their corresponding face (compared to a control condition). This so‐called ‘face‐benefit’ is supported by the fusiform face area (FFA), a region sensitive to facial form and identity. Behavioural findings indicate that the face‐benefit increases in noisy listening conditions. The neural mechanisms for this increase are unknown. Here, using functional magnetic resonance imaging, we examined responses in face‐sensitive regions while participants recognised the identity of auditory‐only speakers (previously learned by face) in high (SNR −4 dB) and low (SNR +4 dB) levels of auditory noise. We observed a face‐benefit in both noise levels, for most participants (16 of 21). In high‐noise, the recognition of face‐learned speakers engaged the right posterior superior temporal sulcus motion‐sensitive face area (pSTS‐mFA), a region implicated in the processing of dynamic facial cues. The face‐benefit in high‐noise also correlated positively with increased functional connectivity between this region and voice‐sensitive regions in the temporal lobe in the group of 16 participants with a behavioural face‐benefit. In low‐noise, the face‐benefit was robustly associated with increased responses in the FFA and to a lesser extent the right pSTS‐mFA. The findings highlight the remarkably adaptive nature of the visual network supporting voice‐identity recognition in auditory‐only listening conditions.
Collapse
Affiliation(s)
- Corrina Maguinness
- Chair of Cognitive and Clinical Neuroscience, Faculty of Psychology, Technische Universität Dresden, Dresden, Germany; Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Katharina von Kriegstein
- Chair of Cognitive and Clinical Neuroscience, Faculty of Psychology, Technische Universität Dresden, Dresden, Germany; Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
15
The processing of intimately familiar and unfamiliar voices: Specific neural responses of speaker recognition and identification. PLoS One 2021; 16:e0250214. [PMID: 33861789] [PMCID: PMC8051806] [DOI: 10.1371/journal.pone.0250214]
Abstract
Research has repeatedly shown that familiar and unfamiliar voices elicit different neural responses. But it has also been suggested that different neural correlates associate with the feeling of having heard a voice and with knowing who the voice represents. The terminology used to designate these varying responses remains vague, creating a degree of confusion in the literature. Additionally, terms serving to designate tasks of voice discrimination, voice recognition, and speaker identification are often inconsistent, creating further ambiguities. The present study used event-related potentials (ERPs) to clarify the difference between responses to 1) unknown voices, 2) trained-to-familiar voices as speech stimuli are repeatedly presented, and 3) intimately familiar voices. In an experiment, 13 participants listened to repeated utterances recorded from 12 speakers. Only one of the 12 voices was intimately familiar to a participant, whereas the remaining 11 voices were unfamiliar. The frequency of presentation of these 11 unfamiliar voices varied, with only one being frequently presented (the trained-to-familiar voice). ERP analyses revealed different responses for intimately familiar and unfamiliar voices in two distinct time windows (P2 between 200-250 ms and a late positive component, LPC, between 450-850 ms post-onset), with late responses occurring only for intimately familiar voices. The LPC presents sustained shifts, whereas short-time ERP components appear to reflect an early recognition stage. The trained voice also elicited distinct responses compared to rarely heard voices, but these occurred in a third time window (N250 between 300-350 ms post-onset). Overall, the timing of responses suggests that the processing of intimately familiar voices operates in two distinct steps: voice recognition, marked by a P2 on right centro-frontal sites, and speaker identification, marked by an LPC component. The recognition of frequently heard voices entails an independent recognition process marked by a differential N250. Based on the present results and previous observations, it is proposed that there is a need to distinguish between processes of voice "recognition" and "identification". The present study also specifies test conditions serving to reveal this distinction in neural responses, one of which bears on the length of speech stimuli, given the late responses associated with voice identification.
16
Ohgami Y, Kotani Y, Yoshida N, Kunimatsu A, Kiryu S, Inoue Y. Voice, rhythm, and beep stimuli differently affect the right hemisphere preponderance and components of stimulus-preceding negativity. Biol Psychol 2021; 160:108048. [PMID: 33596460] [DOI: 10.1016/j.biopsycho.2021.108048]
Abstract
The present study investigated whether auditory stimuli with different contents affect right laterality and the components of stimulus-preceding negativity (SPN). A time-estimation task was performed under voice, rhythm, beep, and control conditions. The SPN interval during which participants anticipated the stimulus was divided into quarters to define early and late SPNs. Early and late components of SPN were also extracted using a principal component analysis. The anticipation of voice sounds enhanced the early SPN and the early component, which reflected the anticipation of language processing. Beep sounds elicited the right hemisphere preponderance of the early component, the early SPN, and the late SPN. The rhythmic sound tended to attenuate the amplitude compared with the two other stimuli. These findings further substantiate the existence of separate early and late components of the SPN. In addition, they suggest that the early component reflects selective anticipatory attention toward differing types of auditory feedback.
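The principal component analysis mentioned here is typically a temporal PCA: the matrix of ERP waveforms (observations × time points) is decomposed so that component time courses capture early vs. late SPN activity. A minimal sketch of that general technique with simulated data (the authors' exact procedure may differ):

```python
import numpy as np

def temporal_pca(erp, n_components=2):
    """erp: (n_observations, n_timepoints), e.g. one averaged waveform
    per participant x condition x electrode. Returns component time
    courses, per-observation scores, and explained variance ratios."""
    centered = erp - erp.mean(axis=0)  # remove the mean waveform
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    loadings = vt[:n_components]       # component time courses
    scores = centered @ loadings.T     # component amplitude per observation
    explained = (s**2 / np.sum(s**2))[:n_components]
    return loadings, scores, explained

# Toy usage: 40 observations, 500 time points.
erp = np.random.default_rng(1).standard_normal((40, 500))
loadings, scores, explained = temporal_pca(erp)
```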
Affiliation(s)
- Yoshimi Ohgami
- Institute for Liberal Arts, Tokyo Institute of Technology, 2-12-1 Ohokayama, Meguro, Tokyo, Japan.
- Yasunori Kotani
- Institute for Liberal Arts, Tokyo Institute of Technology, 2-12-1 Ohokayama, Meguro, Tokyo, Japan
- Nobukiyo Yoshida
- Department of Radiology, Institute of Medical Science, The University of Tokyo, 4-6-1 Shirokanedai, Minato, Tokyo, Japan
- Akira Kunimatsu
- Department of Radiology, Institute of Medical Science, The University of Tokyo, 4-6-1 Shirokanedai, Minato, Tokyo, Japan
- Shigeru Kiryu
- Department of Medicine, International University of Health and Welfare, 4-3 Kozunomori, Narita, Chiba, Japan
- Yusuke Inoue
- Department of Diagnostic Radiology, Kitasato University, 1-15-1 Kitasato, Minami, Sagamihara, Kanagawa, Japan
17
Davidson LS, Geers AE, Uchanski RM, Firszt JB. Effects of Early Acoustic Hearing on Speech Perception and Language for Pediatric Cochlear Implant Recipients. J Speech Lang Hear Res 2019; 62:3620-3637. [PMID: 31518517] [PMCID: PMC6808345] [DOI: 10.1044/2019_jslhr-h-18-0255]
Abstract
Purpose The overall goal of the current study was to identify an optimal level and duration of acoustic experience that facilitates language development for pediatric cochlear implant (CI) recipients-specifically, to determine whether there is an optimal duration of hearing aid (HA) use and unaided threshold levels that should be considered before proceeding to bilateral CIs. Method A total of 117 pediatric CI recipients (ages 5-9 years) were given speech perception and standardized tests of receptive vocabulary and language. The speech perception battery included tests of segmental perception (e.g., word recognition in quiet and noise, and vowels and consonants in quiet) and of suprasegmental perception (e.g., talker and stress discrimination, and emotion identification). Hierarchical regression analyses were used to determine the effects of speech perception on language scores, and the effects of residual hearing level (unaided pure-tone average [PTA]) and duration of HA use on speech perception. Results A continuum of residual hearing levels and the length of HA use were represented by calculating the unaided PTA of the ear with the longest duration of HA use for each child. All children wore 2 devices: Some wore bimodal devices, while others received their 2nd CI either simultaneously or sequentially, representing a wide range of HA use (0.03-9.05 years). Regression analyses indicate that suprasegmental perception contributes unique variance to receptive language scores and that both segmental and suprasegmental skills each contribute independently to receptive vocabulary scores. Also, analyses revealed an optimal duration of HA use for each of 3 ranges of hearing loss severity (with mean PTAs of 73, 92, and 111 dB HL) that maximizes suprasegmental perception. Conclusions For children with the most profound losses, early bilateral CIs provide the greatest opportunity for developing good spoken language skills. For those with moderate-to-severe losses, however, a prescribed period of bimodal use may be more advantageous for developing good spoken language skills.
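The hierarchical regression logic described here enters predictor blocks in stages and asks whether each block adds unique variance (the gain in R²). A minimal sketch of that general approach using statsmodels, with simulated data and illustrative variable names rather than the study's dataset:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 117  # matches the reported sample size; the data here are simulated
segmental = rng.standard_normal(n)                         # e.g., word recognition
suprasegmental = 0.4 * segmental + rng.standard_normal(n)  # e.g., talker/stress/emotion
language = 0.5 * segmental + 0.4 * suprasegmental + rng.standard_normal(n)

# Step 1: segmental perception alone.
m1 = sm.OLS(language, sm.add_constant(segmental)).fit()

# Step 2: add suprasegmental perception; its unique contribution is the R^2 gain.
X2 = sm.add_constant(np.column_stack([segmental, suprasegmental]))
m2 = sm.OLS(language, X2).fit()

print(f"R2 step 1 = {m1.rsquared:.3f}, step 2 = {m2.rsquared:.3f}, "
      f"delta R2 = {m2.rsquared - m1.rsquared:.3f}")
```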
Affiliation(s)
- Jill B. Firszt
- Washington University School of Medicine in St. Louis, MO
18
Engelberg JWM, Schwartz JW, Gouzoules H. Do human screams permit individual recognition? PeerJ 2019; 7:e7087. [PMID: 31275746] [PMCID: PMC6596410] [DOI: 10.7717/peerj.7087]
Abstract
The recognition of individuals through vocalizations is a highly adaptive ability in the social behavior of many species, including humans. However, the extent to which nonlinguistic vocalizations such as screams permit individual recognition in humans remains unclear. Using a same-different vocalizer discrimination task, we investigated participants' ability to correctly identify whether pairs of screams were produced by the same person or two different people, a critical prerequisite to individual recognition. Despite prior theory-based contentions that screams are not acoustically well-suited to conveying identity cues, listeners discriminated individuals at above-chance levels by their screams, including both acoustically modified and unmodified exemplars. We found that vocalizer gender explained some variation in participants' discrimination abilities and response times, but participant attributes (gender, experience, empathy) did not. Our findings are consistent with abundant evidence from nonhuman primates, suggesting that both human and nonhuman screams convey cues to caller identity, thus supporting the thesis of evolutionary continuity in at least some aspects of scream function across primate species.
Affiliation(s)
- Jay W Schwartz
- Department of Psychology, Emory University, Atlanta, GA, USA
19
Abstract
Human voices are extremely variable: The same person can sound very different depending on whether they are speaking, laughing, shouting or whispering. In order to successfully recognise someone from their voice, a listener needs to be able to generalize across these different vocal signals (‘telling people together’). However, in most studies of voice-identity processing to date, the substantial within-person variability has been eliminated through the use of highly controlled stimuli, thus focussing on how we tell people apart. We argue that this obscures our understanding of voice-identity processing by controlling away an essential feature of vocal stimuli that may include diagnostic information. In this paper, we propose that we need to extend the focus of voice-identity research to account for both “telling people together” as well as “telling people apart.” That is, we must account for whether, and to what extent, listeners can overcome within-person variability to obtain a stable percept of person identity from vocal cues. To do this, our theoretical and methodological frameworks need to be adjusted to explicitly include the study of within-person variability.
20
Lavan N, Burston LFK, Garrido L. How many voices did you hear? Natural variability disrupts identity perception from unfamiliar voices. Br J Psychol 2018; 110:576-593. [PMID: 30221374] [PMCID: PMC6767376] [DOI: 10.1111/bjop.12348]
Abstract
Our voices sound different depending on the context (laughing vs. talking to a child vs. giving a speech), making within‐person variability an inherent feature of human voices. When perceiving speaker identities, listeners therefore need to not only ‘tell people apart’ (perceiving exemplars from two different speakers as separate identities) but also ‘tell people together’ (perceiving different exemplars from the same speaker as a single identity). In the current study, we investigated how such natural within‐person variability affects voice identity perception. Using voices from a popular TV show, listeners, who were either familiar or unfamiliar with this show, sorted naturally varying voice clips from two speakers into clusters to represent perceived identities. Across three independent participant samples, unfamiliar listeners perceived more identities than familiar listeners and frequently mistook exemplars from the same speaker to be different identities. These findings point towards a selective failure in ‘telling people together’. Our study highlights within‐person variability as a key feature of voices that has striking effects on (unfamiliar) voice identity perception. Our findings not only open up a new line of enquiry in the field of voice perception but also call for a re‐evaluation of theoretical models to account for natural variability during identity perception.
Affiliation(s)
- Nadine Lavan
- Department of Psychology, Royal Holloway, University of London, Egham, UK; Division of Psychology, Department of Life Sciences, Brunel University, London, UK
- Luke F K Burston
- Department of Psychology, Royal Holloway, University of London, Egham, UK
- Lúcia Garrido
- Division of Psychology, Department of Life Sciences, Brunel University, London, UK
21
Conde T, Gonçalves ÓF, Pinheiro AP. Stimulus complexity matters when you hear your own voice: Attention effects on self-generated voice processing. Int J Psychophysiol 2018; 133:66-78. [PMID: 30114437] [DOI: 10.1016/j.ijpsycho.2018.08.007]
Abstract
The ability to discriminate self- and non-self voice cues is a fundamental aspect of self-awareness and subserves self-monitoring during verbal communication. Nonetheless, the neurofunctional underpinnings of self-voice perception and recognition are still poorly understood. Moreover, how attention and stimulus complexity influence the processing and recognition of one's own voice remains to be clarified. Using an oddball task, the current study investigated how self-relevance and stimulus type interact during selective attention to voices, and how they affect the representation of regularity during voice perception. Event-related potentials (ERPs) were recorded from 18 right-handed males. Pre-recorded self-generated (SGV) and non-self (NSV) voices, consisting of a nonverbal vocalization (vocalization condition) or disyllabic word (word condition), were presented as either standard or target stimuli in different experimental blocks. The results showed increased N2 amplitude to SGV relative to NSV stimuli. Stimulus type modulated later processing stages only: P3 amplitude was increased for SGV relative to NSV words, whereas no differences between SGV and NSV were observed in the case of vocalizations. Moreover, SGV standards elicited reduced N1 and P2 amplitude relative to NSV standards. These findings revealed that the self-voice grabs more attention when listeners are exposed to words but not vocalizations. Further, they indicate that detection of regularity in an auditory stream is facilitated for one's own voice at early processing stages. Together, they demonstrate that self-relevance affects attention to voices differently as a function of stimulus type.
Affiliation(s)
- Tatiana Conde
- Faculdade de Psicologia, Universidade de Lisboa, Lisbon, Portugal; Neuropsychophysiology Lab, CIPsi, School of Psychology, University of Minho, Braga, Portugal
- Óscar F Gonçalves
- Neuropsychophysiology Lab, CIPsi, School of Psychology, University of Minho, Braga, Portugal; Spaulding Center of Neuromodulation, Department of Physical Medicine & Rehabilitation, Spaulding Rehabilitation Hospital & Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA; Bouvé College of Health Sciences, Northeastern University, Boston, MA, USA
- Ana P Pinheiro
- Faculdade de Psicologia, Universidade de Lisboa, Lisbon, Portugal; Neuropsychophysiology Lab, CIPsi, School of Psychology, University of Minho, Braga, Portugal; Cognitive Neuroscience Lab, Department of Psychiatry, Harvard Medical School, Boston, MA, USA.
22
Maguinness C, Roswandowitz C, von Kriegstein K. Understanding the mechanisms of familiar voice-identity recognition in the human brain. Neuropsychologia 2018; 116:179-193. [DOI: 10.1016/j.neuropsychologia.2018.03.039]
23
Papagno C, Mattavelli G, Casarotti A, Bello L, Gainotti G. Defective recognition and naming of famous people from voice in patients with unilateral temporal lobe tumours. Neuropsychologia 2018; 116:194-204. [DOI: 10.1016/j.neuropsychologia.2017.07.021]
24
Lavan N, Short B, Wilding A, McGettigan C. Impoverished encoding of speaker identity in spontaneous laughter. Evol Hum Behav 2018. [DOI: 10.1016/j.evolhumbehav.2017.11.002]
25
Drozdova P, van Hout R, Scharenborg O. L2 voice recognition: The role of speaker-, listener-, and stimulus-related factors. J Acoust Soc Am 2017; 142:3058. [PMID: 29195438] [DOI: 10.1121/1.5010169]
Abstract
Previous studies examined various factors influencing voice recognition and learning, with mixed results. The present study investigates the separate and combined contributions of various speaker-, stimulus-, and listener-related factors to voice recognition. Dutch listeners, with arguably incomplete phonological and lexical knowledge of the target language, English, learned to recognize the voices of four native English speakers, speaking in English, during a four-day training. Training was successful, and listeners' accuracy was shown to be influenced by the acoustic characteristics of the speakers and the sound composition of the words used in the training, but not by the lexical frequency of the words, the lexical knowledge of the listeners, or their phonological aptitude. Although the evidence is not conclusive, listeners with a lower working memory capacity seemed to be slower in learning voices than listeners with a higher working memory capacity. The results reveal that speaker-related, listener-related, and stimulus-related factors accumulate in voice recognition, while lexical information turns out not to play a role in successful voice learning and recognition. This implies that voice recognition operates at the prelexical processing level.
Affiliation(s)
- Polina Drozdova
- Centre for Language Studies, Radboud University Nijmegen, Erasmusplein 1, P.O. Box 9103, 6500 HD Nijmegen, the Netherlands
- Roeland van Hout
- Centre for Language Studies, Radboud University Nijmegen, Erasmusplein 1, P.O. Box 9103, 6500 HD Nijmegen, the Netherlands
- Odette Scharenborg
- Centre for Language Studies, Radboud University Nijmegen, Erasmusplein 1, P.O. Box 9103, 6500 HD Nijmegen, the Netherlands
26
Fontaine M, Love SA, Latinus M. Familiarity and Voice Representation: From Acoustic-Based Representation to Voice Averages. Front Psychol 2017; 8:1180. [PMID: 28769836] [PMCID: PMC5509798] [DOI: 10.3389/fpsyg.2017.01180]
Abstract
The ability to recognize an individual from their voice is a widespread ability with a long evolutionary history. Yet, the perceptual representation of familiar voices is ill-defined. In two experiments, we explored the neuropsychological processes involved in the perception of voice identity. We specifically explored the hypothesis that familiar voices (trained-to-familiar voices in Experiment 1, famous voices in Experiment 2) are represented as a whole complex pattern, well approximated by the average of multiple utterances produced by a single speaker. In Experiment 1, participants learned three voices over several sessions, and performed a three-alternative forced-choice identification task on original voice samples and several "speaker averages," created by morphing across varying numbers of different vowels (e.g., [a] and [i]) produced by the same speaker. In Experiment 2, the same participants performed the same task on voice samples produced by famous speakers. The two experiments showed that for famous voices, but not for trained-to-familiar voices, identification performance increased and response times decreased as a function of the number of utterances in the averages. This study sheds light on the perceptual representation of familiar voices, and demonstrates the power of averages in recognizing familiar voices. The speaker average captures the unique characteristics of a speaker, and thus retains the information essential for recognition; it acts as a prototype of the speaker.
Affiliation(s)
- Maureen Fontaine
- UMR7289, Centre National de la Recherche Scientifique, Institut de Neuroscience de la Timone, Aix-Marseille Université, Marseille, France
- Scott A Love
- UMR7289, Centre National de la Recherche Scientifique, Institut de Neuroscience de la Timone, Aix-Marseille Université, Marseille, France
- Marianne Latinus
- UMR7289, Centre National de la Recherche Scientifique, Institut de Neuroscience de la Timone, Aix-Marseille Université, Marseille, France
27
Botha A, Ras E, Abdoola S, Van der Linde J. Dysphonia in adults with developmental stuttering: A descriptive study. S Afr J Commun Disord 2017; 64:e1-e7. [PMID: 28697606] [PMCID: PMC5843050] [DOI: 10.4102/sajcd.v64i1.347]
Abstract
BACKGROUND Persons with stuttering (PWS) often present with other co-occurring conditions. The World Health Organization's (WHO) International Classification of Functioning, Disability and Health (ICF) proposes that it is important to understand the full burden of a health condition. A few studies have explored voice problems among PWS, and the characteristics of the voices of PWS remain relatively unknown; the importance of conducting further research has been emphasised. OBJECTIVES This study aimed to describe the vocal characteristics of PWS. METHOD Twenty participants were tested. Acoustic and perceptual data were collected during a comprehensive voice assessment, and the severity of stuttering was also determined. Correlations between the Stuttering Severity Instrument (SSI) and the acoustic measurements were evaluated for significance. RESULTS Only two participants (10%) obtained a Dysphonia Severity Index (DSI) score of 1.6 or higher, indicating no dysphonia, while 90% of participants (n = 18) scored lower than 1.6, indicating that those participants presented with dysphonia. Some participants presented with weakness (asthenia) of voice (35%), while 65% presented with a slightly strained voice quality. A moderately positive correlation between breathiness and SSI scores (r = 0.40, p = 0.08) was observed. In addition, participants with high SSI scores also had poor DSI scores below 1.6, as reflected in a moderately positive correlation between SSI and DSI (r = 0.41). CONCLUSION The majority of PWS presented with dysphonia, evident in the perceptual or acoustic parameters of their voices. These results can be used in further investigation to create awareness and to establish intervention strategies for voice disorders among PWS.
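The Dysphonia Severity Index used here is conventionally computed from four voice measures with the weighting published by Wuyts et al. (2000). A brief sketch of that standard formula (the weights are quoted from that source, not re-derived from this study):

```python
def dysphonia_severity_index(mpt_s, f0_high_hz, i_low_db, jitter_pct):
    """DSI per Wuyts et al. (2000): MPT = maximum phonation time (s),
    F0-High = highest attainable fundamental frequency (Hz),
    I-Low = lowest attainable intensity (dB), jitter in percent.
    Scores below about +1.6 were taken to indicate dysphonia above."""
    return (0.13 * mpt_s + 0.0053 * f0_high_hz
            - 0.26 * i_low_db - 1.18 * jitter_pct + 12.4)

# Example profile: DSI ≈ 2.05, above the 1.6 cut-off used in the study.
print(round(dysphonia_severity_index(15, 600, 55, 1.0), 2))
```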
Affiliation(s)
- Shabnam Abdoola
- Department of Speech-Language Pathology and Audiology, University of Pretoria.
28
Autistic Traits are Linked to Individual Differences in Familiar Voice Identification. J Autism Dev Disord 2017; 49:2747-2767. [PMID: 28247018] [DOI: 10.1007/s10803-017-3039-y]
Abstract
Autistic traits vary across the general population, and are linked with face recognition ability. Here we investigated potential links between autistic traits and voice recognition ability for personally familiar voices in a group of 30 listeners (15 female, 16-19 years) from the same local school. Autistic traits (particularly those related to communication and social interaction) were negatively correlated with voice recognition, such that more autistic traits were associated with fewer familiar voices identified and less ability to discriminate familiar from unfamiliar voices. In addition, our results suggest enhanced accessibility of personal semantic information in women compared to men. Overall, this study establishes a detailed pattern of relationships between voice identification performance and autistic traits in the general population.
29
Abstract
Purpose (1) To explore the role of native voice and effects of voice loss on self-concept and identity, and survey the state of assistive voice technology; (2) to establish the moral case for developing personalized voice technology. Methods This narrative review examines published literature on the human significance of voice, the impact of voice loss on self-concept and identity, and the strengths and limitations of current voice technology. Based on the impact of voice loss on self and identity, and voice technology limitations, the moral case for personalized voice technology is developed. Results Given the richness of information conveyed by voice, loss of voice constrains expression of the self, but the full impact is poorly understood. Augmentative and alternative communication (AAC) devices facilitate communication but, despite advances in this field, voice output cannot yet express the unique nuances of individual voice. The ethical principles of autonomy, beneficence and equality of opportunity establish the moral responsibility to invest in accessible, cost-effective, personalized voice technology. Conclusions Although further research is needed to elucidate the full effects of voice loss on self-concept, identity and social functioning, current understanding of the profoundly negative impact of voice loss establishes the moral case for developing personalized voice technology. Implications for Rehabilitation Rehabilitation of voice-disordered patients should facilitate self-expression, interpersonal connectedness and social/occupational participation. Proactive questioning about the psychological and social experiences of patients with voice loss is a valuable entry point for rehabilitation planning. Personalized voice technology would enhance sense of self, communicative participation and autonomy and promote shared healthcare decision-making. Further research is needed to identify the best strategies to preserve and strengthen identity and sense of self.
Affiliation(s)
- Esther Nathanson
- The Neiswanger Institute for Bioethics, Loyola University Chicago Stritch School of Medicine, Maywood, IL, USA
30
Guo X, Luo B, Liu Y, Jiang TL, Feng J. Cannot see you but can hear you: vocal identity recognition in microbats. Zoological Research 2015; 36:257-62. [PMID: 26452691 DOI: 10.13918/j.issn.2095-8137.2015.5.257] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Subscribe] [Scholar Register] [Indexed: 11/01/2022]
Abstract
Identity recognition is one of the most critical social behaviours in a variety of animal species. Microchiropteran bats present a special case of acoustic communication in the dark: they use echolocation pulses for navigating, foraging, and communicating, and increasing evidence suggests that echolocation pulses also serve as a means of social communication. Compared with echolocation signals, communication calls in bats have rather complex structures and differ substantially by social context. Bat acoustic signals vary broadly in spectrotemporal space among individuals, sexes, colonies, and species, and this identity information can be gathered from families of vocalizations on the basis of voice characteristics. In this review, we summarize current studies of vocal identity recognition in microbats and provide recommendations and directions for further work.
Affiliation(s)
- Xiong Guo, Bo Luo, Ying Liu, Ting-Lei Jiang, Jiang Feng
- Jilin Provincial Key Laboratory of Animal Resource Conservation and Utilization, Northeast Normal University, Changchun 130117, China
31
McGettigan C. The social life of voices: studying the neural bases for the expression and perception of the self and others during spoken communication. Front Hum Neurosci 2015; 9:129. [PMID: 25852517 PMCID: PMC4365687 DOI: 10.3389/fnhum.2015.00129] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/09/2014] [Accepted: 02/25/2015] [Indexed: 11/24/2022] Open
Affiliation(s)
- Carolyn McGettigan
- Department of Psychology, Royal Holloway, University of London, Egham, UK
32
Perrachione TK, Stepp CE, Hillman RE, Wong PCM. Talker identification across source mechanisms: experiments with laryngeal and electrolarynx speech. J Speech Lang Hear Res 2014; 57:1651-1665. [PMID: 24801962 PMCID: PMC4655826 DOI: 10.1044/2014_jslhr-s-13-0161] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/25/2013] [Accepted: 03/12/2014] [Indexed: 05/29/2023]
Abstract
PURPOSE The purpose of this study was to determine listeners' ability to learn talker identity from speech produced with an electrolarynx, explore source and filter differentiation in talker identification, and describe acoustic-phonetic changes associated with electrolarynx use. METHOD Healthy adult control listeners learned to identify talkers from speech recordings produced using talkers' normal laryngeal vocal source or an electrolarynx. Listeners' abilities to identify talkers from the trained vocal source (Experiment 1) and generalize this knowledge to the untrained source (Experiment 2) were assessed. Acoustic-phonetic measurements of spectral differences between source mechanisms were performed. Additional listeners attempted to match recordings from different source mechanisms to a single talker (Experiment 3). RESULTS Listeners successfully learned talker identity from electrolarynx speech but less accurately than from laryngeal speech. Listeners were unable to generalize talker identity to the untrained source mechanism. Electrolarynx use resulted in vowels with higher F1 frequencies compared with laryngeal speech. Listeners matched recordings from different sources to a single talker better than chance. CONCLUSIONS Electrolarynx speech, although lacking individual differences in voice quality, nevertheless conveys sufficient indexical information related to the vocal filter and articulation for listeners to identify individual talkers. Psychologically, perception of talker identity arises from a "gestalt" of the vocal source and filter.
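The above-chance matching result in Experiment 3 is the kind of claim a simple binomial test supports. A minimal sketch with hypothetical trial counts and an assumed two-alternative (50% chance) design, since the abstract does not give the exact numbers:

```python
# Minimal sketch: testing whether cross-source talker matching exceeds
# chance, as in Experiment 3. Trial counts are hypothetical; the
# abstract reports only that performance was above chance.
from scipy.stats import binomtest

n_trials = 60        # hypothetical number of matching trials
n_correct = 41       # hypothetical number of correct matches
chance = 0.5         # assumed two-alternative matching -> 50% chance

result = binomtest(n_correct, n_trials, p=chance, alternative="greater")
print(f"accuracy = {n_correct / n_trials:.2f}, p = {result.pvalue:.4f}")
```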
33
Roswandowitz C, Mathias S, Hintz F, Kreitewolf J, Schelinski S, von Kriegstein K. Two Cases of Selective Developmental Voice-Recognition Impairments. Curr Biol 2014; 24:2348-53. [DOI: 10.1016/j.cub.2014.08.048] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/12/2014] [Revised: 07/28/2014] [Accepted: 08/20/2014] [Indexed: 11/28/2022]
34
Know thy sound: perceiving self and others in musical contexts. Acta Psychol (Amst) 2014; 152:67-74. [PMID: 25113128 DOI: 10.1016/j.actpsy.2014.07.002] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/16/2013] [Revised: 04/07/2014] [Accepted: 07/07/2014] [Indexed: 12/14/2022] Open
Abstract
This review article summarizes findings from empirical studies that investigated recognition of an action's agent using music and/or other auditory information. Embodied cognition accounts ground higher cognitive functions in lower-level sensorimotor functioning. Action simulation, the recruitment of an observer's motor system and its neural substrates when observing actions, has been proposed to be particularly potent for actions that are self-produced. This review examines evidence for such claims from the music domain. It covers studies in which trained or untrained individuals generated and/or perceived (musical) sounds and were subsequently asked to identify the author of the sounds (e.g., the self or another individual) in immediate (online) or delayed (offline) research designs. The review is structured according to the complexity of the auditory-motor information available and includes sections on: 1) simple auditory information (e.g., clapping, piano, and drum sounds), 2) complex instrumental sound sequences (e.g., piano/organ performances), and 3) musical information embedded within audiovisual performance contexts, in which action sequences are viewed as movements and/or listened to in synchrony with sounds (e.g., conductors' gestures, dance). This work has proven informative in unraveling the links between perceptual and motor processes, supporting embodied accounts of human cognition that address action observation. The reported findings are examined in relation to the cues that contribute to agency judgments and their implications for research on action understanding and applied musical practice.
35
Blank H, Kiebel SJ, von Kriegstein K. How the human brain exchanges information across sensory modalities to recognize other people. Hum Brain Mapp 2014; 36:324-39. [PMID: 25220190 DOI: 10.1002/hbm.22631] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/07/2014] [Revised: 08/29/2014] [Accepted: 08/29/2014] [Indexed: 11/09/2022] Open
Abstract
Recognizing the identity of other individuals across different sensory modalities is critical for successful social interaction. In the human brain, face- and voice-sensitive areas are separate but structurally connected. What kind of information is exchanged between these specialized areas during cross-modal recognition of other individuals is currently unclear. For faces, specific areas are sensitive to identity and to physical properties, and it is an open question whether voices activate representations of face identity or of physical facial properties in these areas. To address this question, we used functional magnetic resonance imaging in humans with a voice-face priming design, in which familiar voices were followed by morphed faces that matched or mismatched the voice with respect to identity or physical properties. Responses in face-sensitive regions were modulated when face identity or physical properties did not match the preceding voice, and the strength of this mismatch signal depended on how certain the participant was about the voice identity. This suggests that the voice provides both identity and physical property information to face areas. The activity and connectivity profiles differed between face-sensitive areas: (i) the occipital face area seemed to receive information about both physical properties and identity, (ii) the fusiform face area seemed to receive identity information, and (iii) the anterior temporal lobe seemed to receive predominantly identity information from the voice. We interpret these results within a predictive coding scheme in which both identity and physical property information is used across sensory modalities to recognize individuals.
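The predictive coding interpretation can be made concrete with a toy model in which a face-sensitive area receives a face prediction from the voice and signals a certainty-weighted mismatch with the face actually shown. This is an expository sketch only; the study measured fMRI responses, and every quantity below is invented:

```python
# Toy predictive-coding sketch of the voice-to-face priming effect:
# the mismatch (prediction error) between the face predicted from a
# familiar voice and the face actually shown, weighted by the
# listener's certainty about the voice identity. Illustrative only.
import numpy as np

def mismatch_signal(predicted_face, observed_face, certainty):
    """Certainty-weighted prediction error between face feature vectors."""
    error = np.linalg.norm(observed_face - predicted_face)
    return certainty * error

rng = np.random.default_rng(1)
predicted = rng.normal(size=8)                        # face features predicted from the voice
matched = predicted + rng.normal(scale=0.1, size=8)   # congruent face
mismatched = rng.normal(size=8)                       # incongruent face

for certainty in (0.3, 0.9):   # low vs. high certainty about voice identity
    print(f"certainty={certainty}: "
          f"match={mismatch_signal(predicted, matched, certainty):.2f}, "
          f"mismatch={mismatch_signal(predicted, mismatched, certainty):.2f}")
```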
Affiliation(s)
- Helen Blank
- Max Planck Institute for Human Cognitive and Brain Sciences, 04103, Leipzig, Germany; MRC Cognition and Brain Sciences Unit, Cambridge CB2 7EF, United Kingdom
36
Hearing Faces and Seeing Voices: The Integration and Interaction of Face and Voice Processing. Psychol Belg 2014. [DOI: 10.5334/pb.ar] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022] Open
37
Interdependence of linguistic and indexical speech perception skills in school-age children with early cochlear implantation. Ear Hear 2014; 34:562-74. [PMID: 23652814 DOI: 10.1097/aud.0b013e31828d2bd6] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
Abstract
OBJECTIVES This study documented the ability of experienced pediatric cochlear implant (CI) users to perceive linguistic properties (what is said) and indexical attributes (emotional intent and talker identity) of speech, and examined the extent to which linguistic (LSP) and indexical (ISP) perception skills are related. Pre-implant aided hearing, age at implantation, speech processor technology, CI-aided thresholds, sequential bilateral cochlear implantation, and academic integration with hearing age-mates were examined for their possible relationships to both LSP and ISP skills. DESIGN Sixty 9- to 12-year-olds, first implanted at an early age (12 to 38 months), participated in a comprehensive test battery that included the following LSP skills: (1) recognition of monosyllabic words at loud and soft levels, (2) repetition of phonemes and suprasegmental features from nonwords, and (3) recognition of key words from sentences presented within a noise background; and the following ISP skills: (1) discrimination of across-gender and within-gender (female) talkers and (2) identification and discrimination of emotional content from spoken sentences. A group of 30 age-matched children without hearing loss completed the nonword repetition and talker- and emotion-perception tasks for comparison. RESULTS Word-recognition scores decreased with signal level from a mean of 77% correct at 70 dB SPL to 52% at 50 dB SPL. On average, CI users recognized 50% of key words presented in sentences that were 9.8 dB above background noise. Phonetic properties were repeated from nonword stimuli at about the same level of accuracy as suprasegmental attributes (70% and 75%, respectively). The majority of CI users identified emotional content and differentiated talkers significantly above chance levels. Scores on LSP and ISP measures were combined into separate principal component scores, and these components were highly correlated (r = 0.76). Both LSP and ISP component scores were higher for children who received a CI at the youngest ages, upgraded to more recent CI technology, and had lower CI-aided thresholds. Higher scores, for both LSP and ISP components, were also associated with higher language levels and mainstreaming at younger ages. Higher ISP scores were associated with better social skills. CONCLUSIONS Results strongly support a link between indexical and linguistic properties in the perceptual analysis of speech: these two channels of information appear to be processed together in parallel by the auditory system and are inseparable in perception. Better speech perception, for both linguistic and indexical properties, was associated with younger age at implantation and use of more recent speech processor technology. Children with better speech perception demonstrated better spoken language, earlier academic mainstreaming, and placement in more typically sized classrooms (i.e., >20 students). Well-developed social skills were more highly associated with the ability to discriminate the nuances of talker identity and emotion than with the ability to recognize words and sentences through listening. The extent to which early cochlear implantation enabled these early-implanted children to make use of both linguistic and indexical properties of speech influenced not only their development of spoken language but also their ability to function successfully in a hearing world.
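The LSP/ISP analysis reduces each test battery to a first-principal-component score and correlates the two. A minimal sketch with simulated scores standing in for the study's measures (the abstract reports r = 0.76 for the real data):

```python
# Minimal sketch of the LSP/ISP analysis: collapse each test battery to
# its first principal component, then correlate the component scores.
# Simulated data; the real study used test scores from 60 CI users.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
ability = rng.normal(size=60)                                  # shared latent skill
lsp = ability[:, None] + rng.normal(scale=0.7, size=(60, 3))   # 3 linguistic measures
isp = ability[:, None] + rng.normal(scale=0.7, size=(60, 3))   # 3 indexical measures

def first_component(scores):
    """First principal component of standardized test scores."""
    z = StandardScaler().fit_transform(scores)
    return PCA(n_components=1).fit_transform(z).ravel()

lsp_pc, isp_pc = first_component(lsp), first_component(isp)
r = np.corrcoef(lsp_pc, isp_pc)[0, 1]
print(f"LSP-ISP component correlation: r = {r:.2f}")   # study reports r = 0.76
```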
38
Gillespie A. Nuclear brinkmanship: a study in non-linguistic communication. Integr Psychol Behav Sci 2013; 47:492-508. [PMID: 23999920 DOI: 10.1007/s12124-013-9245-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Abstract
This article examines meaning making with nuclear bombs and military manoeuvres. The data are verbatim audio recordings from the White House during the Cuban Missile Crisis, and the analysis uses concepts from impression management and dialogism. It is found that actions often speak louder than words and that even non-linguistic communication with nuclear weapons is often oriented to third parties, in this case world opinion. A novel process of 'staging the other' is identified: one side tries to create a situation that forces the other side to act in a way that creates a negative impression on world opinion. Staging the other is a subtle form of meaning making, for it entails shaping how third parties will view a situation without those third parties being aware of the intentionality of the communication.
Affiliation(s)
- Alex Gillespie
- Institute of Social Psychology, London School of Economics, Houghton Street, London, WC2A 2AE, UK
39
Yovel G, Belin P. A unified coding strategy for processing faces and voices. Trends Cogn Sci 2013; 17:263-71. [PMID: 23664703 PMCID: PMC3791405 DOI: 10.1016/j.tics.2013.04.004] [Citation(s) in RCA: 100] [Impact Index Per Article: 9.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/28/2013] [Revised: 04/05/2013] [Accepted: 04/07/2013] [Indexed: 11/23/2022]
Abstract
Both faces and voices are rich in socially relevant information, which humans are remarkably adept at extracting, including a person's identity, age, gender, affective state, and personality. Here, we review accumulating evidence from behavioral, neuropsychological, electrophysiological, and neuroimaging studies suggesting that the cognitive and neural processing mechanisms engaged by perceiving faces or voices are highly similar, despite the very different nature of their sensory input. The similarity between the two mechanisms likely facilitates the multimodal integration of facial and vocal information during everyday social interactions. These findings emphasize a parsimonious principle of cerebral organization, where similar computational problems in different modalities are solved using similar solutions.
Affiliation(s)
- Galit Yovel
- School of Psychological Sciences and Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel
- Pascal Belin
- Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, UK
- Département de Psychologie, Université de Montréal, Montréal, Canada
- Institut des Neurosciences de La Timone, UMR 7289, CNRS and Université Aix-Marseille, France
40
Kastein HB, Winter R, Vinoth Kumar AK, Kandula S, Schmidt S. Perception of individuality in bat vocal communication: discrimination between, or recognition of, interaction partners? Anim Cogn 2013; 16:945-59. [DOI: 10.1007/s10071-013-0628-9] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/18/2012] [Revised: 03/11/2013] [Accepted: 03/29/2013] [Indexed: 11/25/2022]
41
Bertau MC. Voice as Heuristic device to integrate biological and social sciences: a comment to Sidtis & Kreiman's in the Beginning was the Familiar Voice. Integr Psychol Behav Sci 2012; 46:160-71. [PMID: 22189797 DOI: 10.1007/s12124-011-9190-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Abstract
Sidtis and Kreiman (2011) offer a two-sided approach to voice in which the biological side is thought to support the psycho-social one. By linking psychological and biological sciences through the notion of the "familiar voice" that they introduce, they foster integration in science and offer a broad view of the voice phenomenon. This comment examines closely how that integration is conducted. The conclusion is that both sides share a common point of departure that does not belong to the mainstream of present academic discourse: a dialogic view of human beings. The social dimension of the neuropsychological social model of voice recognition the authors propose is then discussed. The closing considerations take this up by addressing the core notion of familiarity with regard to the conception of sociality it implies; this perspective also raises the issue of the relationship between the (familiar) voice and language. In line with the dialogic view of human beings advocated here, in accordance with Sidtis and Kreiman (2011), a notion of language is put forth that emphasizes the sensorily experienced performance of symbolic activity. In this, voice holds a core place.
Affiliation(s)
- Marie-Cécile Bertau
- Institut für Phonetik und Sprachverarbeitung, Universität München, Munich, Germany