1
Qi Z, Zeng W, Zang D, Wang Z, Luo L, Wu X, Yu J, Mao Y. Classifying disorders of consciousness using a novel dual-level and dual-modal graph learning model. J Transl Med 2024; 22:950. [PMID: 39434088 PMCID: PMC11492684 DOI: 10.1186/s12967-024-05729-z]
Abstract
BACKGROUND Disorders of consciousness (DoC) are a group of conditions that affect the level of awareness and communication in patients. While neuroimaging techniques can provide useful information about brain structure and function in these patients, most existing methods rely on a single modality for analysis and rarely account for brain injury. To address these limitations, we propose a novel method that integrates two neuroimaging modalities, functional magnetic resonance imaging (fMRI) and diffusion tensor imaging (DTI), to enhance the classification of subjects into different states of consciousness. METHOD AND RESULTS The main contributions of our work are threefold: first, after constructing a dual-modal individual graph from fMRI and DTI data, we introduce a brain injury mask mechanism that consolidates damaged brain regions into a single graph node, enhancing the modeling of brain injuries and reducing deformation effects. Second, to address over-smoothing, we construct a dual-level graph that dynamically builds a population-level graph from the node features of individual graphs, promoting the clustering of similar subjects while distinguishing dissimilar ones. Finally, we employ a subgraph exploration model with task-fMRI data to validate the interpretability of our model, confirming that the selected brain regions are task-relevant in cognition. Our experimental results on data from 89 healthy participants and 204 patients with DoC from Huashan Hospital, Fudan University, demonstrate that our method achieves high accuracy in classifying patients into unresponsive wakefulness syndrome (UWS), minimally conscious state (MCS), or normal conscious state, outperforming current state-of-the-art methods.
The explainability results of our method identified a subset of brain regions that are important for consciousness, such as the default mode network, the salience network, the dorsal attention network, and the visual network. Our method also revealed the relationship between brain networks and language processing in consciousness, and showed that language-related subgraphs can distinguish MCS from UWS patients. CONCLUSION We proposed a novel graph learning method for classifying DoC based on fMRI and DTI data, introducing a brain injury mask mechanism to effectively handle damaged brains. The classification results demonstrate the effectiveness of our method in distinguishing subjects across different states of consciousness, while the explainability results identify key brain regions relevant to this classification. Our study provides new evidence for the role of brain networks and language processing in consciousness, with potential implications for improving the diagnosis and prognosis of patients with DoC.
Affiliation(s)
- Zengxin Qi
- Department of Neurosurgery, Huashan Hospital, Shanghai Medical College, Fudan University, Shanghai, 200030, China
- National Center for Neurological Disorders, Shanghai, 200030, China
- Shanghai Key Laboratory of Brain Function Restoration and Neural Regeneration, Shanghai, 200030, China
- State Key Laboratory of Medical Neurobiology and MOE Frontiers Center for Brain Science, School of Basic Medical Sciences, Institutes of Brain Science, Fudan University, Shanghai, 200030, China
- Wenwen Zeng
- School of Information Science and Technology, Fudan University, Shanghai, China
- Di Zang
- Department of Neurosurgery, Huashan Hospital, Shanghai Medical College, Fudan University, Shanghai, 200030, China
- National Center for Neurological Disorders, Shanghai, 200030, China
- Shanghai Key Laboratory of Brain Function Restoration and Neural Regeneration, Shanghai, 200030, China
- State Key Laboratory of Medical Neurobiology and MOE Frontiers Center for Brain Science, School of Basic Medical Sciences, Institutes of Brain Science, Fudan University, Shanghai, 200030, China
- Department of Neurosurgery, China-Japan Friendship Hospital, Beijing, China
- Zhe Wang
- Department of Neurosurgery, Huashan Hospital, Shanghai Medical College, Fudan University, Shanghai, 200030, China
- National Center for Neurological Disorders, Shanghai, 200030, China
- Shanghai Key Laboratory of Brain Function Restoration and Neural Regeneration, Shanghai, 200030, China
- State Key Laboratory of Medical Neurobiology and MOE Frontiers Center for Brain Science, School of Basic Medical Sciences, Institutes of Brain Science, Fudan University, Shanghai, 200030, China
- Lanqin Luo
- Department of Neurosurgery, Huashan Hospital, Shanghai Medical College, Fudan University, Shanghai, 200030, China
- National Center for Neurological Disorders, Shanghai, 200030, China
- Shanghai Key Laboratory of Brain Function Restoration and Neural Regeneration, Shanghai, 200030, China
- State Key Laboratory of Medical Neurobiology and MOE Frontiers Center for Brain Science, School of Basic Medical Sciences, Institutes of Brain Science, Fudan University, Shanghai, 200030, China
- Xuehai Wu
- Department of Neurosurgery, Huashan Hospital, Shanghai Medical College, Fudan University, Shanghai, 200030, China
- National Center for Neurological Disorders, Shanghai, 200030, China
- Shanghai Key Laboratory of Brain Function Restoration and Neural Regeneration, Shanghai, 200030, China
- State Key Laboratory of Medical Neurobiology and MOE Frontiers Center for Brain Science, School of Basic Medical Sciences, Institutes of Brain Science, Fudan University, Shanghai, 200030, China
- Jinhua Yu
- School of Information Science and Technology, Fudan University, Shanghai, China
- Ying Mao
- Department of Neurosurgery, Huashan Hospital, Shanghai Medical College, Fudan University, Shanghai, 200030, China
- National Center for Neurological Disorders, Shanghai, 200030, China
- Shanghai Key Laboratory of Brain Function Restoration and Neural Regeneration, Shanghai, 200030, China
- State Key Laboratory of Medical Neurobiology and MOE Frontiers Center for Brain Science, School of Basic Medical Sciences, Institutes of Brain Science, Fudan University, Shanghai, 200030, China
2
Voruz P, Orepic P, Coll SY, Haemmerli J, Blanke O, Péron JA, Schaller K, Iannotti GR. Self-other voice discrimination task: A potential neuropsychological tool for clinical assessment of self-related deficits. Heliyon 2024; 10:e38711. [PMID: 39430528 PMCID: PMC11490823 DOI: 10.1016/j.heliyon.2024.e38711]
Abstract
Background Deficits of the self are commonly described across different neuropathologies, based on clinical evaluations and experimental paradigms. However, currently available approaches lack appropriate clinical validation, making objective evaluation and discrimination of self-related deficits challenging. Methods We applied a standardized statistical method to assess the clinical discriminatory capacity of a Self-Other Voice Discrimination (SOVD) task. This task, validated experimentally as a marker for self-related deficits, was administered to 17 patients eligible for neurosurgery due to focal hemispheric brain tumors or epileptic lesions. Results The clinical discriminatory capacity of the SOVD task was evident in three patients who exhibited impairments in self-voice perception that could not be predicted by other neuropsychological deficits. Impairments in other-voice perception were linked to inhibitory neuropsychological deficits, suggesting a potential association with executive deficits in voice recognition. Conclusions This exploratory study highlights the clinical discriminatory potential of the SOVD task and suggests that it could complement standard neuropsychological assessment, paving the way for enhanced diagnoses and tailored treatments for self-related deficits.
Affiliation(s)
- Philippe Voruz
- Department of Neurosurgery, University Hospitals of Geneva, 1205, Geneva, Switzerland
- Clinical and Experimental Neuropsychology Laboratory, Faculty of Psychology, University of Geneva, 1205, Geneva, Switzerland
- Pavo Orepic
- Department of Basic Neurosciences, Faculty of Medicine, University of Geneva, 1202, Geneva, Switzerland
- Selim Yahia Coll
- Department of Neurosurgery, University Hospitals of Geneva, 1205, Geneva, Switzerland
- Laboratory of Cognitive Neurorehabilitation, Faculty of Medicine, University of Geneva, 1205, Geneva, Switzerland
- Julien Haemmerli
- Department of Neurosurgery, University Hospitals of Geneva, 1205, Geneva, Switzerland
- Olaf Blanke
- Laboratory of Cognitive Neuroscience, Neuro-X Institute and Brain Mind Institute, Faculty of Life Sciences, Swiss Federal Institute of Technology (EPFL), 1211, Geneva, Switzerland
- Julie Anne Péron
- Clinical and Experimental Neuropsychology Laboratory, Faculty of Psychology, University of Geneva, 1205, Geneva, Switzerland
- Karl Schaller
- Department of Neurosurgery, University Hospitals of Geneva, 1205, Geneva, Switzerland
- NeuroCentre, University Hospitals of Geneva, 1205, Geneva, Switzerland
- Giannina Rita Iannotti
- Department of Neurosurgery, University Hospitals of Geneva, 1205, Geneva, Switzerland
- NeuroCentre, University Hospitals of Geneva, 1205, Geneva, Switzerland
3
Kurumada C, Rivera R, Allen P, Bennetto L. Perception and adaptation of receptive prosody in autistic adolescents. Sci Rep 2024; 14:16409. [PMID: 39013983 PMCID: PMC11252140 DOI: 10.1038/s41598-024-66569-x]
Abstract
A fundamental aspect of language processing is inferring others' minds from subtle variations in speech. The same word or sentence can often convey different meanings depending on its tempo, timing, and intonation, features often referred to as prosody. Although autistic children and adults are known to experience difficulty in making such inferences, it remains unclear why. We hypothesize that detail-oriented perception in autism may interfere with the inference process if it lacks the adaptivity required to cope with the variability ubiquitous in human speech. Using a novel prosodic continuum that shifts the sentence meaning gradiently from a statement (e.g., "It's raining") to a question (e.g., "It's raining?"), we investigated the perception and adaptation of receptive prosody in autistic adolescents and two groups of non-autistic controls. Autistic adolescents showed attenuated adaptivity in categorizing prosody, whereas they were equivalent to controls in discrimination accuracy. Combined with recent findings in segmental (e.g., phoneme) recognition, the current results point to an emerging research framework in which attenuated flexibility and reduced influence of contextual feedback are a possible source of the deficits that hinder linguistic and social communication in autism.
Affiliation(s)
- Chigusa Kurumada
- Brain and Cognitive Sciences, University of Rochester, Rochester, 14627, USA
- Rachel Rivera
- Psychology, University of Rochester, Rochester, 14627, USA
- Paul Allen
- Psychology, University of Rochester, Rochester, 14627, USA
- Otolaryngology, University of Rochester Medical Center, Rochester, 14642, USA
- Loisa Bennetto
- Psychology, University of Rochester, Rochester, 14627, USA
4
Harford EE, Holt LL, Abel TJ. Unveiling the development of human voice perception: Neurobiological mechanisms and pathophysiology. Curr Res Neurobiol 2024; 6:100127. [PMID: 38511174 PMCID: PMC10950757 DOI: 10.1016/j.crneur.2024.100127]
Abstract
The human voice is a critical stimulus for the auditory system that promotes social connection, informs the listener about identity and emotion, and acts as the carrier for spoken language. Research on voice processing in adults has informed our understanding of the unique status of the human voice in the mature auditory cortex and provided potential explanations for mechanisms that underlie voice selectivity and identity processing. There is evidence that voice perception undergoes developmental change starting in infancy and extending through early adolescence. While even young infants recognize the voice of their mother, there is an apparently protracted course of development to reach adult-like selectivity for the human voice over other sound categories and adult-like recognition of other talkers by voice. Gaps in the literature do not allow for an exact mapping of this trajectory or an adequate description of how voice processing abilities and their neural underpinnings evolve. This review provides a comprehensive account of developmental voice processing research published to date and discusses how this evidence fits with and contributes to current theoretical models proposed in the adult literature. We discuss how factors such as cognitive development, neural plasticity, perceptual narrowing, and language acquisition may contribute to the development of voice processing and its investigation in children. We also review evidence of voice processing abilities in the context of premature birth, autism spectrum disorder, and phonagnosia to examine where and how deviations from the typical developmental trajectory may manifest.
Affiliation(s)
- Emily E. Harford
- Department of Neurological Surgery, University of Pittsburgh, USA
- Lori L. Holt
- Department of Psychology, The University of Texas at Austin, USA
- Taylor J. Abel
- Department of Neurological Surgery, University of Pittsburgh, USA
- Department of Bioengineering, University of Pittsburgh, USA
5
Stevenage SV, Edey R, Keay R, Morrison R, Robertson DJ. Familiarity Is Key: Exploring the Effect of Familiarity on the Face-Voice Correlation. Brain Sci 2024; 14:112. [PMID: 38391687 PMCID: PMC10887171 DOI: 10.3390/brainsci14020112]
Abstract
Recent research has examined the extent to which face and voice processing are associated, on the basis that both tap into a common person perception system. However, existing findings do not yet fully clarify the role of familiarity in this association. Given this, two experiments are presented that examine face-voice correlations for unfamiliar stimuli (Experiment 1) and for familiar stimuli (Experiment 2). With care taken to use tasks that avoid floor and ceiling effects and that use realistic speech-based voice clips, the results suggested a significant but small positive correlation between face and voice processing when recognizing unfamiliar individuals. In contrast, the correlation when matching familiar individuals was significant, positive, and much larger. The results supported the existing literature suggesting that face and voice processing are aligned as constituents of an overarching person perception system. However, the difference in the magnitude of their association reinforced the view that familiar and unfamiliar stimuli are processed in different ways. This likely reflects the importance of a pre-existing mental representation and cross-talk within the neural architectures when processing familiar faces and voices, and the reliance on more superficial, stimulus-based and modality-specific analysis when processing unfamiliar faces and voices.
Affiliation(s)
- Sarah V Stevenage
- School of Psychology, University of Southampton, Southampton SO17 1BJ, UK
- Rebecca Edey
- School of Psychology, University of Southampton, Southampton SO17 1BJ, UK
- Rebecca Keay
- School of Psychology, University of Southampton, Southampton SO17 1BJ, UK
- Rebecca Morrison
- School of Psychology, University of Southampton, Southampton SO17 1BJ, UK
- David J Robertson
- Department of Psychological Sciences and Health, University of Strathclyde, Glasgow G1 1QE, UK
6
Har-Shai Yahav P, Sharaabi A, Zion Golumbic E. The effect of voice familiarity on attention to speech in a cocktail party scenario. Cereb Cortex 2024; 34:bhad475. [PMID: 38142293 DOI: 10.1093/cercor/bhad475]
Abstract
Selective attention to one speaker in multi-talker environments can be affected by the acoustic and semantic properties of speech. One highly ecological feature of speech that has the potential to assist selective attention is voice familiarity. Here, we tested how voice familiarity interacts with selective attention by measuring the neural speech-tracking response to both target and non-target speech in a dichotic listening "cocktail party" paradigm. We recorded magnetoencephalography (MEG) from n = 33 participants, who were presented with concurrent narratives in two different voices and instructed to pay attention to one ear ("target") and ignore the other ("non-target"). Participants were familiarized with one of the voices during the week prior to the experiment, rendering this voice familiar to them. Using multivariate speech-tracking analysis, we estimated the neural responses to both stimuli and replicated their well-established modulation by selective attention. Importantly, speech-tracking was also affected by voice familiarity, showing an enhanced response for target speech and a reduced response for non-target speech in the contralateral hemisphere when these were in a familiar vs. an unfamiliar voice. These findings offer valuable insight into how voice familiarity, and by extension auditory semantics, interacts with goal-driven attention and facilitates perceptual organization and speech processing in noisy environments.
Affiliation(s)
- Paz Har-Shai Yahav
- The Gonda Center for Multidisciplinary Brain Research, Bar Ilan University, Ramat Gan 5290002, Israel
- Aviya Sharaabi
- The Gonda Center for Multidisciplinary Brain Research, Bar Ilan University, Ramat Gan 5290002, Israel
- Elana Zion Golumbic
- The Gonda Center for Multidisciplinary Brain Research, Bar Ilan University, Ramat Gan 5290002, Israel
7
Pautz N, McDougall K, Mueller-Johnson K, Nolan F, Paver A, Smith HMJ. Identifying unfamiliar voices: Examining the system variables of sample duration and parade size. Q J Exp Psychol (Hove) 2023; 76:2804-2822. [PMID: 36718784 PMCID: PMC10655699 DOI: 10.1177/17470218231155738]
Abstract
Voice identification parades can be unreliable due to the error-prone nature of earwitness responses. UK government guidelines recommend that voice parades have nine voices, each played for 60 s, which makes parades resource-consuming to construct. In this article, we report two experiments that tested whether voice parade procedures could be simplified. In Experiment 1 (N = 271, 135 female), we investigated whether reducing the duration of the voice samples on a nine-voice parade would negatively affect identification performance, using both conventional logistic and signal detection approaches. In Experiment 2 (N = 270, 136 female), we first explored whether the same sample-duration conditions used in Experiment 1 would lead to different outcomes if we reduced the parade size to six voices. Following this, we pooled the data from both experiments to investigate the influence of target-position effects. The results show that 15-s sample durations result in statistically equivalent voice identification performance to the longer 60-s sample durations, but that the 30-s sample duration suffers in terms of overall signal sensitivity. This pattern of results was replicated with both nine- and six-voice parades. Performance on target-absent parades was at chance levels for both parade sizes, and response criteria were mostly liberal. In addition, unwanted position effects were present. The results provide initial evidence that the sample duration used in a voice parade may be reduced, but we argue that the guideline recommending a parade with nine voices should be maintained to provide additional protection for a potentially innocent suspect, given the low target-absent accuracy.
8
Lavan N, McGettigan C. A model for person perception from familiar and unfamiliar voices. Commun Psychol 2023; 1:1. [PMID: 38665246 PMCID: PMC11041786 DOI: 10.1038/s44271-023-00001-4]
Abstract
When hearing a voice, listeners can form a detailed impression of the person behind the voice. Existing models of voice processing focus primarily on one aspect of person perception - identity recognition from familiar voices - but do not account for the perception of other person characteristics (e.g., sex, age, personality traits). Here, we present a broader perspective, proposing that listeners have a common perceptual goal of perceiving who they are hearing, whether the voice is familiar or unfamiliar. We outline and discuss a model - the Person Perception from Voices (PPV) model - that achieves this goal via a common mechanism of recognising a familiar person, persona, or set of speaker characteristics. Our PPV model aims to provide a more comprehensive account of how listeners perceive the person they are listening to, using an approach that incorporates and builds on aspects of the hierarchical frameworks and prototype-based mechanisms proposed within existing models of voice identity recognition.
Affiliation(s)
- Nadine Lavan
- Department of Experimental and Biological Psychology, Queen Mary University of London, London, UK
- Carolyn McGettigan
- Department of Speech, Hearing, and Phonetic Sciences, University College London, London, UK
9
Humble D, Schweinberger SR, Mayer A, Jesgarzewsky TL, Dobel C, Zäske R. The Jena Voice Learning and Memory Test (JVLMT): A standardized tool for assessing the ability to learn and recognize voices. Behav Res Methods 2023; 55:1352-1371. [PMID: 35648317 PMCID: PMC10126074 DOI: 10.3758/s13428-022-01818-3]
Abstract
The ability to recognize someone's voice spans a broad spectrum, with phonagnosia at the low end and super-recognition at the high end. Yet there is no standardized test measuring an individual's ability to learn and recognize newly learned voices from samples with speech-like phonetic variability. We have developed the Jena Voice Learning and Memory Test (JVLMT), a 22-min test based on item response theory and applicable across languages. The JVLMT consists of three phases in which participants (1) become familiarized with eight speakers, (2) revise the learned voices, and (3) perform a 3AFC recognition task, using pseudo-sentences devoid of semantic content. Acoustic (dis)similarity analyses were used to create items with various levels of difficulty. Test scores are based on 22 items which were selected and validated in two online studies with 232 and 454 participants, respectively. Mean accuracy in the JVLMT is 0.51 (SD = 0.18) with an empirical (marginal) reliability of 0.66. Correlational analyses showed high and moderate convergent validity with the Bangor Voice Matching Test (BVMT) and the Glasgow Voice Memory Test (GVMT), respectively, and high discriminant validity with a digit span test. Four participants with potential super-recognition abilities and seven with potential phonagnosia were identified, performing at least 2 SDs above or below the mean, respectively. The JVLMT is a promising research and diagnostic screening tool for detecting both impairments in voice recognition and super-recognition abilities.
Affiliation(s)
- Denise Humble
- Department of Experimental Otorhinolaryngology, Jena University Hospital, Stoystrasse 3, 07743, Jena, Germany
- Department for General Psychology and Cognitive Neuroscience, Institute of Psychology, Friedrich Schiller University Jena, Am Steiger 3/1, 07743, Jena, Germany
- Stefan R Schweinberger
- Department for General Psychology and Cognitive Neuroscience, Institute of Psychology, Friedrich Schiller University Jena, Am Steiger 3/1, 07743, Jena, Germany
- Axel Mayer
- Department of Psychological Methods and Evaluation, Institute of Psychology and Sports Science, University of Bielefeld, Universitätsstr. 25, 33615, Bielefeld, Germany
- Tim L Jesgarzewsky
- Department of Experimental Otorhinolaryngology, Jena University Hospital, Stoystrasse 3, 07743, Jena, Germany
- Department for General Psychology and Cognitive Neuroscience, Institute of Psychology, Friedrich Schiller University Jena, Am Steiger 3/1, 07743, Jena, Germany
- Christian Dobel
- Department of Experimental Otorhinolaryngology, Jena University Hospital, Stoystrasse 3, 07743, Jena, Germany
- Romi Zäske
- Department of Experimental Otorhinolaryngology, Jena University Hospital, Stoystrasse 3, 07743, Jena, Germany
- Department for General Psychology and Cognitive Neuroscience, Institute of Psychology, Friedrich Schiller University Jena, Am Steiger 3/1, 07743, Jena, Germany
10
Which components of famous people recognition are lateralized? A study of face, voice and name recognition disorders in patients with neoplastic or degenerative damage of the right or left anterior temporal lobes. Neuropsychologia 2023; 181:108490. [PMID: 36693520 DOI: 10.1016/j.neuropsychologia.2023.108490]
Abstract
To clarify which components of famous people recognition are lateralized, we administered the 'Famous People Recognition Battery' (FPRB), in which subjects are required to recognize the same 40 famous people through their faces, voices and names, to large groups of patients with neoplastic or degenerative damage affecting the right or left anterior temporal lobe (ATL). At the familiarity level, we found, as expected, a dissociation: patients with right ATL lesions were more impaired on the non-verbal (face and voice) recognition modalities, whereas those with left ATL lesions were more impaired on name familiarity. Results at the naming level were equally expected, because the worst naming scores for faces and voices were observed in left-sided patients. Results at the semantic level were less expected, for two reasons. First, no difference was found between the two hemispheric groups when scores obtained on the verbal (name) and non-verbal (face and voice) recognition modalities were accounted for. Second, the face and voice recognition modalities showed different degrees of right lateralization. Indeed, all patient groups showed greater difficulty recognizing voices than faces, both at the familiarity and at the semantic level, but this difference reached significance only in patients with right ATL lesions, suggesting greater right lateralization of the more complex task of voice recognition. A model aiming to explain the greater right lateralization of the more perceptually demanding voice modality of person recognition is proposed.
11
Stevenage SV, Singh L, Dixey P. The Curious Case of Impersonators and Singers: Telling Voices Apart and Telling Voices Together under Naturally Challenging Listening Conditions. Brain Sci 2023; 13:358. [PMID: 36831901 PMCID: PMC9954053 DOI: 10.3390/brainsci13020358]
Abstract
Vocal identity processing depends on the ability to tell apart two instances of different speakers whilst also being able to tell together two instances of the same speaker. Whilst previous research has examined these voice processing capabilities under relatively common listening conditions, it has not yet tested their limits. Here, two studies are presented that employ challenging listening tasks to determine just how good we are at these voice processing tasks. In Experiment 1, 54 university students were asked to distinguish between very similar-sounding yet different speakers (celebrity targets and their impersonators). Participants completed a 'Same/Different' task and a 'Which is the Celebrity?' task on pairs of speakers, and a 'Real or Not?' task on individual speakers. In Experiment 2, a separate group of 40 university students was asked to pair very different-sounding instances of the same speakers (speaking and singing). Participants were presented with an array of voice clips and completed a 'Pairs Task' as a variant of the more traditional voice sorting task. The results of Experiment 1 suggested that significantly more mistakes were made when distinguishing celebrity targets from their impersonators than when distinguishing the same targets from control voices. Nevertheless, listeners were significantly better than chance in all three tasks despite the challenge. Similarly, the results of Experiment 2 suggested that it was significantly more difficult to pair singing and speaking clips than to pair two speaking clips, particularly when the speakers were unfamiliar. Again, however, performance was significantly above zero, and again better than chance in a cautious comparison. Taken together, the results suggest that vocal identity processing is highly adaptable and is assisted by familiarity with the speaker. However, the fact that performance remained above chance in all tasks suggests that we had not reached the limit of our listeners' capability, despite the considerable listening challenges introduced. We conclude that voice processing is far better than previous research might have presumed.
Affiliation(s)
- Sarah V. Stevenage
- School of Psychology, University of Southampton, Southampton SO17 1BJ, UK

12
Orepic P, Kannape OA, Faivre N, Blanke O. Bone conduction facilitates self-other voice discrimination. R Soc Open Sci 2023; 10:221561. [PMID: 36816848] [PMCID: PMC9929504] [DOI: 10.1098/rsos.221561] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2022] [Accepted: 01/27/2023] [Indexed: 06/18/2023]
Abstract
One's own voice is one of the most important and most frequently heard voices. Although it is the sound we associate most with ourselves, it is perceived as strange when played back in a recording. One of the main reasons is the lack of bone conduction that is inevitably present when hearing one's own voice while speaking. The resulting discrepancy between experimental and natural self-voice stimuli has significantly impeded self-voice research, rendering it one of the least investigated aspects of self-consciousness. Accordingly, factors that contribute to self-voice perception remain largely unknown. In a series of three studies, we rectified this ecological discrepancy by augmenting experimental self-voice stimuli with bone-conducted vibrotactile stimulation that is present during natural self-voice perception. Combining voice morphing with psychophysics, we demonstrate that specifically self-other but not familiar-other voice discrimination improved for stimuli presented using bone as compared with air conduction. Furthermore, our data outline independent contributions of familiarity and acoustic processing to separating the own from another's voice: although vocal differences increased general voice discrimination, self-voices were more confused with familiar than unfamiliar voices, regardless of their acoustic similarity. Collectively, our findings show that concomitant vibrotactile stimulation improves auditory self-identification, thereby portraying self-voice as a fundamentally multi-modal construct.
Affiliation(s)
- Pavo Orepic
- Laboratory of Cognitive Neuroscience, Neuro-X Institute and Brain Mind Institute, Faculty of Life Sciences, École polytechnique fédérale de Lausanne (EPFL), 1202 Geneva, Switzerland
- Oliver Alan Kannape
- Laboratory of Cognitive Neuroscience, Neuro-X Institute and Brain Mind Institute, Faculty of Life Sciences, École polytechnique fédérale de Lausanne (EPFL), 1202 Geneva, Switzerland
- Virtual Medicine Centre, NeuroCentre, University Hospital of Geneva, 1205 Geneva, Switzerland
- Nathan Faivre
- University Grenoble Alpes, University Savoie Mont Blanc, CNRS, LPNC, 38000 Grenoble, France
- Olaf Blanke
- Laboratory of Cognitive Neuroscience, Neuro-X Institute and Brain Mind Institute, Faculty of Life Sciences, École polytechnique fédérale de Lausanne (EPFL), 1202 Geneva, Switzerland
- Department of Clinical Neurosciences, University Hospital of Geneva, 1205 Geneva, Switzerland

13
Smith HMJ, Roeser J, Pautz N, Davis JP, Robson J, Wright D, Braber N, Stacey PC. Evaluating earwitness identification procedures: adapting pre-parade instructions and parade procedure. Memory 2023; 31:147-161. [PMID: 36201314] [DOI: 10.1080/09658211.2022.2129065] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/10/2022]
Abstract
Voice identification parades can be unreliable, as earwitness responses are error-prone. In this paper we tested performance across serial and sequential procedures, and varied pre-parade instructions, with the aim of reducing errors. The participants heard a target voice and later attempted to identify it from a parade. In Experiment 1 they were either warned that the target may or may not be present (standard warning) or encouraged to consider responding "not present" because of the associated risk of a wrongful conviction (strong warning). Strong warnings prompted a conservative criterion shift, with participants less likely to make a positive identification regardless of whether the target was present. In contrast to previous findings, we found no statistically reliable difference in accuracy between serial and sequential parades. Experiment 2 ruled out a potential confound in Experiment 1. Taken together, our results suggest that adapting pre-parade instructions provides a simple way of reducing the risk of false identifications.
Affiliation(s)
- Harriet M J Smith
- Department of Psychology, Nottingham Trent University, Nottingham, United Kingdom
- Jens Roeser
- Department of Psychology, Nottingham Trent University, Nottingham, United Kingdom
- Nikolas Pautz
- Department of Psychology, Nottingham Trent University, Nottingham, United Kingdom
- Josh P Davis
- School of Human Sciences, University of Greenwich, London, United Kingdom
- Jeremy Robson
- Leicester De Montfort Law School, De Montfort University, Leicester, United Kingdom
- David Wright
- English, Communications and Philosophy, Nottingham Trent University, Nottingham, United Kingdom
- Natalie Braber
- English, Communications and Philosophy, Nottingham Trent University, Nottingham, United Kingdom
- Paula C Stacey
- Department of Psychology, Nottingham Trent University, Nottingham, United Kingdom

14
Rinke P, Schmidt T, Beier K, Kaul R, Scharinger M. Rapid pre-attentive processing of a famous speaker: Electrophysiological effects of Angela Merkel's voice. Neuropsychologia 2022; 173:108312. [PMID: 35781011] [DOI: 10.1016/j.neuropsychologia.2022.108312] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2021] [Revised: 06/27/2022] [Accepted: 06/27/2022] [Indexed: 11/18/2022]
Abstract
The recognition of human speakers by their voices is a remarkable cognitive ability. Previous research has established a voice area in the right temporal cortex involved in the integration of speaker-specific acoustic features. This integration appears to occur rapidly, especially in the case of familiar voices. However, the exact time course of this process is less well understood. We therefore investigated the automatic change detection response of the human brain while listening to the famous voice of German chancellor Angela Merkel, embedded in a context of acoustically matched voices. A classic passive oddball paradigm contrasted short word stimuli uttered by Merkel with word stimuli uttered by two unfamiliar female speakers. Electrophysiological voice processing indices from 21 participants were quantified as mismatch negativities (MMNs) and P3a differences. Cortical sources were approximated by variable resolution electromagnetic tomography. The results showed amplitude and latency effects for both MMN and P3a: the famous (familiar) voice elicited a smaller but earlier MMN than the unfamiliar voices. The P3a, by contrast, was both larger and later for the familiar than for the unfamiliar voices. Familiar-voice MMNs originated from right-hemispheric regions in temporal cortex, overlapping with the temporal voice area, while unfamiliar-voice MMNs stemmed from left superior temporal gyrus. These results suggest that the processing of a very famous voice relies on pre-attentive right temporal processing within the first 150 ms of the acoustic signal. The findings further our understanding of the neural dynamics underlying familiar voice processing.
Affiliation(s)
- Paula Rinke
- Research Group Phonetics, Institute of German Linguistics, Philipps-University Marburg, Germany; Center for Mind, Brain & Behavior, Universities of Marburg & Gießen, Germany
- Tatjana Schmidt
- Center for Mind, Brain & Behavior, Universities of Marburg & Gießen, Germany; Faculté de biologie et de médecine, University of Lausanne, Switzerland
- Kjartan Beier
- Research Group Phonetics, Institute of German Linguistics, Philipps-University Marburg, Germany
- Ramona Kaul
- Research Group Phonetics, Institute of German Linguistics, Philipps-University Marburg, Germany
- Mathias Scharinger
- Research Group Phonetics, Institute of German Linguistics, Philipps-University Marburg, Germany; Research Center »Deutscher Sprachatlas«, Philipps-University Marburg, Germany; Center for Mind, Brain & Behavior, Universities of Marburg & Gießen, Germany

15
Orena AJ, Mader AS, Werker JF. Learning to Recognize Unfamiliar Voices: An Online Study With 12- and 24-Month-Olds. Front Psychol 2022; 13:874411. [PMID: 35558718] [PMCID: PMC9088808] [DOI: 10.3389/fpsyg.2022.874411] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/12/2022] [Accepted: 03/18/2022] [Indexed: 12/02/2022]
Abstract
Young infants are attuned to the indexical properties of speech: they can recognize highly familiar voices and distinguish them from unfamiliar voices. Less is known about how and when infants start to recognize unfamiliar voices, and to map them to faces. This skill is particularly challenging when portions of the speaker’s face are occluded, as is the case with masking. Here, we examined voice−face recognition abilities in infants 12 and 24 months of age. Using the online Lookit platform, children saw and heard four different speakers produce words with sonorous phonemes (high talker information), and words with phonemes that are less sonorous (low talker information). Infants aged 24 months, but not 12 months, were able to learn to link the voices to partially occluded faces of unfamiliar speakers, and only when the words were produced with high talker information. These results reveal that 24-month-old infants can encode and retrieve indexical properties of an unfamiliar speaker’s voice, and they can access this information even when visual access to the speaker’s mouth is blocked.
Affiliation(s)
- Adriel John Orena
- Department of Psychology, University of British Columbia, Vancouver, BC, Canada
- Department of Evaluation and Research Services, Fraser Health Authority, Surrey, BC, Canada
- Asia Sotera Mader
- Department of Psychology, University of British Columbia, Vancouver, BC, Canada
- Janet F Werker
- Department of Psychology, University of British Columbia, Vancouver, BC, Canada

16
Schroeger A, Kaufmann JM, Zäske R, Kovács G, Klos T, Schweinberger SR. Atypical prosopagnosia following right hemispheric stroke: A 23-year follow-up study with M.T. Cogn Neuropsychol 2022; 39:196-207. [PMID: 36202621] [DOI: 10.1080/02643294.2022.2119838] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/31/2023]
Abstract
Most findings on prosopagnosia to date suggest preserved voice recognition in prosopagnosia (except in cases with bilateral lesions). Here we report a follow-up examination on M.T., suffering from acquired prosopagnosia following a large unilateral right-hemispheric lesion in frontal, parietal, and anterior temporal areas excluding core ventral occipitotemporal face areas. Twenty-three years after initial testing we reassessed face and object recognition skills [Henke, K., Schweinberger, S. R., Grigo, A., Klos, T., & Sommer, W. (1998). Specificity of face recognition: Recognition of exemplars of non-face objects in prosopagnosia. Cortex, 34(2), 289-296]; [Schweinberger, S. R., Klos, T., & Sommer, W. (1995). Covert face recognition in prosopagnosia - A dissociable function? Cortex, 31(3), 517-529] and additionally studied voice recognition. Confirming the persistence of deficits, M.T. exhibited substantial impairments in famous face recognition and memory for learned faces, but preserved face matching and object recognition skills. Critically, he showed substantially impaired voice recognition skills. These findings are congruent with the ideas that (i) prosopagnosia after right anterior temporal lesions can persist over long periods > 20 years, and that (ii) such lesions can be associated with both facial and vocal deficits in person recognition.
Affiliation(s)
- Anna Schroeger
- Department of Psychology, Faculty of Psychology and Sports Science, Justus Liebig University, Giessen, Germany
- Department for General Psychology and Cognitive Neuroscience, Institute of Psychology, Friedrich Schiller University, Jena, Germany
- Department for the Psychology of Human Movement and Sport, Institute of Sport Science, Friedrich Schiller University, Jena, Germany
- Jürgen M Kaufmann
- Department for General Psychology and Cognitive Neuroscience, Institute of Psychology, Friedrich Schiller University, Jena, Germany
- DFG Research Unit Person Perception, Friedrich Schiller University, Jena, Germany
- Romi Zäske
- Department for General Psychology and Cognitive Neuroscience, Institute of Psychology, Friedrich Schiller University, Jena, Germany
- DFG Research Unit Person Perception, Friedrich Schiller University, Jena, Germany
- Gyula Kovács
- DFG Research Unit Person Perception, Friedrich Schiller University, Jena, Germany
- Biological Psychology and Cognitive Neurosciences, Institute of Psychology, Friedrich Schiller University, Jena, Germany
- Stefan R Schweinberger
- Department for General Psychology and Cognitive Neuroscience, Institute of Psychology, Friedrich Schiller University, Jena, Germany
- DFG Research Unit Person Perception, Friedrich Schiller University, Jena, Germany

17
Orepic P, Park HD, Rognini G, Faivre N, Blanke O. Breathing affects self-other voice discrimination in a bodily state associated with somatic passivity. Psychophysiology 2022; 59:e14016. [PMID: 35150452] [PMCID: PMC9286416] [DOI: 10.1111/psyp.14016] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/22/2021] [Revised: 07/08/2021] [Accepted: 12/24/2021] [Indexed: 01/03/2023]
Abstract
A growing number of studies have focused on identifying cognitive processes that are modulated by interoceptive signals, particularly in relation to the respiratory or cardiac cycle. Considering the fundamental role of interoception in bodily self-consciousness, we here investigated whether interoceptive signals also impact self-voice perception. We applied an interactive, robotic paradigm associated with somatic passivity (a bodily state characterized by illusory misattribution of self-generated touches to someone else) to investigate whether somatic passivity impacts self-voice perception as a function of concurrent interoceptive signals. Participants' breathing and heartbeat signals were recorded while they performed two self-voice tasks (self-other voice discrimination and loudness perception) and while simultaneously experiencing two robotic conditions (somatic passivity condition; control condition). Our data reveal that respiration, but not cardiac activity, affects self-voice perception: participants were better at discriminating their own voice from another person's voice during the inspiration phase of the respiration cycle. Moreover, breathing effects were prominent in participants experiencing somatic passivity, and a different task with the same stimuli (i.e., judging the loudness rather than the identity of the voices) was unaffected by breathing. Combining interoception and voice perception within a self-monitoring framework, these data extend findings on breathing-dependent changes in perception and cognition to self-related processing. Impact statement: We combined psychophysics with robotics and voice-morphing technology to evaluate the effect of breathing on self-voice perception. Our results show that listeners better perceive their own voice during inspiration, an effect that is modulated by self-related bodily processing. This extends previous findings documenting the effect of interoceptive signals on perception and suggests that the bodily self may serve as a scaffold for cognition.
Affiliation(s)
- Pavo Orepic
- Laboratory of Cognitive Neuroscience, Center for Neuroprosthetics and Brain Mind Institute, Faculty of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Hyeong-Dong Park
- Graduate Institute of Mind, Brain and Consciousness, Taipei Medical University, Taipei, Taiwan
- Brain and Consciousness Research Centre, Shuang-Ho Hospital, New Taipei City, Taiwan
- Giulio Rognini
- Laboratory of Cognitive Neuroscience, Center for Neuroprosthetics and Brain Mind Institute, Faculty of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Nathan Faivre
- Univ. Grenoble Alpes, Univ. Savoie Mont Blanc, CNRS, LPNC, Grenoble, France
- Olaf Blanke
- Laboratory of Cognitive Neuroscience, Center for Neuroprosthetics and Brain Mind Institute, Faculty of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Faculty of Medicine, University of Geneva, Geneva, Switzerland

18
Gábor A, Andics A, Miklósi Á, Czeibert K, Carreiro C, Gácsi M. Social relationship-dependent neural response to speech in dogs. Neuroimage 2021; 243:118480. [PMID: 34411741] [DOI: 10.1016/j.neuroimage.2021.118480] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/19/2021] [Revised: 07/13/2021] [Accepted: 08/15/2021] [Indexed: 11/16/2022]
Abstract
In humans, social relationship with the speaker affects neural processing of speech, as exemplified by children's auditory and reward responses to their mother's utterances. Family dogs show human analogue attachment behavior towards the owner, and neuroimaging revealed auditory cortex and reward center sensitivity to verbal praises in dog brains. Combining behavioral and non-invasive fMRI data, we investigated the effect of dogs' social relationship with the speaker on speech processing. Dogs listened to praising and neutral speech from their owners and a control person. We found positive correlation between dogs' behaviorally measured attachment scores towards their owners and neural activity increase for the owner's voice in the caudate nucleus; and activity increase in the secondary auditory caudal ectosylvian gyrus and the caudate nucleus for the owner's praise. Through identifying social relationship-dependent neural reward responses, our study reveals similarities in neural mechanisms modulated by infant-mother and dog-owner attachment.
Affiliation(s)
- Anna Gábor
- MTA-ELTE 'Lendület' Neuroethology of Communication Research Group, Hungarian Academy of Sciences - Eötvös Loránd University, H-1117 Budapest, Pázmány Péter sétány 1/C, Hungary; Department of Ethology, Eötvös Loránd University, H-1117 Budapest, Pázmány Péter sétány 1/C, Hungary
- Attila Andics
- MTA-ELTE 'Lendület' Neuroethology of Communication Research Group, Hungarian Academy of Sciences - Eötvös Loránd University, H-1117 Budapest, Pázmány Péter sétány 1/C, Hungary; Department of Ethology, Eötvös Loránd University, H-1117 Budapest, Pázmány Péter sétány 1/C, Hungary
- Ádám Miklósi
- Department of Ethology, Eötvös Loránd University, H-1117 Budapest, Pázmány Péter sétány 1/C, Hungary; MTA-ELTE Comparative Ethology Research Group, H-1117 Budapest, Pázmány Péter sétány 1/C, Hungary
- Kálmán Czeibert
- Department of Ethology, Eötvös Loránd University, H-1117 Budapest, Pázmány Péter sétány 1/C, Hungary
- Cecília Carreiro
- Department of Ethology, Eötvös Loránd University, H-1117 Budapest, Pázmány Péter sétány 1/C, Hungary
- Márta Gácsi
- Department of Ethology, Eötvös Loránd University, H-1117 Budapest, Pázmány Péter sétány 1/C, Hungary; MTA-ELTE Comparative Ethology Research Group, H-1117 Budapest, Pázmány Péter sétány 1/C, Hungary

19
Familiarity and task context shape the use of acoustic information in voice identity perception. Cognition 2021; 215:104780. [PMID: 34298232] [PMCID: PMC8381763] [DOI: 10.1016/j.cognition.2021.104780] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 07/28/2020] [Revised: 05/10/2021] [Accepted: 05/12/2021] [Indexed: 11/23/2022]
Abstract
Familiar and unfamiliar voice perception are often understood as being distinct from each other. For identity perception, theoretical work has proposed that listeners use acoustic information in different ways to perceive identity from familiar and unfamiliar voices: Unfamiliar voices are thought to be processed based on close comparisons of acoustic properties, while familiar voices are processed based on diagnostic acoustic features that activate a stored person-specific representation of that voice. To date no empirical study has directly examined whether and how familiar and unfamiliar listeners differ in their use of acoustic information for identity perception. Here, we tested this theoretical claim by linking listeners' judgements in voice identity tasks to complex acoustic representation - spectral similarity of the heard voice recordings. Participants (N = 177) who were either familiar or unfamiliar with a set of voices completed an identity discrimination task (Experiment 1) or an identity sorting task (Experiment 2). In both experiments, identity judgements for familiar and unfamiliar voices were guided by spectral similarity: Pairs of recordings with greater acoustic similarity were more likely to be perceived as belonging to the same voice identity. However, while there were no differences in how familiar and unfamiliar listeners used acoustic information for identity discrimination, differences were apparent for identity sorting. Our study therefore challenges proposals that view familiar and unfamiliar voice perception as being at all times distinct. Instead, our data suggest a critical role of the listening situation in which familiar and unfamiliar voices are evaluated, thus characterising voice identity perception as a highly dynamic process in which listeners opportunistically make use of any kind of information they can access.
20
Maguinness C, von Kriegstein K. Visual mechanisms for voice-identity recognition flexibly adjust to auditory noise level. Hum Brain Mapp 2021; 42:3963-3982. [PMID: 34043249] [PMCID: PMC8288083] [DOI: 10.1002/hbm.25532] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 02/11/2021] [Revised: 04/26/2021] [Accepted: 05/02/2021] [Indexed: 11/24/2022]
Abstract
Recognising the identity of voices is a key ingredient of communication. Visual mechanisms support this ability: recognition is better for voices previously learned with their corresponding face (compared to a control condition). This so‐called ‘face‐benefit’ is supported by the fusiform face area (FFA), a region sensitive to facial form and identity. Behavioural findings indicate that the face‐benefit increases in noisy listening conditions. The neural mechanisms for this increase are unknown. Here, using functional magnetic resonance imaging, we examined responses in face‐sensitive regions while participants recognised the identity of auditory‐only speakers (previously learned by face) in high (SNR −4 dB) and low (SNR +4 dB) levels of auditory noise. We observed a face‐benefit in both noise levels, for most participants (16 of 21). In high‐noise, the recognition of face‐learned speakers engaged the right posterior superior temporal sulcus motion‐sensitive face area (pSTS‐mFA), a region implicated in the processing of dynamic facial cues. The face‐benefit in high‐noise also correlated positively with increased functional connectivity between this region and voice‐sensitive regions in the temporal lobe in the group of 16 participants with a behavioural face‐benefit. In low‐noise, the face‐benefit was robustly associated with increased responses in the FFA and to a lesser extent the right pSTS‐mFA. The findings highlight the remarkably adaptive nature of the visual network supporting voice‐identity recognition in auditory‐only listening conditions.
Affiliation(s)
- Corrina Maguinness
- Chair of Cognitive and Clinical Neuroscience, Faculty of Psychology, Technische Universität Dresden, Dresden, Germany
- Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Katharina von Kriegstein
- Chair of Cognitive and Clinical Neuroscience, Faculty of Psychology, Technische Universität Dresden, Dresden, Germany
- Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany

21
The processing of intimately familiar and unfamiliar voices: Specific neural responses of speaker recognition and identification. PLoS One 2021; 16:e0250214. [PMID: 33861789] [PMCID: PMC8051806] [DOI: 10.1371/journal.pone.0250214] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 12/01/2020] [Accepted: 04/03/2021] [Indexed: 11/19/2022]
Abstract
Research has repeatedly shown that familiar and unfamiliar voices elicit different neural responses. But it has also been suggested that different neural correlates associate with the feeling of having heard a voice and knowing who the voice represents. The terminology used to designate these varying responses remains vague, creating a degree of confusion in the literature. Additionally, terms serving to designate tasks of voice discrimination, voice recognition, and speaker identification are often inconsistent, creating further ambiguities. The present study used event-related potentials (ERPs) to clarify the difference between responses to 1) unknown voices, 2) trained-to-familiar voices as speech stimuli are repeatedly presented, and 3) intimately familiar voices. In an experiment, 13 participants listened to repeated utterances recorded from 12 speakers. Only one of the 12 voices was intimately familiar to a participant, whereas the remaining 11 voices were unfamiliar. The frequency of presentation of these 11 unfamiliar voices varied, with only one being frequently presented (the trained-to-familiar voice). ERP analyses revealed different responses for intimately familiar and unfamiliar voices in two distinct time windows (P2 between 200-250 ms and a late positive component, LPC, between 450-850 ms post-onset) with late responses occurring only for intimately familiar voices. The LPC presents sustained shifts, and the shorter-latency ERP components appear to reflect an early recognition stage. The trained voice also elicited distinct responses, compared to rarely heard voices, but these occurred in a third time window (N250 between 300-350 ms post-onset). Overall, the timing of responses suggests that the processing of intimately familiar voices operates in two distinct steps of voice recognition, marked by a P2 on right centro-frontal sites, and speaker identification marked by an LPC component.
The recognition of frequently heard voices entails an independent recognition process marked by a differential N250. Based on the present results and previous observations, it is proposed that there is a need to distinguish between processes of voice "recognition" and "identification". The present study also specifies test conditions serving to reveal this distinction in neural responses, one of which bears on the length of speech stimuli given the late responses associated with voice identification.
22
Jenkins RE, Tsermentseli S, Monks CP, Robertson DJ, Stevenage SV, Symons AE, Davis JP. Are super-face-recognisers also super-voice-recognisers? Evidence from cross-modal identification tasks. Appl Cognit Psychol 2021. [DOI: 10.1002/acp.3813] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 12/21/2022]
Affiliation(s)
- Ryan E. Jenkins
- School of Human Sciences, Institute for Lifecourse Development, University of Greenwich, London, UK
- Stella Tsermentseli
- School of Human Sciences, Institute for Lifecourse Development, University of Greenwich, London, UK
- Claire P. Monks
- School of Human Sciences, Institute for Lifecourse Development, University of Greenwich, London, UK
- David J. Robertson
- School of Psychological Sciences and Health, University of Strathclyde, Glasgow, UK
- Ashley E. Symons
- Department of Psychology, University of Southampton, Southampton, UK
- Josh P. Davis
- School of Human Sciences, Institute for Lifecourse Development, University of Greenwich, London, UK

23
Borowiak K, von Kriegstein K. Intranasal oxytocin modulates brain responses to voice-identity recognition in typically developing individuals, but not in ASD. Transl Psychiatry 2020; 10:221. [PMID: 32636360] [PMCID: PMC7341857] [DOI: 10.1038/s41398-020-00903-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 10/10/2019] [Revised: 06/05/2020] [Accepted: 06/08/2020] [Indexed: 11/09/2022]
Abstract
Faces and voices are prominent cues for person-identity recognition. Face recognition behavior and associated brain responses can be enhanced by intranasal administration of oxytocin. It is unknown whether oxytocin can also augment voice-identity recognition mechanisms. Finding this out is particularly relevant for individuals who have difficulties recognizing voice identity, such as individuals diagnosed with autism spectrum disorder (ASD). We conducted a combined behavioral and functional magnetic resonance imaging (fMRI) study to investigate voice-identity recognition following intranasal administration of oxytocin or placebo in a group of adults diagnosed with ASD (full-scale intelligence quotient > 85) and pairwise-matched typically developing (TD) controls. A single dose of 24 IU oxytocin was administered in a randomized, double-blind, placebo-controlled and cross-over design. In the control group, but not in the ASD group, administration of oxytocin compared to placebo increased responses to recognition of voice identity in contrast to speech in the right posterior superior temporal sulcus/gyrus (pSTS/G), a region implicated in the perceptual analysis of voice-identity information. In the ASD group, right pSTS/G responses were positively correlated with voice-identity recognition accuracy in the oxytocin condition, but not in the placebo condition. Oxytocin did not improve voice-identity recognition performance at the group level. The ASD group had lower right pSTS/G responses to voice-identity recognition than the control group. Since pSTS/G function is known to be atypical in ASD, the results indicate that the potential of intranasal oxytocin to enhance mechanisms for voice-identity recognition might be variable and dependent on the functional integrity of this brain region.
Affiliation(s)
- Kamila Borowiak
- Technische Universität Dresden, Bamberger Straße 7, 01187, Dresden, Germany.
- Max Planck Institute for Human Cognitive and Brain Sciences, Stephanstraße 1a, 04103, Leipzig, Germany.
- Berlin School of Mind and Brain, Humboldt University of Berlin, Luisenstraße 56, 10117, Berlin, Germany.
- Katharina von Kriegstein
- Technische Universität Dresden, Bamberger Straße 7, 01187, Dresden, Germany.
- Max Planck Institute for Human Cognitive and Brain Sciences, Stephanstraße 1a, 04103, Leipzig, Germany.
24
Cooper A, Fecher N, Johnson EK. Identifying children's voices. J Acoust Soc Am 2020; 148:324. [PMID: 32752764] [DOI: 10.1121/10.0001576]
Abstract
Human adults rely on both acoustic and linguistic information to identify adult talkers. Assuming favorable conditions, adult listeners recognize other adults fairly accurately and quickly. But how well can adult listeners recognize child talkers, whose speech productions often differ dramatically from adult speech productions? Although adult talker recognition has been heavily studied, only one study to date has directly compared the recognition of unfamiliar adult and child talkers [Creel and Jimenez (2012). J. Exp. Child Psychol. 113(4), 487-509]. Therefore, the current study revisits this question with a much larger and younger sample of child talkers (N = 20); performance with adult talkers (N = 20) was also tested to provide a baseline. In Experiment 1, adults successfully distinguished between adult talkers in an AX discrimination task but performed much worse with child talkers. In Experiment 2, adults were slower and less accurate at learning to identify child talkers than adult talkers in a training-identification task. Finally, in Experiment 3, adults failed to improve at identifying child talkers after three days of training with numerous child voices. Taken together, these findings reveal a sizable difference in adults' ability to recognize child versus adult talkers. Possible explanations and implications for understanding human talker recognition are discussed.
Affiliation(s)
- Angela Cooper
- Department of Psychology, University of Toronto Mississauga, 3359 Mississauga Road, Mississauga, Ontario L5L 1C6, Canada.
- Natalie Fecher
- Department of Psychology, University of Toronto Mississauga, 3359 Mississauga Road, Mississauga, Ontario L5L 1C6, Canada.
- Elizabeth K Johnson
- Department of Psychology, University of Toronto Mississauga, 3359 Mississauga Road, Mississauga, Ontario L5L 1C6, Canada.
25
Shimrock S, Ferrand C. Listener Perceptions of Women With Voice Disorders: Vocal Stereotyping and Negative Personality Attribution. J Voice 2020; 35:934.e1-934.e6. [PMID: 32299637] [DOI: 10.1016/j.jvoice.2020.02.019]
Abstract
PURPOSE The purpose of the study was to determine whether the age and amount of background knowledge of listeners affect perceptual judgments of women with voice disorders. METHOD Forty participants in three different age groups (children, young adults, and older adults) rated five female voice samples representing various types of dysphonia. One group of young adults had background knowledge of voice disorders based on a graduate-level course in Voice Disorders. A semantic differential scale was used to rate the speakers on 24 attributes. RESULTS Results indicated that age of listeners was not a significant factor, and that listeners' ratings depended on the specific type of dysphonia. No significant differences emerged between the perceptions of individuals with and without background knowledge of voice disorders. DISCUSSION This study agrees with the findings of similar research showing that listeners judge speakers with voice disorders more negatively than speakers with normal voices, regardless of the age and background knowledge of the listener.
Affiliation(s)
- Shannon Shimrock
- Encompass Health Rehabilitation Hospital of Western Massachusetts, Ludlow, Massachusetts.
- Carole Ferrand
- Department of Speech-Language-Hearing Sciences, Hofstra University, Hempstead, New York.
26
Stevenage SV, Symons AE, Fletcher A, Coen C. Sorting through the impact of familiarity when processing vocal identity: Results from a voice sorting task. Q J Exp Psychol (Hove) 2019; 73:519-536. [PMID: 31658884] [PMCID: PMC7074657] [DOI: 10.1177/1747021819888064]
Abstract
The present article reports on one experiment designed to examine the importance of familiarity when processing vocal identity. A voice sorting task was used with participants who were either personally familiar or unfamiliar with three speakers. The results suggested that familiarity supported both the ability to tell different instances of the same voice together and the ability to tell similar instances of different voices apart. In addition, the results suggested differences between the three speakers in terms of the extent to which they were confusable, underlining the importance of vocal characteristics and stimulus selection within behavioural tasks. The results are discussed with reference to existing debates regarding the nature of stored representations as familiarity develops, and the greater difficulty of processing voices compared with faces more generally.
Affiliation(s)
- Ashley E Symons
- School of Psychology, University of Southampton, Southampton, UK.
- Abi Fletcher
- School of Psychology, University of Southampton, Southampton, UK.
- Chantelle Coen
- School of Psychology, University of Southampton, Southampton, UK.
27
Kim Y, Sidtis JJ, Van Lancker Sidtis D. Emotionally expressed voices are retained in memory following a single exposure. PLoS One 2019; 14:e0223948. [PMID: 31622405] [PMCID: PMC6797471] [DOI: 10.1371/journal.pone.0223948]
Abstract
Studies of voice recognition in biology suggest that long exposure may not satisfactorily represent the voice acquisition process. The current study proposes that humans can acquire a newly familiar voice from brief exposure to spontaneous speech, given a personally engaging context. Studies have shown that arousing and emotionally engaging experiences are more likely to be recorded and consolidated in memory. Yet it remains undemonstrated whether this advantage holds for voices. The present study examined the role of emotionally expressive context in the acquisition of voices following a single, 1-minute exposure by comparing recognition of voices experienced in engaging and neutral contexts at two retention intervals. Listeners were exposed to a series of emotionally nuanced and neutral videotaped narratives produced by performers, and tested on the recognition of excerpted voice samples, by indicating whether they had heard the voice before, immediately and after a one-week delay. Excerpts were voices from exposed videotaped narratives, but utilized verbal material taken from a second (nonexposed) narrative provided by the same performer. Overall, participants were consistently able to distinguish between voices that were exposed during the video session and voices that were not exposed. Voices experienced in emotional, engaging contexts were significantly better recognized than those in neutral ones both immediately and after a one-week delay. Our findings provide the first evidence that new voices can be acquired rapidly from one-time exposure and that nuanced context facilitates initially inducting new voices into a repertory of personally familiar voices in long-term memory. The results converge with neurological evidence to suggest that cerebral processes differ for familiar and unfamiliar voices.
Affiliation(s)
- Yoonji Kim
- Department of Communicative Sciences and Disorders, New York University, New York, NY, United States of America.
- The Nathan Kline Institute for Psychiatric Research at Rockland Psychiatric Center, Geriatrics Division, New York, NY, United States of America.
- John J. Sidtis
- The Nathan Kline Institute for Psychiatric Research at Rockland Psychiatric Center, Geriatrics Division, New York, NY, United States of America.
- Department of Psychiatry, New York University Langone School of Medicine, New York, NY, United States of America.
- Diana Van Lancker Sidtis
- Department of Communicative Sciences and Disorders, New York University, New York, NY, United States of America.
- The Nathan Kline Institute for Psychiatric Research at Rockland Psychiatric Center, Geriatrics Division, New York, NY, United States of America.
28
Voices to remember: Comparing neural signatures of intentional and non-intentional voice learning and recognition. Brain Res 2019; 1711:214-225. [PMID: 30685271] [DOI: 10.1016/j.brainres.2019.01.028]
Abstract
Recent electrophysiological evidence suggests a rapid acquisition of novel speaker representations during intentional voice learning. We investigated effects of learning intention on voice recognition, using a variant of the directed forgetting paradigm. In an old/new recognition task following voice learning, we compared performance and event-related brain potentials (ERPs) for studied voices, half of which had been prompted to be remembered (TBR) or forgotten (TBF). Furthermore, to assess incidental encoding of episodic information, participants indicated for each recognized test voice the ear of presentation during study. During study, TBR voices elicited more positive ERPs than TBF voices (from ∼250 ms), possibly reflecting deeper voice encoding. In parallel, subsequent recognition performance was higher for TBR than for TBF voices. Importantly, above-chance recognition for both learning conditions nevertheless suggested a degree of non-intentional voice learning. In a surprise episodic memory test for voice location, above-chance performance was observed for TBR voices only, suggesting that episodic memory for ear of presentation depended on intentional voice encoding. At test, a left posterior ERP OLD/NEW effect for both TBR and TBF voices (from ∼500 ms) reflected recognition of studied voices under both encoding conditions. By contrast, a right frontal ERP OLD/NEW effect for TBF voices only (from ∼800 ms) possibly reflected additional elaborative retrieval processes. Overall, we show that ERPs are sensitive 1) to strategic voice encoding during study (from ∼250 ms), and 2) to voice recognition at test (from ∼500 ms), with the specific pattern of ERP OLD/NEW effects partly depending on previous encoding intention.
29
Fecher N, Paquette-Smith M, Johnson EK. Resolving the (Apparent) Talker Recognition Paradox in Developmental Speech Perception. Infancy 2019; 24:570-588. [DOI: 10.1111/infa.12290]
Affiliation(s)
- Natalie Fecher
- Department of Psychology, University of Toronto Mississauga.
30
Aglieri V, Chaminade T, Takerkart S, Belin P. Functional connectivity within the voice perception network and its behavioural relevance. Neuroimage 2018; 183:356-365. [PMID: 30099078] [PMCID: PMC6215333] [DOI: 10.1016/j.neuroimage.2018.08.011]
Abstract
Recognizing who is speaking is a cognitive ability characterized by considerable individual differences, which could relate to the inter-individual variability observed in voice-elicited BOLD activity. Since voice perception is sustained by a complex brain network involving temporal voice areas (TVAs) and, less consistently, extra-temporal regions such as frontal cortices, functional connectivity (FC) during an fMRI voice localizer (passive listening to voices vs non-voices) was computed within twelve temporal and frontal voice-sensitive regions ("voice patches") defined individually for each subject (N = 90) to account for inter-individual variability. Results revealed that voice patches were positively co-activated during voice listening and that they were characterized by different FC patterns depending on their location (anterior/posterior) and hemisphere. Importantly, FC between right frontal and temporal voice patches was behaviorally relevant: FC increased significantly with voice recognition abilities as measured in a voice recognition test performed outside the scanner. Hence, this study highlights the importance of frontal regions in voice perception and supports the idea that examining FC between stimulus-specific and higher-order frontal regions can help explain individual differences in processing social stimuli such as voices.
Affiliation(s)
- Virginia Aglieri
- Institut des Neurosciences de la Timone, UMR 7289, CNRS and Université Aix-Marseille, Marseille, France.
- Thierry Chaminade
- Institut des Neurosciences de la Timone, UMR 7289, CNRS and Université Aix-Marseille, Marseille, France; Institute of Language, Communication and the Brain, Marseille, France.
- Sylvain Takerkart
- Institut des Neurosciences de la Timone, UMR 7289, CNRS and Université Aix-Marseille, Marseille, France; Institute of Language, Communication and the Brain, Marseille, France.
- Pascal Belin
- Institut des Neurosciences de la Timone, UMR 7289, CNRS and Université Aix-Marseille, Marseille, France; Institute of Language, Communication and the Brain, Marseille, France; International Laboratories for Brain, Music and Sound, Department of Psychology, Université de Montréal, McGill University, Montreal, QC, Canada.