1. Bürgel M, Siedenburg K. Impact of interference on vocal and instrument recognition. J Acoust Soc Am 2024; 156:922-938. [PMID: 39133041 DOI: 10.1121/10.0028152]
Abstract
Voices arguably occupy a superior role in auditory processing. Specifically, studies have reported that singing voices are processed faster and more accurately, and possess greater salience in musical scenes, than instrumental sounds. However, the acoustic features underlying this superiority and the generality of these effects remain unclear. This study investigates the impact of frequency micro-modulations (FMM) and the influence of interfering sounds on sound recognition. Thirty young participants, half with musical training, engage in three sound recognition experiments featuring short vocal and instrumental sounds in a go/no-go task. Accuracy and reaction times are measured for sounds from recorded samples and excerpts of popular music. Each sound is presented in separate versions with and without FMM, in isolation or accompanied by a piano. Recognition varies across sound categories, but no general vocal superiority emerges, nor any effect of FMM. When presented together with interfering sounds, all sounds exhibit degraded recognition. However, whereas /a/ sounds stand out by showing a distinct robustness to interference (i.e., less degradation of recognition), /u/ sounds lack this robustness. Acoustical analysis implies that the recognition differences can be explained by spectral similarities. Together, these results challenge the notion of a general vocal superiority in auditory perception.
Affiliation(s)
- Michel Bürgel: Department of Medical Physics and Acoustics, University of Oldenburg, Oldenburg 26129, Germany
- Kai Siedenburg: Department of Medical Physics and Acoustics, University of Oldenburg, Oldenburg 26129, Germany; Signal Processing and Speech Communication Laboratory, Graz University of Technology, Graz 8010, Austria
2. The processing of intimately familiar and unfamiliar voices: Specific neural responses of speaker recognition and identification. PLoS One 2021; 16:e0250214. [PMID: 33861789 PMCID: PMC8051806 DOI: 10.1371/journal.pone.0250214]
Abstract
Research has repeatedly shown that familiar and unfamiliar voices elicit different neural responses. But it has also been suggested that different neural correlates associate with the feeling of having heard a voice and with knowing who the voice represents. The terminology used to designate these varying responses remains vague, creating a degree of confusion in the literature. Additionally, the terms serving to designate tasks of voice discrimination, voice recognition, and speaker identification are often used inconsistently, creating further ambiguities. The present study used event-related potentials (ERPs) to clarify the difference between responses to 1) unknown voices, 2) trained-to-familiar voices as speech stimuli are repeatedly presented, and 3) intimately familiar voices. In an experiment, 13 participants listened to repeated utterances recorded from 12 speakers. Only one of the 12 voices was intimately familiar to a participant, whereas the remaining 11 voices were unfamiliar. The frequency of presentation of these 11 unfamiliar voices varied, with only one being frequently presented (the trained-to-familiar voice). ERP analyses revealed different responses for intimately familiar and unfamiliar voices in two distinct time windows (P2 between 200-250 ms and a late positive component, LPC, between 450-850 ms post-onset), with the late responses occurring only for intimately familiar voices. The LPC presents sustained shifts, and short-time ERP components appear to reflect an early recognition stage. The trained voice also elicited distinct responses compared to rarely heard voices, but these occurred in a third time window (N250 between 300-350 ms post-onset). Overall, the timing of responses suggests that the processing of intimately familiar voices operates in two distinct steps: voice recognition, marked by a P2 on right centro-frontal sites, and speaker identification, marked by an LPC component. The recognition of frequently heard voices entails an independent recognition process marked by a differential N250. Based on the present results and previous observations, it is proposed that there is a need to distinguish between processes of voice "recognition" and "identification". The present study also specifies test conditions serving to reveal this distinction in neural responses, one of which bears on the length of speech stimuli, given the late responses associated with voice identification.
3. Stavropoulos KKM, Carver LJ. Neural Correlates of Attention to Human-Made Sounds: An ERP Study. PLoS One 2016; 11:e0165745. [PMID: 27798701 PMCID: PMC5087949 DOI: 10.1371/journal.pone.0165745]
Abstract
Previous neuroimaging and electrophysiological studies have suggested that human-made sounds are processed differently from non-human-made sounds. Multiple groups have suggested that voices might be processed as “special,” much like faces. Although previous literature has explored neural correlates of voice perception under varying task demands, few studies have examined electrophysiological correlates of attention while directly comparing human-made and non-human-made sounds. In the present study, we used event-related potentials (ERPs) to compare attention to human-made versus non-human-made sounds in an oddball paradigm. ERP components of interest were the P300 and the fronto-temporal positivity to voices (FTPV), which has been reported in previous investigations of voice versus non-voice stimuli. We found that participants who heard human-made sounds as “target” or infrequent stimuli had significantly larger FTPV amplitude, shorter FTPV latency, and larger P300 amplitude than those who heard non-human-made sounds as “target” stimuli. Our results are in concordance with previous findings that human-made and non-human-made sounds are processed differently, and expand upon previous literature by demonstrating increased attention to human-made versus non-human-made sounds, even when the non-human-made sounds are ones that require immediate attention in daily life (e.g., a car horn). Heightened attention to human-made sounds is important theoretically and has potential for application in tests of social interest in populations with autism.
Affiliation(s)
- Leslie J. Carver: University of California San Diego, San Diego, California, United States of America
4. Weis T, Estner B, Lachmann T. When speech enhances Spatial Musical Association of Response Codes: Joint spatial associations of pitch and timbre in nonmusicians. Q J Exp Psychol (Hove) 2016; 69:1687-1700. [DOI: 10.1080/17470218.2015.1091850]
Abstract
Previous studies have shown that the Spatial Musical Association of Response Codes (SMARC) effect depends on various features, such as task conditions (whether pitch height is implicit or explicit), response dimension (horizontal vs. vertical), presence or absence of a reference tone, and the participants' prior musical training. In the present study, we investigated the effects of pitch range and timbre: in particular, how timbre (piano vs. vocal) contributes to the horizontal and vertical SMARC effect in nonmusicians under varied pitch range conditions. Nonmusicians performed a timbre judgement task in which the pitch range was either small (6 or 8 semitone steps) or large (9 or 12 semitone steps), in both a horizontal and a vertical response setting. For piano sounds, SMARC effects were observed in all conditions. For vocal sounds, in contrast, SMARC effects depended on pitch range. We conclude that the occurrence of the SMARC effect, especially in horizontal response settings, depends on the interaction of timbre (vocal vs. piano) and pitch range when vocal and instrumental sounds are combined in one experiment: the human voice enhances attention to both vocal and instrumental sounds.
Affiliation(s)
- Tina Weis: Center for Cognitive Science, Cognitive and Developmental Psychology Unit, University of Kaiserslautern, Kaiserslautern, Germany
- Barbara Estner: Center for Cognitive Science, Cognitive and Developmental Psychology Unit, University of Kaiserslautern, Kaiserslautern, Germany
- Thomas Lachmann: Center for Cognitive Science, Cognitive and Developmental Psychology Unit, University of Kaiserslautern, Kaiserslautern, Germany
5. Fang G, Yang P, Xue F, Cui J, Brauth SE, Tang Y. Sound Classification and Call Discrimination Are Decoded in Order as Revealed by Event-Related Potential Components in Frogs. Brain Behav Evol 2015; 86:232-245. [DOI: 10.1159/000441215]
Abstract
Species that use communication sounds to coordinate social and reproductive behavior must be able to distinguish vocalizations from nonvocal sounds as well as to identify individual vocalization types. In this study we sought to identify the neural localization of the processes involved and the temporal order in which they occur in an anuran species, the music frog Babina daunchina. To do this we measured telencephalic and mesencephalic event-related potentials (ERPs) elicited by synthesized white noise (WN), highly sexually attractive (HSA) calls produced by males from inside nests, and male calls of low sexual attractiveness (LSA) produced outside of nests. Each stimulus possessed similar temporal structure. The results showed the following: (1) the amplitudes of the first negative ERP component (N1) at ∼100 ms differed significantly between WN and conspecific calls but not between HSA and LSA calls, indicating that discrimination between conspecific calls and nonvocal sounds occurs within ∼100 ms; (2) the amplitudes of the second positive ERP component (P2) at ∼200 ms in the difference waves between HSA calls and WN were significantly higher than those between LSA calls and WN in the right telencephalon, implying that call characteristic identification occurs within ∼200 ms; and (3) WN evoked a larger third positive ERP component (P3) at ∼300 ms than conspecific calls, suggesting the frogs had classified the conspecific calls into one category and perceived WN as novel. Thus, both the detection of sounds and the identification of call characteristics are accomplished quickly, in a specific temporal order, as reflected by ERP components. In addition, the most dynamic ERP patterns appeared in the left mesencephalon and the right telencephalon, indicating that these two brain regions might play key roles in anuran vocal communication.
6. Salvia E, Bestelmeyer PEG, Kotz SA, Rousselet GA, Pernet CR, Gross J, Belin P. Single-subject analyses of magnetoencephalographic evoked responses to the acoustic properties of affective non-verbal vocalizations. Front Neurosci 2014; 8:422. [PMID: 25565951 PMCID: PMC4273656 DOI: 10.3389/fnins.2014.00422]
Abstract
Magneto-encephalography (MEG) was used to examine the cerebral response to affective non-verbal vocalizations (ANVs) at the single-subject level. Stimuli consisted of non-verbal affect bursts from the Montreal Affective Voices morphed to parametrically vary acoustical structure and perceived emotional properties. Scalp magnetic fields were recorded in three participants while they performed a 3-alternative forced-choice emotion categorization task (Anger, Fear, Pleasure). Each participant performed more than 6000 trials to allow single-subject statistical analyses using a new toolbox that implements the general linear model (GLM) on stimulus-specific responses (LIMO-EEG). For each participant we estimated "simple" models [including just one affective regressor (Arousal or Valence)] as well as "combined" models (including acoustical regressors). Results from the "simple" models revealed in every participant the significant early effects (as early as ~100 ms after onset) of Valence and Arousal already reported at the group level in previous work. However, the "combined" models showed that few effects of Arousal remained after removing the acoustically explained variance, whereas significant effects of Valence remained, especially at late stages. This study demonstrates (i) that single-subject analyses replicate the results observed at early stages by group-level studies and (ii) the feasibility of GLM-based analysis of MEG data. It also suggests that the early modulation of MEG amplitude by affective stimuli partly reflects their acoustical properties.
Affiliation(s)
- Emilie Salvia: Centre for Cognitive Neuroimaging, Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, UK
- Sonja A Kotz: School of Psychological Sciences, University of Manchester, Manchester, UK; Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Guillaume A Rousselet: Centre for Cognitive Neuroimaging, Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, UK
- Cyril R Pernet: Brain Research Imaging Center, Division of Clinical Neurosciences, Western General Hospital, University of Edinburgh, Edinburgh, UK
- Joachim Gross: Centre for Cognitive Neuroimaging, Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, UK
- Pascal Belin: Centre for Cognitive Neuroimaging, Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, UK; Département de Psychologie, Université de Montréal, Montreal, Canada; Institut des Neurosciences de La Timone, UMR 7289, CNRS & Aix-Marseille Université, Marseille, France
7. Amemiya K, Karino S, Ishizu T, Yumoto M, Yamasoba T. Distinct neural mechanisms of tonal processing between musicians and non-musicians. Clin Neurophysiol 2014; 125:738-747. [DOI: 10.1016/j.clinph.2013.09.027]
8. Hutchins S, Moreno S. The Linked Dual Representation model of vocal perception and production. Front Psychol 2013; 4:825. [PMID: 24204360 PMCID: PMC3817506 DOI: 10.3389/fpsyg.2013.00825]
Abstract
The voice is one of the most important media for communication, yet there is a wide range of abilities in both the perception and production of the voice. In this article, we review this range of abilities, focusing on pitch accuracy as a particularly informative case, and look at the factors underlying these abilities. Several classes of models have been posited describing the relationship between vocal perception and production, and we review the evidence for and against each class of model. We look at how the voice is different from other musical instruments and review evidence about both the association and the dissociation between vocal perception and production abilities. Finally, we introduce the Linked Dual Representation (LDR) model, a new approach which can account for the broad patterns in prior findings, including trends in the data which might seem to be countervailing. We discuss how this model interacts with higher-order cognition and examine its predictions about several aspects of vocal perception and production.
Affiliation(s)
- Sean Hutchins: Rotman Research Institute at Baycrest Hospital, Toronto, ON, Canada
9. Bruneau N, Roux S, Cléry H, Rogier O, Bidet-Caulet A, Barthélémy C. Early neurophysiological correlates of vocal versus non-vocal sound processing in adults. Brain Res 2013; 1528:20-27. [DOI: 10.1016/j.brainres.2013.06.008]
10. Hierarchical Neural Encoding of Temporal Regularity in the Human Auditory Cortex. Brain Topogr 2013; 28:459-470. [DOI: 10.1007/s10548-013-0300-3]
11.
Abstract
Across species, there is considerable evidence of preferential processing for biologically significant signals such as conspecific vocalizations and the calls of individual conspecifics. Surprisingly, music cognition in human listeners is typically studied with stimuli that are relatively low in biological significance, such as instrumental sounds. The present study explored the possibility that melodies might be remembered better when presented vocally rather than instrumentally. Adults listened to unfamiliar folk melodies, with some presented in familiar timbres (voice and piano) and others in less familiar timbres (banjo and marimba). They were subsequently tested on recognition of previously heard melodies intermixed with novel melodies. Melodies presented vocally were remembered better than those presented instrumentally even though they were liked less. Factors underlying the advantage for vocal melodies remain to be determined. In line with its biological significance, vocal music may evoke increased vigilance or arousal, which in turn may result in greater depth of processing and enhanced memory for musical details.
12. Capilla A, Belin P, Gross J. The early spatio-temporal correlates and task independence of cerebral voice processing studied with MEG. Cereb Cortex 2012; 23:1388-1395. [PMID: 22610392 DOI: 10.1093/cercor/bhs119]
Abstract
Functional magnetic resonance imaging studies have repeatedly provided evidence for temporal voice areas (TVAs) with particular sensitivity to human voices along bilateral mid/anterior superior temporal sulci and superior temporal gyri (STS/STG). In contrast, electrophysiological studies of the spatio-temporal correlates of cerebral voice processing have yielded contradictory results, finding the earliest correlates either at ∼300-400 ms, or earlier at ∼200 ms ("fronto-temporal positivity to voice", FTPV). These contradictory results are likely the consequence of different stimulus sets and attentional demands. Here, we recorded magnetoencephalography activity while participants listened to diverse types of vocal and non-vocal sounds and performed different tasks varying in attentional demands. Our results confirm the existence of an early voice-preferential magnetic response (FTPVm, the magnetic counterpart of the FTPV) peaking at about 220 ms and distinguishing between vocal and non-vocal sounds as early as 150 ms after stimulus onset. The sources underlying the FTPVm were localized along bilateral mid-STS/STG, largely overlapping with the TVAs. The FTPVm was consistently observed across different stimulus subcategories, including speech and non-speech vocal sounds, and across different tasks. These results demonstrate the early, largely automatic recruitment of focal, voice-selective cerebral mechanisms with a time-course comparable to that of face processing.
Affiliation(s)
- Almudena Capilla: Department of Biological and Health Psychology, Autonoma University of Madrid, Madrid, Spain
13. Discriminating Male and Female Voices: Differentiating Pitch and Gender. Brain Topogr 2011; 25:194-204. [DOI: 10.1007/s10548-011-0207-9]
14.
Abstract
The ability to discriminate conspecific vocalizations is observed across species and early during development. However, its neurophysiologic mechanism remains controversial, particularly regarding whether it involves specialized processes with dedicated neural machinery. We identified spatiotemporal brain mechanisms for conspecific vocalization discrimination in humans by applying electrical neuroimaging analyses to auditory evoked potentials (AEPs) in response to acoustically and psychophysically controlled nonverbal human and animal vocalizations as well as sounds of man-made objects. AEP strength modulations in the absence of topographic modulations are suggestive of statistically indistinguishable brain networks. First, responses to human versus animal vocalizations were significantly stronger, but topographically indistinguishable, starting at 169-219 ms after stimulus onset and within regions of the right superior temporal sulcus and superior temporal gyrus. This effect correlated with another AEP strength modulation occurring at 291-357 ms that was localized within the left inferior prefrontal and precentral gyri. Temporally segregated and spatially distributed stages of vocalization discrimination are thus functionally coupled and demonstrate how conventional views of functional specialization must incorporate network dynamics. Second, vocalization discrimination is not subject to facilitated processing in time, but instead lags more general categorization by approximately 100 ms, indicative of hierarchical processing during object discrimination. Third, although differences between human and animal vocalizations persisted when analyses were performed at a single-object level or extended to include additional (man-made) sound categories, at no latency were responses to human vocalizations stronger than those to all other categories. Vocalization discrimination thus transpires at times synchronous with that of face discrimination but is not functionally specialized.
15. Spierer L, De Lucia M, Bernasconi F, Grivel J, Bourquin NMP, Clarke S, Murray MM. Learning-induced plasticity in human audition: objects, time, and space. Hear Res 2010; 271:88-102. [PMID: 20430070 DOI: 10.1016/j.heares.2010.03.086]
Abstract
The human auditory system comprises specialized but interacting anatomic and functional pathways encoding object, spatial, and temporal information. We review how learning-induced plasticity manifests along these pathways and to what extent there are common mechanisms subserving such plasticity. A first series of experiments establishes a temporal hierarchy along which sounds of objects are discriminated along basic to fine-grained categorical boundaries and learned representations. A widespread network of temporal and (pre)frontal brain regions contributes to object discrimination via recursive processing. Learning-induced plasticity typically manifested as repetition suppression within a common set of brain regions. A second series of experiments considered how the temporal sequence of sound sources is represented. We show that lateralized responsiveness during the initial encoding phase of pairs of auditory spatial stimuli is critical for their accurate ordered perception. Finally, we consider how spatial representations are formed and modified through training-induced learning. A population-based model of spatial processing is supported, wherein temporal and parietal structures interact in the encoding of relative and absolute spatial information over the initial ~300 ms post-stimulus onset. Collectively, these data provide insights into the functional organization of human audition and open directions for new developments in targeted diagnostic and neurorehabilitation strategies.
Affiliation(s)
- Lucas Spierer: Neuropsychology and Neurorehabilitation Service, Department of Clinical Neuroscience, Vaudois University Hospital Center and University of Lausanne, Switzerland
16. Murray MM, Spierer L. Auditory spatio-temporal brain dynamics and their consequences for multisensory interactions in humans. Hear Res 2009; 258:121-133. [DOI: 10.1016/j.heares.2009.04.022]
17. Rogier O, Roux S, Belin P, Bonnet-Brilhault F, Bruneau N. An electrophysiological correlate of voice processing in 4- to 5-year-old children. Int J Psychophysiol 2009; 75:44-47. [PMID: 19896509 DOI: 10.1016/j.ijpsycho.2009.10.013]
Abstract
Cortical auditory evoked potentials were studied in response to voice and environmental sounds in 4- to 5-year-old children. A specific response to voice was dissociated from the response to environmental sounds. It appeared as a positive deflection recorded at right fronto-temporal sites and beginning within 60 ms of stimulus onset. We termed this response the Fronto-Temporal Positivity to Voice (FTPV).
Affiliation(s)
- Ophelie Rogier: UMR INSERM U930, CNRS ERL 3106, Université François-Rabelais de Tours, CHRU de Tours, France
18. Charest I, Pernet CR, Rousselet GA, Quiñones I, Latinus M, Fillion-Bilodeau S, Chartrand JP, Belin P. Electrophysiological evidence for an early processing of human voices. BMC Neurosci 2009; 10:127. [PMID: 19843323 PMCID: PMC2770575 DOI: 10.1186/1471-2202-10-127]
Abstract
BACKGROUND: Previous electrophysiological studies have identified a "voice-specific response" (VSR) peaking around 320 ms after stimulus onset, a latency markedly longer than the 70 ms needed to discriminate living from non-living sound sources and the 150-200 ms needed for the processing of voice paralinguistic qualities. In the present study, we investigated whether an early electrophysiological difference between voice and non-voice stimuli could be observed. RESULTS: ERPs were recorded from 32 healthy volunteers who listened to 200 ms long stimuli from three sound categories - voices, bird songs and environmental sounds - whilst performing a pure-tone detection task. ERP analyses revealed voice/non-voice amplitude differences emerging as early as 164 ms post stimulus onset and peaking around 200 ms at fronto-temporal (positivity) and occipital (negativity) electrodes. CONCLUSION: Our electrophysiological results suggest a rapid brain discrimination of sounds of voice, termed the "fronto-temporal positivity to voices" (FTPV), at latencies comparable to the well-known face-preferential N170.
Affiliation(s)
- Ian Charest: Centre for Cognitive NeuroImaging (CCNi) & Department of Psychology, University of Glasgow, Glasgow, UK
19. Brancucci A, Lucci G, Mazzatenta A, Tommasi L. Asymmetries of the human social brain in the visual, auditory and chemical modalities. Philos Trans R Soc Lond B Biol Sci 2009; 364:895-914. [PMID: 19064350 PMCID: PMC2666086 DOI: 10.1098/rstb.2008.0279]
Abstract
Structural and functional asymmetries are present in many regions of the human brain responsible for motor control, sensory and cognitive functions and communication. Here, we focus on hemispheric asymmetries underlying the domain of social perception, broadly conceived as the analysis of information about other individuals based on acoustic, visual and chemical signals. By means of these cues the brain establishes the border between 'self' and 'other', and interprets the surrounding social world in terms of the physical and behavioural characteristics of conspecifics essential for impression formation and for creating bonds and relationships. We show that, considered from the standpoint of single- and multi-modal sensory analysis, the neural substrates of the perception of voices, faces, gestures, smells and pheromones, as evidenced by modern neuroimaging techniques, are characterized by a general pattern of right-hemispheric functional asymmetry that might benefit from other aspects of hemispheric lateralization rather than constituting a true specialization for social information.
Affiliation(s)
- Luca Tommasi: Department of Biomedical Sciences, Institute for Advanced Biomedical Technologies, University of Chieti, Blocco A, Via dei Vestini 29, 66013 Chieti, Italy
20. Yrttiaho S, Tiitinen H, May PJC, Leino S, Alku P. Cortical sensitivity to periodicity of speech sounds. J Acoust Soc Am 2008; 123:2191-2199. [PMID: 18397025 DOI: 10.1121/1.2888489]
Abstract
Previous non-invasive brain research has reported auditory cortical sensitivity to periodicity as reflected by larger and more anterior responses to periodic than to aperiodic vowels. The current study investigated whether there is a lower fundamental frequency (F0) limit for this effect. Auditory evoked fields (AEFs) elicited by natural-sounding 400 ms periodic and aperiodic vowel stimuli were measured with magnetoencephalography. Vowel F0 ranged from normal male speech (113 Hz) to exceptionally low values (9 Hz). Both the auditory N1m and sustained fields were larger in amplitude for periodic than for aperiodic vowels. The AEF sources for periodic vowels were also anterior to those for the aperiodic vowels. Importantly, the AEF amplitudes and locations were unaffected by the F0 decrement of the periodic vowels. However, the N1m latency increased monotonically as F0 was decreased down to 19 Hz, below which this trend broke down. Also, a cascade of transient N1m-like responses was observed in the lowest F0 condition. Thus, the auditory system seems capable of extracting the periodicity even from very low F0 vowels. The behavior of the N1m latency and the emergence of a response cascade at very low F0 values may reflect the lower limit of pitch perception.
Affiliation(s)
- Santeri Yrttiaho: Department of Signal Processing and Acoustics, Helsinki University of Technology, PO Box 3300, FI-02015 TKK, Finland
21.
Abstract
To investigate the temporal aspects of timbre processing, we recorded auditory-evoked neuromagnetic responses to periodic complex sounds that were matched in all acoustic parameters except for two fundamental frequencies (F0s) and 12 spectral envelopes of vocal and nonvocal categories. Only for nonvocal sounds was a significant difference in N1m latency across F0s detected in both hemispheres. A significant difference among stimuli was detected in both hemispheres for vocal and linear sounds, whereas for instrumental sounds it was detected only in the right hemisphere. Moreover, paired comparisons among F0s revealed that not only the vocal sounds but also some of the nonvocal sounds were F0-independent. This latency independence may be attributed to the relatively high power in the higher frequency spectrum.
22. Mizuochi T, Yumoto M, Karino S, Itoh K, Yamakawa K, Kaga K. Perceptual categorization of sound spectral envelopes reflected in auditory-evoked N1m. Neuroreport 2005; 16:555-558. [PMID: 15812306 DOI: 10.1097/00001756-200504250-00007]
Abstract
Magnetic responses to periodic complex sounds with equivalent acoustic parameters, except for two different fundamental frequencies (F0s) and 12 different spectral envelopes of vocal, instrumental, and linear shapes, were recorded to determine the cortical representation of timbre categorization in humans. Responses at approximately 100 ms (N1m) to vocal and instrumental (nonlinear) sounds were localized significantly anterior to responses to linear sounds. N1m source strength for nonlinear sounds was significantly larger than that for linear sounds, and this difference was more marked in the left hemisphere than in the right. Only for vocal sounds was N1m peak latency unaffected by F0. Perceptual categorization was thus reflected in N1m source strength and location (linear vs. nonlinear) and in N1m latency (vocal vs. nonvocal).
Affiliation(s)
- Tomomi Mizuochi: Departments of Sensory and Motor Neuroscience, Graduate School of Medicine, University of Tokyo, Bunkyo-ku, Tokyo 113-8655, Japan
23. N'Diaye K, Ragot R, Garnero L, Pouthas V. What is common to brain activity evoked by the perception of visual and auditory filled durations? A study with MEG and EEG co-recordings. Cogn Brain Res 2004; 21:250-268. [PMID: 15464356 DOI: 10.1016/j.cogbrainres.2004.04.006]
Abstract
EEG and MEG scalp data were recorded simultaneously while human participants performed a duration discrimination task in the visual and auditory modalities separately. Short durations were used, ranging from 500 to 900 ms, among which participants had to discriminate a previously memorized 700-ms "standard" duration. Behavioral results show accurate but variable performance within and between participants, with expected modality effects: the percentage of correct responses was greater and the mean response time shorter for auditory than for visual signals. Sustained electric and magnetic activities were obtained correlatively to duration estimation, but with distinct spatiotemporal properties. Electric CNV-like potentials showed a fronto-central negativity in both modalities, whereas magnetic sustained fields were distributed with respect to the modality of the interval to be timed. The time courses of these slow brain activities were found to depend on stimulus duration but not on its modality or on the recording signal (EEG or MEG). Source reconstruction demonstrated that these sustained potentials/fields were generated by superimposed contributions from visual and auditory cortices (sustained sensory responses, SSR) and from prefrontal and parietal regions. By using these two complementary techniques, we thus demonstrated the involvement of frontal and parietal cerebral cortex in human timing.
Affiliation(s)
- Karim N'Diaye: Laboratoire de Neurosciences Cognitives et Imagerie Cérébrale, CNRS UPR640-LENA, Hôpital Salpêtrière, 47 Boulevard de l'Hôpital, 75651 Paris Cedex 13, France