51. Functional correlates of the speech-in-noise perception impairment in dyslexia: An MRI study. Neuropsychologia 2014; 60:103-14. [DOI: 10.1016/j.neuropsychologia.2014.05.016]
52. Interaction between auditory and motor systems in speech perception. Neurosci Bull 2014; 30:490-6. [PMID: 24604634] [DOI: 10.1007/s12264-013-1428-6]
Abstract
According to the Motor Theory of speech perception, interaction between the auditory and motor systems plays an essential role in speech perception. Since the Motor Theory was proposed, it has received considerable attention in the field, yet each of its three hypotheses still needs further verification. In this review, we focus on how auditory-motor anatomical and functional associations contribute to speech perception, discuss why previous studies could not reach agreement, and in particular ask whether motor-system involvement in speech perception is task-load dependent. Finally, we suggest that the auditory-motor link is particularly useful for speech perception under adverse listening conditions, and that a further revised Motor Theory is a potential solution to the "cocktail-party" problem.
53. Schall S, von Kriegstein K. Functional connectivity between face-movement and speech-intelligibility areas during auditory-only speech perception. PLoS One 2014; 9:e86325. [PMID: 24466026] [PMCID: PMC3900530] [DOI: 10.1371/journal.pone.0086325]
Abstract
It has been proposed that internal simulation of the talking face of visually-known speakers facilitates auditory speech recognition. One prediction of this view is that brain areas involved in auditory-only speech comprehension interact with visual face-movement sensitive areas, even under auditory-only listening conditions. Here, we test this hypothesis using connectivity analyses of functional magnetic resonance imaging (fMRI) data. Participants (17 normal participants, 17 developmental prosopagnosics) first learned six speakers via brief voice-face or voice-occupation training (<2 min/speaker). This was followed by an auditory-only speech recognition task and a control task (voice recognition) involving the learned speakers’ voices in the MRI scanner. As hypothesized, we found that, during speech recognition, familiarity with the speaker’s face increased the functional connectivity between the face-movement sensitive posterior superior temporal sulcus (STS) and an anterior STS region that supports auditory speech intelligibility. There was no difference between normal participants and prosopagnosics. This was expected because previous findings have shown that both groups use the face-movement sensitive STS to optimize auditory-only speech comprehension. Overall, the present findings indicate that learned visual information is integrated into the analysis of auditory-only speech and that this integration results from the interaction of task-relevant face-movement and auditory speech-sensitive areas.
Affiliation(s)
- Sonja Schall
- Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Katharina von Kriegstein
- Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Humboldt University of Berlin, Berlin, Germany
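The connectivity measure at the heart of this study can be illustrated independently of its full fMRI pipeline: functional connectivity is, at its simplest, the correlation between two regions' activity time courses. A minimal sketch with simulated ROI time series (the coupling strength and region names are illustrative, not the study's data or analysis code):

```python
import numpy as np

def functional_connectivity(ts_a, ts_b):
    """Pearson correlation between two ROI time courses of shape [n_volumes]."""
    return np.corrcoef(ts_a, ts_b)[0, 1]

rng = np.random.default_rng(0)
n_volumes = 200
psts = rng.standard_normal(n_volumes)               # face-movement-sensitive posterior STS
asts = 0.5 * psts + rng.standard_normal(n_volumes)  # anterior STS, partially coupled to it

print(f"FC(pSTS, aSTS) = {functional_connectivity(psts, asts):.2f}")
```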
54. Maidment DW, Macken B, Jones DM. Modalities of memory: Is reading lips like hearing voices? Cognition 2013; 129:471-93. [DOI: 10.1016/j.cognition.2013.08.017]
55. Age-associated reduction of asymmetry in human central auditory function: a 1H-magnetic resonance spectroscopy study. Neural Plast 2013; 2013:735290. [PMID: 24222864] [PMCID: PMC3809597] [DOI: 10.1155/2013/735290]
Abstract
The aim of this study was to investigate the effects of age on hemispheric asymmetry in the auditory cortex after pure tone stimulation. Ten young and eight older healthy volunteers took part in this study. Two-dimensional multivoxel 1H-magnetic resonance spectroscopy scans were performed before and after stimulation. The ratios of N-acetylaspartate (NAA), glutamate/glutamine (Glx), and γ-amino butyric acid (GABA) to creatine (Cr) were determined and compared between the two groups. The distribution of metabolites between the left and right auditory cortex was also determined. Before stimulation, left- and right-side NAA/Cr and right-side GABA/Cr were significantly lower, whereas right-side Glx/Cr was significantly higher, in the older group than in the young group. After stimulation, left- and right-side NAA/Cr and GABA/Cr were significantly lower, whereas left-side Glx/Cr was significantly higher, in the older group than in the young group. There was clear asymmetry in right-side Glx/Cr and left-side GABA/Cr after stimulation in the young group, but not in the older group. In summary, there is marked hemispheric asymmetry in auditory cortical metabolites following pure tone stimulation in young, but not older, adults. This reduced asymmetry in older adults may at least in part underlie the speech perception difficulties (presbycusis) experienced by aging adults.
56. Schneider DM, Woolley SMN. Sparse and background-invariant coding of vocalizations in auditory scenes. Neuron 2013; 79:141-52. [PMID: 23849201] [DOI: 10.1016/j.neuron.2013.04.038]
Abstract
Vocal communicators such as humans and songbirds readily recognize individual vocalizations, even in distracting auditory environments. This perceptual ability is likely subserved by auditory neurons whose spiking responses to individual vocalizations are minimally affected by background sounds. However, auditory neurons that produce background-invariant responses to vocalizations in auditory scenes have not been found. Here, we describe a population of neurons in the zebra finch auditory cortex that represent vocalizations with a sparse code and that maintain their vocalization-like firing patterns in levels of background sound that permit behavioral recognition. These same neurons decrease or stop spiking in levels of background sound that preclude behavioral recognition. In contrast, upstream neurons represent vocalizations with dense and background-corrupted responses. We provide experimental evidence suggesting that sparse coding is mediated by feedforward suppression. Finally, we show through simulations that feedforward inhibition can transform a dense representation of vocalizations into a sparse and background-invariant representation.
Affiliation(s)
- David M Schneider
- Program in Neurobiology and Behavior, Columbia University, New York, NY 10032, USA
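The closing simulation result, that feedforward inhibition can turn a dense representation into a sparse one, can be caricatured with a threshold-linear unit whose broad inhibition scales with its summed input. A toy sketch (all rates, weights, and thresholds are invented for illustration; it shows only the sparsification step, not the background-invariance result itself):

```python
import numpy as np

rng = np.random.default_rng(1)
n_inputs, n_bins = 50, 1000

# Dense upstream drive: vocalization-locked spikes plus background-noise spikes
vocal = (rng.random((n_inputs, n_bins)) < 0.05).astype(float)
background = (rng.random((n_inputs, n_bins)) < 0.05).astype(float)

excitation = (vocal + background).sum(axis=0)            # summed drive per time bin
inhibition = 0.9 * excitation                            # broad feedforward suppression
output = np.maximum(excitation - inhibition - 0.8, 0.0)  # threshold-linear response

print("bins with upstream spikes  :", ((vocal + background).sum(axis=0) > 0).mean())
print("bins with downstream output:", (output > 0).mean())
```

Because inhibition tracks total input, only bins where many inputs coincide clear the threshold, so the downstream unit fires far more sparsely than its dense inputs.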
57. Du Y, He Y, Arnott SR, Ross B, Wu X, Li L, Alain C. Rapid tuning of auditory "what" and "where" pathways by training. Cereb Cortex 2015; 25:496-506. [PMID: 24042339] [DOI: 10.1093/cercor/bht251]
Abstract
Behavioral improvement within the first hour of training is commonly explained as procedural learning (i.e., strategy changes resulting from task familiarization). However, it may additionally reflect a rapid adjustment of the perceptual and/or attentional system in a goal-directed task. In support of this latter hypothesis, we show feature-specific gains in performance for groups of participants briefly trained to use either a spectral or spatial difference between 2 vowels presented simultaneously during a vowel identification task. In both groups, the neuromagnetic activity measured during the vowel identification task following training revealed source activity in auditory cortices, prefrontal, inferior parietal, and motor areas. More importantly, the contrast between the 2 groups revealed a striking double dissociation in which listeners trained on spectral or spatial cues showed higher source activity in ventral ("what") and dorsal ("where") brain areas, respectively. These feature-specific effects indicate that brief training can implicitly bias top-down processing to a trained acoustic cue and induce a rapid recalibration of the ventral and dorsal auditory streams during speech segregation and identification.
Affiliation(s)
- Yi Du
- Rotman Research Institute, Baycrest Centre for Geriatric Care, Toronto, Ontario, Canada M6A 2E1; Department of Psychology, Speech and Hearing Research Center, Key Laboratory on Machine Perception (Ministry of Education), PKU-IDG/McGovern Institute for Brain Research, Peking University, Beijing 100871, China
- Yu He
- Rotman Research Institute, Baycrest Centre for Geriatric Care, Toronto, Ontario, Canada M6A 2E1
- Stephen R Arnott
- Rotman Research Institute, Baycrest Centre for Geriatric Care, Toronto, Ontario, Canada M6A 2E1
- Bernhard Ross
- Rotman Research Institute, Baycrest Centre for Geriatric Care, Toronto, Ontario, Canada M6A 2E1
- Xihong Wu
- Department of Psychology, Speech and Hearing Research Center, Key Laboratory on Machine Perception (Ministry of Education), PKU-IDG/McGovern Institute for Brain Research, Peking University, Beijing 100871, China
- Liang Li
- Department of Psychology, Speech and Hearing Research Center, Key Laboratory on Machine Perception (Ministry of Education), PKU-IDG/McGovern Institute for Brain Research, Peking University, Beijing 100871, China
- Claude Alain
- Rotman Research Institute, Baycrest Centre for Geriatric Care, Toronto, Ontario, Canada M6A 2E1; Department of Psychology, University of Toronto, Ontario, Canada M8V 2S4
58.
Abstract
Can learning capacity of the human brain be predicted from initial spontaneous functional connectivity (FC) between brain areas involved in a task? We combined task-related functional magnetic resonance imaging (fMRI) and resting-state fMRI (rs-fMRI) before and after training with a Hindi dental-retroflex nonnative contrast. Previous fMRI results were replicated, demonstrating that this learning recruited the left insula/frontal operculum and the left superior parietal lobe, among other areas of the brain. Crucially, resting-state FC (rs-FC) between these two areas at pretraining predicted individual differences in learning outcomes after distributed (Experiment 1) and intensive training (Experiment 2). Furthermore, this rs-FC was reduced at posttraining, a change that may also account for learning. Finally, resting-state network analyses showed that the mechanism underlying this reduction of rs-FC was mainly a transfer in intrinsic activity of the left frontal operculum/anterior insula from the left frontoparietal network to the salience network. Thus, rs-FC may contribute to predict learning ability and to understand how learning modifies the functioning of the brain. The discovery of this correspondence between initial spontaneous brain activity in task-related areas and posttraining performance opens new avenues to find predictors of learning capacities in the brain using task-related fMRI and rs-fMRI combined.
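The predictive analysis here reduces to a brain-behavior correlation: each participant contributes one pretraining rs-FC value and one learning score, and the two vectors are correlated. A minimal sketch with simulated values (subject count, effect size, and variable names are illustrative, not the study's data):

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(2)
n_subjects = 20
# Pretraining frontal-operculum/insula <-> parietal rs-FC, one value per subject
rs_fc = rng.uniform(-0.2, 0.8, n_subjects)
# Posttraining identification gain, partially driven by rs-FC in this toy model
learning = 0.6 * rs_fc + 0.1 * rng.standard_normal(n_subjects)

r, p = pearsonr(rs_fc, learning)
print(f"rs-FC vs. learning outcome: r = {r:.2f}, p = {p:.4f}")
```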
59. Liu B, Lin Y, Gao X, Dang J. Correlation between audio-visual enhancement of speech in different noise environments and SNR: a combined behavioral and electrophysiological study. Neuroscience 2013; 247:145-51. [PMID: 23673276] [DOI: 10.1016/j.neuroscience.2013.05.007]
Abstract
In the present study, we investigated the multisensory gain in different noise environments, defined behaviorally as the difference in speech recognition accuracy between the audio-visual (AV) and auditory-only (A) conditions, and electrophysiologically as the difference between the event-related potentials (ERPs) evoked under the AV condition and the sum of the ERPs evoked under the A and visual-only (V) conditions. Videos of a female speaker articulating Chinese monosyllabic words, accompanied by different levels of pink noise, were used as stimulus materials. The selected signal-to-noise ratios (SNRs) were -16, -12, -8, -4 and 0 dB. Speech recognition accuracy was measured under the A, V and AV conditions, and the ERPs evoked under each condition were analyzed. The behavioral results showed that the gain in recognition accuracy (AV minus A) was largest at the -12 dB SNR. The ERP results showed that the multisensory gain (AV minus the sum of A and V) at the -12 dB SNR was significantly higher than at the other SNRs in the 130-200 ms time window over frontal-to-central regions. The multisensory gains in audio-visual speech recognition at different SNRs did not fully accord with the principle of inverse effectiveness, but conformed to cross-modal stochastic resonance.
Affiliation(s)
- B Liu
- School of Computer Science and Technology, Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin University, Tianjin 300072, PR China
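Both gains reported here are plain differences: behaviorally, gain = Acc(AV) - Acc(A) at each SNR; electrophysiologically, gain = ERP(AV) - [ERP(A) + ERP(V)]. A worked sketch of the behavioral gain, with made-up accuracies chosen to peak at an intermediate SNR as the study found at -12 dB:

```python
import numpy as np

snr_db = np.array([-16, -12, -8, -4, 0])
acc_a  = np.array([0.10, 0.25, 0.55, 0.75, 0.90])  # hypothetical auditory-only accuracies
acc_av = np.array([0.20, 0.55, 0.75, 0.85, 0.93])  # hypothetical audio-visual accuracies

gain = acc_av - acc_a                               # multisensory gain per SNR
for snr, g in zip(snr_db, gain):
    print(f"SNR {snr:4d} dB: AV gain = {g:.2f}")
print("largest gain at", snr_db[int(np.argmax(gain))], "dB SNR")
```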
60. Zion Golumbic EM, Ding N, Bickel S, Lakatos P, Schevon CA, McKhann GM, Goodman RR, Emerson R, Mehta AD, Simon JZ, Poeppel D, Schroeder CE. Mechanisms underlying selective neuronal tracking of attended speech at a "cocktail party". Neuron 2013; 77:980-91. [PMID: 23473326] [DOI: 10.1016/j.neuron.2012.12.037]
Abstract
The ability to focus on and understand one talker in a noisy social environment is a critical social-cognitive capacity, whose underlying neuronal mechanisms are unclear. We investigated the manner in which speech streams are represented in brain activity and the way that selective attention governs the brain's representation of speech using a "Cocktail Party" paradigm, coupled with direct recordings from the cortical surface in surgical epilepsy patients. We find that brain activity dynamically tracks speech streams using both low-frequency phase and high-frequency amplitude fluctuations and that optimal encoding likely combines the two. In and near low-level auditory cortices, attention "modulates" the representation by enhancing cortical tracking of attended speech streams, but ignored speech remains represented. In higher-order regions, the representation appears to become more "selective," in that there is no detectable tracking of ignored speech. This selectivity itself seems to sharpen as a sentence unfolds.
Affiliation(s)
- Elana M Zion Golumbic
- Department of Psychiatry, Columbia University College of Physicians and Surgeons, New York, NY, USA
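The two neural features used for tracking here, low-frequency phase and high-frequency amplitude, are conventionally obtained by band-pass filtering and taking the Hilbert analytic signal. A generic sketch of that extraction step (band edges, filter order, and the stand-in signal are illustrative, not the paper's exact settings):

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def bandpass(x, lo, hi, fs, order=2):
    b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return filtfilt(b, a, x)

fs = 1000
x = np.random.default_rng(3).standard_normal(5 * fs)  # stand-in for one cortical channel

low_freq_phase = np.angle(hilbert(bandpass(x, 1, 8, fs)))   # delta/theta phase feature
high_gamma_amp = np.abs(hilbert(bandpass(x, 70, 150, fs)))  # high-gamma amplitude feature
```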
61. Visual input enhances selective speech envelope tracking in auditory cortex at a "cocktail party". J Neurosci 2013; 33:1417-26. [PMID: 23345218] [DOI: 10.1523/jneurosci.3675-12.2013]
Abstract
Our ability to selectively attend to one auditory signal amid competing input streams, epitomized by the "Cocktail Party" problem, continues to stimulate research from various approaches. How this demanding perceptual feat is achieved from a neural systems perspective remains unclear and controversial. It is well established that neural responses to attended stimuli are enhanced compared with responses to ignored ones, but responses to ignored stimuli are nonetheless highly significant, leading to interference in performance. We investigated whether congruent visual input of an attended speaker enhances cortical selectivity in auditory cortex, leading to diminished representation of ignored stimuli. We recorded magnetoencephalographic signals from human participants as they attended to segments of natural continuous speech. Using two complementary methods of quantifying the neural response to speech, we found that viewing a speaker's face enhances the capacity of auditory cortex to track the temporal speech envelope of that speaker. This mechanism was most effective in a Cocktail Party setting, promoting preferential tracking of the attended speaker, whereas without visual input no significant attentional modulation was observed. These neurophysiological results underscore the importance of visual input in resolving perceptual ambiguity in a noisy environment. Since visual cues in speech precede the associated auditory signals, they likely serve a predictive role in facilitating auditory processing of speech, perhaps by directing attentional resources to appropriate points in time when to-be-attended acoustic input is expected to arrive.
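A simple way to express "selective envelope tracking" in this sense is to correlate the recorded signal with the attended and ignored speech envelopes and compare the two values. A schematic sketch with synthetic envelopes (real MEG analyses involve source modeling, time lags, and cross-validation; the mixing weights below are invented):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 10_000

def slow_envelope(rng, n, win=200):
    """Smoothed rectified noise as a stand-in for a speech envelope."""
    return np.convolve(np.abs(rng.standard_normal(n)), np.ones(win) / win, mode="same")

env_attended = slow_envelope(rng, n)
env_ignored = slow_envelope(rng, n)
# Toy sensor signal that tracks the attended envelope much more strongly
meg = 0.8 * env_attended + 0.1 * env_ignored + 0.05 * rng.standard_normal(n)

print("tracking of attended talker:", round(np.corrcoef(meg, env_attended)[0, 1], 2))
print("tracking of ignored talker :", round(np.corrcoef(meg, env_ignored)[0, 1], 2))
```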
62. Diaconescu AO, Hasher L, McIntosh AR. Visual dominance and multisensory integration changes with age. Neuroimage 2013; 65:152-66. [PMID: 23036447] [DOI: 10.1016/j.neuroimage.2012.09.057]
63. Asai T, Kanayama N. "Cutaneous rabbit" hops toward a light: unimodal and cross-modal causality on the skin. Front Psychol 2012; 3:427. [PMID: 23133432] [PMCID: PMC3490328] [DOI: 10.3389/fpsyg.2012.00427]
Abstract
Our somatosensory system deals with not only spatial but also temporal imprecision, resulting in characteristic spatiotemporal illusions. Repeated rapid stimulation at the wrist, then near the elbow, can create the illusion of touch at intervening locations along the arm (as if a rabbit is hopping along the arm). This is known as the “cutaneous rabbit effect” (CRE). Previous studies have suggested that the CRE involves not only an intrinsic somatotopic representation but also the representation of an extended body schema that includes causality or animacy perception upon the skin. On the other hand, unlike other multi-modal causality couplings, it is possible that the CRE is not affected by concurrent auditory temporal information. The present study examined the effect on the CRE of a simple visual flash, which carries both temporal and spatial information and could therefore provide stronger cross-modal causality or correspondence. We presented three successive tactile stimuli on the inside of a participant’s left arm. Stimuli were presented on the wrist, elbow, and midway between the two. Results from our five experimental manipulations suggest that a one-shot flash enhances or attenuates the CRE depending on its congruency with the cutaneous rabbit saltation. Our results indicate that (1) our brain interprets successive stimuli on the skin as motion in time and space (unimodal causality) and that (2) concurrent signals from other modalities provide clues for creating unified representations of this external motion (multi-modal causality), to the extent that “spatiotemporal” synchronicity among the modalities is provided.
Affiliation(s)
- Tomohisa Asai
- Department of Psychology, Chiba University, Chiba, Japan
64. Bishop CW, London S, Miller LM. Neural time course of visually enhanced echo suppression. J Neurophysiol 2012; 108:1869-83. [PMID: 22786953] [PMCID: PMC3545000] [DOI: 10.1152/jn.00175.2012]
Abstract
Auditory spatial perception plays a critical role in day-to-day communication. For instance, listeners utilize acoustic spatial information to segregate individual talkers into distinct auditory "streams" to improve speech intelligibility. However, spatial localization is an exceedingly difficult task in everyday listening environments with numerous distracting echoes from nearby surfaces, such as walls. Listeners' brains overcome this unique challenge by relying on acoustic timing and, quite surprisingly, visual spatial information to suppress short-latency (1-10 ms) echoes through a process known as "the precedence effect" or "echo suppression." In the present study, we employed electroencephalography (EEG) to investigate the neural time course of echo suppression both with and without the aid of coincident visual stimulation in human listeners. We find that echo suppression is a multistage process initialized during the auditory N1 (70-100 ms) and followed by space-specific suppression mechanisms from 150 to 250 ms. Additionally, we find a robust correlate of listeners' spatial perception (i.e., suppressing or not suppressing the echo) over central electrode sites from 300 to 500 ms. Contrary to our hypothesis, vision's powerful contribution to echo suppression occurs late in processing (250-400 ms), suggesting that vision contributes primarily during late sensory or decision making processes. Together, our findings support growing evidence that echo suppression is a slow, progressive mechanism modifiable by visual influences during late sensory and decision making stages. Furthermore, our findings suggest that audiovisual interactions are not limited to early, sensory-level modulations but extend well into late stages of cortical processing.
Affiliation(s)
- Christopher W Bishop
- Center for Mind and Brain, University of California, Davis, California 95618, USA
65. Hailstone JC, Ridgway GR, Bartlett JW, Goll JC, Crutch SJ, Warren JD. Accent processing in dementia. Neuropsychologia 2012; 50:2233-44. [PMID: 22664324] [PMCID: PMC3484399] [DOI: 10.1016/j.neuropsychologia.2012.05.027]
Abstract
Accented speech conveys important nonverbal information about the speaker as well as presenting the brain with the problem of decoding a non-canonical auditory signal. The processing of non-native accents has seldom been studied in neurodegenerative disease and its brain basis remains poorly understood. Here we investigated the processing of non-native international and regional accents of English in cohorts of patients with Alzheimer's disease (AD; n=20) and progressive nonfluent aphasia (PNFA; n=6) in relation to healthy older control subjects (n=35). A novel battery was designed to assess accent comprehension and recognition and all subjects had a general neuropsychological assessment. Neuroanatomical associations of accent processing performance were assessed using voxel-based morphometry on MR brain images within the larger AD group. Compared with healthy controls, both the AD and PNFA groups showed deficits of non-native accent recognition and the PNFA group showed reduced comprehension of words spoken in international accents compared with a Southern English accent. At individual subject level deficits were observed more consistently in the PNFA group, and the disease groups showed different patterns of accent comprehension impairment (generally more marked for sentences in AD and for single words in PNFA). Within the AD group, grey matter associations of accent comprehension and recognition were identified in the anterior superior temporal lobe. The findings suggest that accent processing deficits may constitute signatures of neurodegenerative disease with potentially broader implications for understanding how these diseases affect vocal communication under challenging listening conditions.
Affiliation(s)
- Julia C. Hailstone
- Dementia Research Centre, UCL Institute of Neurology, Queen Square, London WC1N 3BG, UK
- Gerard R. Ridgway
- Dementia Research Centre, UCL Institute of Neurology, Queen Square, London WC1N 3BG, UK
- Wellcome Trust Centre for Neuroimaging, UCL Institute of Neurology, Queen Square, London WC1N 3BG, UK
- Jonathan W. Bartlett
- Dementia Research Centre, UCL Institute of Neurology, Queen Square, London WC1N 3BG, UK
- Department of Medical Statistics, London School of Hygiene & Tropical Medicine, London, UK
- Johanna C. Goll
- Dementia Research Centre, UCL Institute of Neurology, Queen Square, London WC1N 3BG, UK
- Sebastian J. Crutch
- Dementia Research Centre, UCL Institute of Neurology, Queen Square, London WC1N 3BG, UK
- Jason D. Warren
- Dementia Research Centre, UCL Institute of Neurology, Queen Square, London WC1N 3BG, UK
66. Speech comprehension aided by multiple modalities: behavioural and neural interactions. Neuropsychologia 2012; 50:762-76. [PMID: 22266262] [DOI: 10.1016/j.neuropsychologia.2012.01.010]
Abstract
Speech comprehension is a complex human skill, the performance of which requires the perceiver to combine information from several sources - e.g. voice, face, gesture, linguistic context - to achieve an intelligible and interpretable percept. We describe a functional imaging investigation of how auditory, visual and linguistic information interact to facilitate comprehension. Our specific aims were to investigate the neural responses to these different information sources, alone and in interaction, and further to use behavioural speech comprehension scores to address sites of intelligibility-related activation in multifactorial speech comprehension. In fMRI, participants passively watched videos of spoken sentences, in which we varied Auditory Clarity (with noise-vocoding), Visual Clarity (with Gaussian blurring) and Linguistic Predictability. Main effects of enhanced signal with increased auditory and visual clarity were observed in overlapping regions of posterior STS. Two-way interactions of the factors (auditory × visual, auditory × predictability) in the neural data were observed outside temporal cortex, where positive signal change in response to clearer facial information and greater semantic predictability was greatest at intermediate levels of auditory clarity. Overall changes in stimulus intelligibility by condition (as determined using an independent behavioural experiment) were reflected in the neural data by increased activation predominantly in bilateral dorsolateral temporal cortex, as well as inferior frontal cortex and left fusiform gyrus. Specific investigation of intelligibility changes at intermediate auditory clarity revealed a set of regions, including posterior STS and fusiform gyrus, showing enhanced responses to both visual and linguistic information. Finally, an individual differences analysis showed that greater comprehension performance in the scanning participants (measured in a post-scan behavioural test) was associated with increased activation in left inferior frontal gyrus and left posterior STS. The current multimodal speech comprehension paradigm demonstrates recruitment of a wide comprehension network in the brain, in which posterior STS and fusiform gyrus form sites for convergence of auditory, visual and linguistic information, while left-dominant sites in temporal and frontal cortex support successful comprehension.
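Noise-vocoding, the Auditory Clarity manipulation here, splits speech into frequency channels, extracts each channel's amplitude envelope, and re-imposes it on band-limited noise; fewer channels yield less intelligible speech. A compact channel vocoder sketch (channel count, band edges, and the stand-in waveform are typical choices, not necessarily the study's):

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def bandpass(x, lo, hi, fs, order=4):
    b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return filtfilt(b, a, x)

def noise_vocode(speech, fs, n_channels=4, f_lo=100, f_hi=7000):
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)  # log-spaced channel edges
    rng = np.random.default_rng(0)
    out = np.zeros_like(speech)
    for lo, hi in zip(edges[:-1], edges[1:]):
        env = np.abs(hilbert(bandpass(speech, lo, hi, fs)))              # channel envelope
        carrier = bandpass(rng.standard_normal(speech.size), lo, hi, fs)  # band-limited noise
        out += env * carrier                                              # modulate noise by envelope
    return out / np.max(np.abs(out))

fs = 16_000
speech = np.random.default_rng(1).standard_normal(fs)  # stand-in for a 1 s speech waveform
vocoded = noise_vocode(speech, fs)
```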
67. Shetake JA, Wolf JT, Cheung RJ, Engineer CT, Ram SK, Kilgard MP. Cortical activity patterns predict robust speech discrimination ability in noise. Eur J Neurosci 2011; 34:1823-38. [PMID: 22098331] [DOI: 10.1111/j.1460-9568.2011.07887.x]
Abstract
The neural mechanisms that support speech discrimination in noisy conditions are poorly understood. In quiet conditions, spike timing information appears to be used in the discrimination of speech sounds. In this study, we evaluated the hypothesis that spike timing is also used to distinguish between speech sounds in noisy conditions that significantly degrade neural responses to speech sounds. We tested speech sound discrimination in rats and recorded primary auditory cortex (A1) responses to speech sounds in background noise of different intensities and spectral compositions. Our behavioral results indicate that rats, like humans, are able to accurately discriminate consonant sounds even in the presence of background noise that is as loud as the speech signal. Our neural recordings confirm that speech sounds evoke degraded but detectable responses in noise. Finally, we developed a novel neural classifier that mimics behavioral discrimination. The classifier discriminates between speech sounds by comparing the A1 spatiotemporal activity patterns evoked on single trials with the average spatiotemporal patterns evoked by known sounds. Unlike classifiers in most previous studies, this classifier is not provided with the stimulus onset time. Neural activity analyzed with the use of relative spike timing was well correlated with behavioral speech discrimination in quiet and in noise. Spike timing information integrated over longer intervals was required to accurately predict rat behavioral speech discrimination in noisy conditions. The similarity of neural and behavioral discrimination of speech in noise suggests that humans and rats may employ similar brain mechanisms to solve this problem.
Affiliation(s)
- Jai A Shetake
- The University of Texas at Dallas, School of Behavioral Brain Sciences, 800 West Campbell Road, GR41, Richardson, TX 75080-3021, USA
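The classifier described in this abstract is template matching: each single-trial spatiotemporal pattern is assigned to the sound whose trial-averaged pattern it most resembles. A bare-bones version on binned spike counts (the binning, Euclidean distance, and Poisson toy data are my choices; the paper's classifier additionally works without knowing stimulus onset time, which this sketch ignores):

```python
import numpy as np

def fit_templates(trials_by_sound):
    """trials_by_sound: dict sound -> array of shape [n_trials, n_neurons, n_bins]."""
    return {s: tr.mean(axis=0) for s, tr in trials_by_sound.items()}

def classify(trial, templates):
    """Assign a single trial to the sound with the nearest average pattern."""
    return min(templates, key=lambda s: np.linalg.norm(trial - templates[s]))

rng = np.random.default_rng(5)
protos = {s: rng.poisson(2.0, (20, 50)) for s in ["dad", "sad"]}      # mean rates [neurons, bins]
trials = {s: rng.poisson(p, (30, 20, 50)) for s, p in protos.items()}  # 30 noisy trials per sound

templates = fit_templates(trials)
test_trial = rng.poisson(protos["dad"])
print("classified as:", classify(test_trial, templates))
```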
68. Hailstone JC, Ridgway GR, Bartlett JW, Goll JC, Buckley AH, Crutch SJ, Warren JD. Voice processing in dementia: a neuropsychological and neuroanatomical analysis. Brain 2011; 134:2535-47. [PMID: 21908871] [PMCID: PMC3170540] [DOI: 10.1093/brain/awr205]
Abstract
Voice processing in neurodegenerative disease is poorly understood. Here we undertook a systematic investigation of voice processing in a cohort of patients with clinical diagnoses representing two canonical dementia syndromes: temporal variant frontotemporal lobar degeneration (n = 14) and Alzheimer’s disease (n = 22). Patient performance was compared with a healthy matched control group (n = 35). All subjects had a comprehensive neuropsychological assessment including measures of voice perception (vocal size, gender, speaker discrimination) and voice recognition (familiarity, identification, naming and cross-modal matching) and equivalent measures of face and name processing. Neuroanatomical associations of voice processing performance were assessed using voxel-based morphometry. Both disease groups showed deficits on all aspects of voice recognition and impairment was more severe in the temporal variant frontotemporal lobar degeneration group than the Alzheimer’s disease group. Face and name recognition were also impaired in both disease groups and name recognition was significantly more impaired than other modalities in the temporal variant frontotemporal lobar degeneration group. The Alzheimer’s disease group showed additional deficits of vocal gender perception and voice discrimination. The neuroanatomical analysis across both disease groups revealed common grey matter associations of familiarity, identification and cross-modal recognition in all modalities in the right temporal pole and anterior fusiform gyrus; while in the Alzheimer’s disease group, voice discrimination was associated with grey matter in the right inferior parietal lobe. The findings suggest that impairments of voice recognition are significant in both these canonical dementia syndromes but particularly severe in temporal variant frontotemporal lobar degeneration, whereas impairments of voice perception may show relative specificity for Alzheimer’s disease. The right anterior temporal lobe is likely to have a critical role in the recognition of voices and other modalities of person knowledge.
Affiliation(s)
- Julia C Hailstone
- Dementia Research Centre, Institute of Neurology, University College London, Queen Square, London WC1N 3BG, UK
69. Audiovisual asynchrony detection and speech intelligibility in noise with moderate to severe sensorineural hearing impairment. Ear Hear 2011; 32:582-92. [DOI: 10.1097/aud.0b013e31820fca23]
70. Bishop CW, Miller LM. Speech cues contribute to audiovisual spatial integration. PLoS One 2011; 6:e24016. [PMID: 21909378] [PMCID: PMC3166076] [DOI: 10.1371/journal.pone.0024016]
Abstract
Speech is the most important form of human communication but ambient sounds and competing talkers often degrade its acoustics. Fortunately the brain can use visual information, especially its highly precise spatial information, to improve speech comprehension in noisy environments. Previous studies have demonstrated that audiovisual integration depends strongly on spatiotemporal factors. However, some integrative phenomena such as McGurk interference persist even with gross spatial disparities, suggesting that spatial alignment is not necessary for robust integration of audiovisual place-of-articulation cues. It is therefore unclear how speech-cues interact with audiovisual spatial integration mechanisms. Here, we combine two well established psychophysical phenomena, the McGurk effect and the ventriloquist's illusion, to explore this dependency. Our results demonstrate that conflicting spatial cues may not interfere with audiovisual integration of speech, but conflicting speech-cues can impede integration in space. This suggests a direct but asymmetrical influence between ventral ‘what’ and dorsal ‘where’ pathways.
Affiliation(s)
- Christopher W Bishop
- Center for Mind and Brain, University of California Davis, Davis, California, United States of America
71. Diaconescu AO, Alain C, McIntosh AR. The co-occurrence of multisensory facilitation and cross-modal conflict in the human brain. J Neurophysiol 2011; 106:2896-909. [PMID: 21880944] [DOI: 10.1152/jn.00303.2011]
Abstract
Perceptual objects often comprise a visual and auditory signature that arrives simultaneously through distinct sensory channels, and cross-modal features are linked by virtue of being attributed to a specific object. Continued exposure to cross-modal events sets up expectations about what a given object most likely "sounds" like, and vice versa, thereby facilitating object detection and recognition. The binding of familiar auditory and visual signatures is referred to as semantic, multisensory integration. Whereas integration of semantically related cross-modal features is behaviorally advantageous, situations of sensory dominance of one modality at the expense of another impair performance. In the present study, magnetoencephalography recordings of semantically related cross-modal and unimodal stimuli captured the spatiotemporal patterns underlying multisensory processing at multiple stages. At early stages, 100 ms after stimulus onset, posterior parietal brain regions responded preferentially to cross-modal stimuli irrespective of task instructions or the degree of semantic relatedness between the auditory and visual components. As participants were required to classify cross-modal stimuli into semantic categories, activity in superior temporal and posterior cingulate cortices increased between 200 and 400 ms. As task instructions changed to incorporate cross-modal conflict, a process whereby auditory and visual components of cross-modal stimuli were compared to estimate their degree of congruence, multisensory processes were captured in parahippocampal, dorsomedial, and orbitofrontal cortices 100 and 400 ms after stimulus onset. Our results suggest that multisensory facilitation is associated with posterior parietal activity as early as 100 ms after stimulus onset. However, as participants are required to evaluate cross-modal stimuli based on their semantic category or their degree of congruence, multisensory processes extend in cingulate, temporal, and prefrontal cortices.
72.
Abstract
Auditory signals are decomposed into discrete frequency elements early in the transduction process, yet somehow these signals are recombined into the rich acoustic percepts that we readily identify and are familiar with. The cerebral cortex is necessary for the perception of these signals, and studies from several laboratories over the past decade have made significant advances in our understanding of the neuronal mechanisms underlying auditory perception. This review will concentrate on recent studies in the macaque monkey that indicate that the activity of populations of neurons better accounts for the perceptual abilities compared to the activity of single neurons. The best examples address whether the acoustic space is represented along the "where" pathway in the caudal regions of auditory cortex. Our current understanding of how such population activity could also underlie the perception of the nonspatial features of acoustic stimuli is reviewed, as is how multisensory interactions can influence our auditory perception.
Affiliation(s)
- Gregg H Recanzone
- Center for Neuroscience and Department of Neurobiology, Physiology and Behavior, University of California, Davis, California
73. Stevenson RA, VanDerKlok RM, Pisoni DB, James TW. Discrete neural substrates underlie complementary audiovisual speech integration processes. Neuroimage 2010; 55:1339-45. [PMID: 21195198] [DOI: 10.1016/j.neuroimage.2010.12.063]
Abstract
The ability to combine information from multiple sensory modalities into a single, unified percept is a key element in an organism's ability to interact with the external world. This process of perceptual fusion, the binding of multiple sensory inputs into a perceptual gestalt, is highly dependent on the temporal synchrony of the sensory inputs. Using fMRI, we identified two anatomically distinct brain regions in the superior temporal cortex, one involved with processing temporal-synchrony, and one with processing perceptual fusion of audiovisual speech. This dissociation suggests that the superior temporal cortex should be considered a "neuronal hub" composed of multiple discrete subregions that underlie an array of complementary low- and high-level multisensory integration processes. In this role, abnormalities in the structure and function of superior temporal cortex provide a possible common etiology for temporal-processing and perceptual-fusion deficits seen in a number of clinical populations, including individuals with autism spectrum disorder, dyslexia, and schizophrenia.
Affiliation(s)
- Ryan A Stevenson
- Department of Psychological and Brain Sciences, Indiana University, USA
74. Reproducibility of fMRI activations associated with auditory sentence comprehension. Neuroimage 2010; 54:2138-55. [PMID: 20933093] [DOI: 10.1016/j.neuroimage.2010.09.082]
Abstract
The reproducibility of three different aspects of fMRI activations (binary activation maps, effect size, and the spatial distribution of local maxima) was evaluated for an auditory sentence comprehension task with high attention demand in a group of 17 subjects who were scanned on five different occasions. While in the scanner, subjects were asked to listen to a series of six short everyday sentences from the CUNY sentence test. Comprehension and attention to the stimuli were monitored after each listen-condition epoch by having subjects answer a series of multiple-choice questions. Statistical maps of activation for the listen condition were computed at three different levels: overall results for all imaging sessions, group-level/single-session results for each of the five imaging occasions, and single-subject/single-session results computed for each subject and each scanning occasion independently. The experimental task recruited a distributed bilateral network with processing nodes located in the lateral temporal cortex, inferior frontal cortex, medial BA6, medial occipital cortex and subcortical structures such as the putamen and the thalamus. Reproducibility of these activations at the group level was high (83.95% of the imaged volume was consistently classified as active/inactive across all five imaging sessions), indicating that sites of neuronal activity associated with auditory comprehension can reliably be detected with fMRI in healthy subjects, across repeated measures after group averaging. At the single-subject level, reproducibility ranged from moderate to high, although no significant differences were found on behavioral measures across subjects or sessions. This result suggests that contextual differences, i.e., those specific to each imaging session, can modulate our ability to detect fMRI activations associated with speech comprehension in individual subjects.
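The headline figure, 83.95% of the imaged volume consistently classified as active/inactive across five sessions, is a voxelwise consistency count over binary maps. A sketch of how such a number is computed (the maps are simulated at an arbitrary activation rate, not derived from the study's data):

```python
import numpy as np

rng = np.random.default_rng(6)
n_sessions, n_voxels = 5, 100_000
maps = rng.random((n_sessions, n_voxels)) < 0.15   # True = voxel active in that session's map

always_active = maps.all(axis=0)                   # active in every session
never_active = (~maps).all(axis=0)                 # inactive in every session
consistent = (always_active | never_active).mean()
print(f"{consistent:.2%} of voxels classified identically in all sessions")
```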
75. Song JH, Skoe E, Banai K, Kraus N. Perception of speech in noise: neural correlates. J Cogn Neurosci 2010; 23:2268-79. [PMID: 20681749] [DOI: 10.1162/jocn.2010.21556]
Abstract
The presence of irrelevant auditory information (other talkers, environmental noises) presents a major challenge to listening to speech. The fundamental frequency (F(0)) of the target speaker is thought to provide an important cue for the extraction of the speaker's voice from background noise, but little is known about the relationship between speech-in-noise (SIN) perceptual ability and neural encoding of the F(0). Motivated by recent findings that music and language experience enhance brainstem representation of sound, we examined the hypothesis that brainstem encoding of the F(0) is diminished to a greater degree by background noise in people with poorer perceptual abilities in noise. To this end, we measured speech-evoked auditory brainstem responses to /da/ in quiet and two multitalker babble conditions (two-talker and six-talker) in native English-speaking young adults who ranged in their ability to perceive and recall SIN. Listeners who were poorer performers on a standardized SIN measure demonstrated greater susceptibility to the degradative effects of noise on the neural encoding of the F(0). Particularly diminished was their phase-locked activity to the fundamental frequency in the portion of the syllable known to be most vulnerable to perceptual disruption (i.e., the formant transition period). Our findings suggest that the subcortical representation of the F(0) in noise contributes to the perception of speech in noisy conditions.
Affiliation(s)
- Judy H Song
- Auditory Neuroscience Laboratory, Northwestern University, 2240 Campus Drive, Evanston, IL 60208, USA
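The dependent measure, strength of phase-locked encoding of the fundamental, is typically read out as the spectral magnitude of the brainstem response at F0. A minimal sketch on synthetic responses (a real FFR analysis would window the formant transition period, average many sweeps, and so on; the F0 value and noise levels are illustrative):

```python
import numpy as np

def f0_magnitude(response, fs, f0):
    """Spectral magnitude of the response at the fundamental frequency."""
    spectrum = np.abs(np.fft.rfft(response))
    freqs = np.fft.rfftfreq(response.size, 1 / fs)
    return spectrum[np.argmin(np.abs(freqs - f0))]

fs, f0 = 10_000, 100.0          # illustrative sampling rate and fundamental
t = np.arange(0, 0.2, 1 / fs)   # 200 ms analysis window
rng = np.random.default_rng(7)

quiet = np.sin(2 * np.pi * f0 * t) + 0.5 * rng.standard_normal(t.size)
babble = 0.4 * np.sin(2 * np.pi * f0 * t) + 0.5 * rng.standard_normal(t.size)
print("F0 magnitude in quiet :", round(f0_magnitude(quiet, fs, f0), 1))
print("F0 magnitude in babble:", round(f0_magnitude(babble, fs, f0), 1))
```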
76. Petkov CI, Sutter ML. Evolutionary conservation and neuronal mechanisms of auditory perceptual restoration. Hear Res 2010; 271:54-65. [PMID: 20541597] [DOI: 10.1016/j.heares.2010.05.011]
Abstract
Auditory perceptual 'restoration' occurs when the auditory system restores an occluded or masked sound of interest. Behavioral work on auditory restoration in humans began over 50 years ago using it to model a noisy environmental scene with competing sounds. It has become clear that not only humans experience auditory restoration: restoration has been broadly conserved in many species. Behavioral studies in humans and animals provide a necessary foundation to link the insights being obtained from human EEG and fMRI to those from animal neurophysiology. The aggregate of data resulting from multiple approaches across species has begun to clarify the neuronal bases of auditory restoration. Different types of neural responses supporting restoration have been found, supportive of multiple mechanisms working within a species. Yet a general principle has emerged that responses correlated with restoration mimic the response that would have been given to the uninterrupted sound of interest. Using the same technology to study different species will help us to better harness animal models of 'auditory scene analysis' to clarify the conserved neural mechanisms shaping the perceptual organization of sound and to advance strategies to improve hearing in natural environmental settings.
Affiliation(s)
- Christopher I Petkov
- Institute of Neuroscience, Newcastle University, Framlington Place, Newcastle upon Tyne NE2 4HH, United Kingdom
77. Hill KT, Miller LM. Auditory attentional control and selection during cocktail party listening. Cereb Cortex 2009; 20:583-90. [PMID: 19574393] [DOI: 10.1093/cercor/bhp124]
Abstract
In realistic auditory environments, people rely on both attentional control and attentional selection to extract intelligible signals from a cluttered background. We used functional magnetic resonance imaging to examine auditory attention to natural speech under such high processing-load conditions. Participants attended to a single talker in a group of 3, identified by the target talker's pitch or spatial location. A catch-trial design allowed us to distinguish activity due to top-down control of attention versus attentional selection of bottom-up information in both the spatial and spectral (pitch) feature domains. For attentional control, we found a left-dominant fronto-parietal network with a bias toward spatial processing in dorsal precentral sulcus and superior parietal lobule, and a bias toward pitch in inferior frontal gyrus. During selection of the talker, attention modulated activity in left intraparietal sulcus when using talker location and in bilateral but right-dominant superior temporal sulcus when using talker pitch. We argue that these networks represent the sources and targets of selective attention in rich auditory environments.
Affiliation(s)
- Kevin T Hill
- Center for Mind and Brain, University of California Davis, Davis, CA 95618, USA
78. Shahin AJ, Miller LM. Multisensory integration enhances phonemic restoration. J Acoust Soc Am 2009; 125:1744-50. [PMID: 19275331] [PMCID: PMC2663900] [DOI: 10.1121/1.3075576]
Abstract
Phonemic restoration occurs when speech is perceived to be continuous through noisy interruptions, even when the speech signal is artificially removed from the interrupted epochs. This temporal filling-in illusion helps maintain robust comprehension in adverse environments and illustrates how contextual knowledge through the auditory modality (e.g., lexical) can improve perception. This study investigated how one important form of context, visual speech, affects phonemic restoration. The hypothesis was that audio-visual integration of speech should improve phonemic restoration, allowing the perceived continuity to span longer temporal gaps. Subjects listened to tri-syllabic words with a portion of each word replaced by white noise while watching lip-movement that was either congruent, temporally reversed (incongruent), or static. For each word, subjects judged whether the utterance sounded continuous or interrupted, where a "continuous" response indicated an illusory percept. Results showed that illusory filling-in of longer white noise durations (longer missing segments) occurred when the mouth movement was congruent with the spoken word compared to the other conditions, with no differences occurring between the static and incongruent conditions. Thus, phonemic restoration is enhanced when applying contextual knowledge through multisensory integration.
Affiliation(s)
- Antoine J Shahin
- Center for Mind & Brain, University of California, Davis, California 95618, USA
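The stimulus manipulation is easy to state precisely: a segment of the word is replaced by white noise, and the listener judges whether the utterance sounded continuous. A sketch of that construction step (the durations and the stand-in waveform are illustrative, not the study's stimuli):

```python
import numpy as np

def replace_with_noise(word, fs, start_s, dur_s, rng):
    """Replace word[start : start + dur] with RMS-matched white noise."""
    out = word.copy()
    i0 = int(start_s * fs)
    i1 = i0 + int(dur_s * fs)
    rms = np.sqrt(np.mean(word ** 2))
    out[i0:i1] = rms * rng.standard_normal(i1 - i0)
    return out

fs = 16_000
rng = np.random.default_rng(8)
word = np.sin(2 * np.pi * 150 * np.arange(0, 0.6, 1 / fs))  # stand-in for a tri-syllabic word
stimulus = replace_with_noise(word, fs, start_s=0.25, dur_s=0.12, rng=rng)
```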
79. Desai R, Liebenthal E, Waldron E, Binder JR. Left posterior temporal regions are sensitive to auditory categorization. J Cogn Neurosci 2008; 20:1174-88. [PMID: 18284339] [DOI: 10.1162/jocn.2008.20081]
Abstract
Recent studies suggest that the left superior temporal gyrus and sulcus (LSTG/S) play a role in speech perception, although the precise function of these areas remains unclear. Here, we test the hypothesis that regions in the LSTG/S play a role in the categorization of speech phonemes, irrespective of the acoustic properties of the sounds and prior experience of the listener with them. We examined changes in functional magnetic resonance imaging brain activation related to a perceptual shift from nonphonetic to phonetic analysis of sine-wave speech analogs. Subjects performed an identification task before scanning and a discrimination task during scanning with phonetic (P) and nonphonetic (N) sine-wave sounds, both before (Pre) and after (Post) being exposed to the phonetic properties of the P sounds. Behaviorally, experience with the P sounds induced categorical identification of these sounds. In the PostP > PreP and PostP > PostN contrasts, an area in the posterior LSTG/S was activated. For both P and N sounds, the activation in this region was correlated with the degree of categorical identification in individual subjects. The results suggest that these areas in the posterior LSTG/S are sensitive neither to the acoustic properties of speech nor merely to the presence of phonetic information, but rather to the listener's awareness of category representations for auditory inputs.
Affiliation(s)
- Rutvik Desai
- Department of Neurology, Medical College of Wisconsin, Milwaukee, WI 53226, USA
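Sine-wave speech, the stimulus class used here, replaces an utterance's formants with time-varying sinusoids that follow the formant center frequencies; naive listeners tend to hear whistles until primed to hear speech. A toy synthesis from hypothetical formant tracks (real stimuli derive the tracks from formant analysis of recorded words):

```python
import numpy as np

def sine_wave_speech(formant_tracks_hz, fs):
    """Sum sinusoids whose instantaneous frequency follows each formant track."""
    out = np.zeros_like(formant_tracks_hz[0])
    for track in formant_tracks_hz:                 # each track: [n_samples] in Hz
        phase = 2 * np.pi * np.cumsum(track) / fs   # integrate frequency to get phase
        out = out + np.sin(phase)
    return out / len(formant_tracks_hz)

fs = 16_000
t = np.linspace(0, 1, fs)
f1 = 500 + 200 * t          # hypothetical rising first formant
f2 = 1500 - 300 * t         # hypothetical falling second formant
f3 = np.full(fs, 2500.0)    # hypothetical flat third formant
analog = sine_wave_speech([f1, f2, f3], fs)
```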