1
Cope TE, Sohoglu E, Peterson KA, Jones PS, Rua C, Passamonti L, Sedley W, Post B, Coebergh J, Butler CR, Garrard P, Abdel-Aziz K, Husain M, Griffiths TD, Patterson K, Davis MH, Rowe JB. Temporal lobe perceptual predictions for speech are instantiated in motor cortex and reconciled by inferior frontal cortex. Cell Rep 2023; 42:112422. PMID: 37099422. DOI: 10.1016/j.celrep.2023.112422.
Abstract
Humans use predictions to improve speech perception, especially in noisy environments. Here we use 7-T functional MRI (fMRI) to decode brain representations of written phonological predictions and degraded speech signals in healthy humans and people with selective frontal neurodegeneration (non-fluent variant primary progressive aphasia [nfvPPA]). Multivariate analyses of item-specific patterns of neural activation indicate dissimilar representations of verified and violated predictions in left inferior frontal gyrus, suggestive of processing by distinct neural populations. In contrast, precentral gyrus represents a combination of phonological information and weighted prediction error. In the presence of intact temporal cortex, frontal neurodegeneration results in inflexible predictions. This manifests neurally as a failure to suppress incorrect predictions in anterior superior temporal gyrus and reduced stability of phonological representations in precentral gyrus. We propose a tripartite speech perception network in which inferior frontal gyrus supports prediction reconciliation in echoic memory, and precentral gyrus invokes a motor model to instantiate and refine perceptual predictions for speech.
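The inference from "dissimilar representations" to "distinct neural populations" rests on item-wise pattern similarity, which can be sketched in a few lines. The simulation below is an invented illustration (the voxel counts, masks, noise level, and pattern() helper are all assumptions, not the study's 7-T analysis pipeline): if verified and violated predictions engage the same population, an item's activation patterns correlate across conditions; if they engage largely distinct populations, that correlation collapses.

```python
import numpy as np

rng = np.random.default_rng(0)
n_voxels, n_items = 200, 40

# Two hypothetical neural populations occupying complementary sets of voxels:
# suppose verified predictions drive population A, violated predictions drive B.
pop_a = rng.random(n_voxels) < 0.5
pop_b = ~pop_a

item_codes = rng.standard_normal((n_items, n_voxels))  # one code per word item

def pattern(item, mask, noise=0.5):
    """Activation pattern for one item, expressed through one population."""
    return item * mask + noise * rng.standard_normal(item.size)

def mean_item_corr(mask_1, mask_2):
    """Average item-wise correlation between patterns from two conditions."""
    rs = [np.corrcoef(pattern(it, mask_1), pattern(it, mask_2))[0, 1]
          for it in item_codes]
    return float(np.mean(rs))

print("same population:     ", round(mean_item_corr(pop_a, pop_a), 2))  # high
print("distinct populations:", round(mean_item_corr(pop_a, pop_b), 2))  # near 0
```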
Affiliation(s)
- Thomas E Cope: Department of Clinical Neurosciences, University of Cambridge, Cambridge CB2 0SZ, UK; Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, Cambridge CB2 7EF, UK; Cambridge University Hospitals NHS Trust, Cambridge CB2 0QQ, UK
- Ediz Sohoglu: Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, Cambridge CB2 7EF, UK; School of Psychology, University of Sussex, Brighton BN1 9RH, UK
- Katie A Peterson: Department of Clinical Neurosciences, University of Cambridge, Cambridge CB2 0SZ, UK; Department of Radiology, University of Cambridge, Cambridge CB2 0QQ, UK
- P Simon Jones: Department of Clinical Neurosciences, University of Cambridge, Cambridge CB2 0SZ, UK
- Catarina Rua: Department of Clinical Neurosciences, University of Cambridge, Cambridge CB2 0SZ, UK
- Luca Passamonti: Department of Clinical Neurosciences, University of Cambridge, Cambridge CB2 0SZ, UK
- William Sedley: Biosciences Institute, Newcastle University, Newcastle upon Tyne NE2 4HH, UK
- Brechtje Post: Theoretical and Applied Linguistics, Faculty of Modern & Medieval Languages & Linguistics, University of Cambridge, Cambridge CB3 9DA, UK
- Jan Coebergh: Ashford and St Peter's Hospital, Ashford TW15 3AA, UK; St George's Hospital, London SW17 0QT, UK
- Christopher R Butler: Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford OX3 9DU, UK; Department of Brain Sciences, Faculty of Medicine, Imperial College London, London W12 0NN, UK
- Peter Garrard: St George's Hospital, London SW17 0QT, UK; Molecular and Clinical Sciences Research Institute, St George's, University of London, London SW17 0RE, UK
- Khaled Abdel-Aziz: Ashford and St Peter's Hospital, Ashford TW15 3AA, UK; St George's Hospital, London SW17 0QT, UK
- Masud Husain: Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford OX3 9DU, UK
- Timothy D Griffiths: Biosciences Institute, Newcastle University, Newcastle upon Tyne NE2 4HH, UK
- Karalyn Patterson: Department of Clinical Neurosciences, University of Cambridge, Cambridge CB2 0SZ, UK; Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, Cambridge CB2 7EF, UK
- Matthew H Davis: Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, Cambridge CB2 7EF, UK
- James B Rowe: Department of Clinical Neurosciences, University of Cambridge, Cambridge CB2 0SZ, UK; Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, Cambridge CB2 7EF, UK; Cambridge University Hospitals NHS Trust, Cambridge CB2 0QQ, UK
2
Wang YC, Sohoglu E, Gilbert RA, Henson RN, Davis MH. Predictive Neural Computations Support Spoken Word Recognition: Evidence from MEG and Competitor Priming. J Neurosci 2021; 41:6919-6932. PMID: 34210777. PMCID: PMC8360690. DOI: 10.1523/jneurosci.1685-20.2021.
Abstract
Human listeners achieve quick and effortless speech comprehension through computations of conditional probability using Bayes' rule. However, the neural implementation of Bayesian perceptual inference remains unclear. Competitive-selection accounts (e.g., TRACE) propose that word recognition is achieved through direct inhibitory connections between units representing candidate words that share segments (e.g., hygiene and hijack share /haidʒ/). Manipulations that increase lexical uncertainty should increase neural responses associated with word recognition when words cannot be uniquely identified. In contrast, predictive-selection accounts (e.g., predictive coding) propose that spoken word recognition involves comparing heard and predicted speech sounds and using prediction error to update lexical representations. Increased lexical uncertainty in words such as hygiene and hijack will increase prediction error, and hence neural activity, only at later time points when different segments are predicted. We collected MEG data from male and female listeners to test these two Bayesian mechanisms, using a competitor priming manipulation to change the prior probability of specific words. Lexical decision responses showed delayed recognition of target words (hygiene) following presentation of a neighboring prime word (hijack) several minutes earlier. However, this effect was not observed with pseudoword primes (higent) or targets (hijure). Crucially, MEG responses in the superior temporal gyrus (STG) showed greater neural responses for word-primed words after the point at which they were uniquely identified (after /haidʒ/ in hygiene) but not before, while similar changes were again absent for pseudowords. These findings are consistent with accounts of spoken word recognition in which neural computations of prediction error play a central role.

SIGNIFICANCE STATEMENT: Effective speech perception is critical to daily life and involves computations that combine speech signals with prior knowledge of spoken words (i.e., Bayesian perceptual inference). This study specifies the neural mechanisms that support spoken word recognition by testing two distinct implementations of Bayesian perceptual inference. Most established theories propose direct competition between lexical units such that inhibition of irrelevant candidates leads to selection of critical words. Our results instead support predictive-selection theories (e.g., predictive coding): by comparing heard and predicted speech sounds, neural computations of prediction error can help listeners continuously update lexical probabilities, allowing for more rapid word identification.
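To make the contrast concrete, here is a toy sketch of the predictive-selection idea: a listener maintains a posterior over candidate words, predicts the next segment from that posterior, and registers surprisal (a simple stand-in for prediction error) as each segment arrives. The three-word lexicon, the priors, and the ASCII segment strings are invented for illustration only; "3" stands in for /dʒ/.

```python
import numpy as np

# Toy lexicon: hygiene and hijack share the onset "haid3"
lexicon = {"hygiene": "haid3i:n", "hijack": "haid3ak", "happy": "hapi"}
prior = {"hygiene": 0.3, "hijack": 0.3, "happy": 0.4}  # made-up word priors

def recognize(heard):
    post = dict(prior)
    for t, seg in enumerate(heard):
        # Predictive probability of the incoming segment under current beliefs
        p_seg = sum(p for w, p in post.items()
                    if len(lexicon[w]) > t and lexicon[w][t] == seg)
        print(f"t={t} seg={seg!r} surprisal = {-np.log(max(p_seg, 1e-9)):.2f}")
        # Bayes' rule: drop inconsistent candidates, renormalize the rest
        post = {w: p for w, p in post.items()
                if len(lexicon[w]) > t and lexicon[w][t] == seg}
        z = sum(post.values())
        post = {w: p / z for w, p in post.items()}
    return post

recognize(lexicon["hygiene"])
```

With equal priors, surprisal stays low over the shared onset and peaks only at the uniqueness point (the /i:/ of hygiene); raising the prior of the competitor hijack, as competitor priming does, increases that late surprisal, which is where the reported MEG effect emerged.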
Affiliation(s)
- Yingcan Carol Wang: MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, CB2 7EF, United Kingdom
- Ediz Sohoglu: School of Psychology, University of Sussex, Brighton, BN1 9RH, United Kingdom
- Rebecca A Gilbert: MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, CB2 7EF, United Kingdom
- Richard N Henson: MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, CB2 7EF, United Kingdom
- Matthew H Davis: MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, CB2 7EF, United Kingdom
3
van Bree S, Sohoglu E, Davis MH, Zoefel B. Sustained neural rhythms reveal endogenous oscillations supporting speech perception. PLoS Biol 2021; 19:e3001142. PMID: 33635855. PMCID: PMC7946281. DOI: 10.1371/journal.pbio.3001142.
Abstract
Rhythmic sensory or electrical stimulation will produce rhythmic brain responses. These rhythmic responses are often interpreted as endogenous neural oscillations aligned (or "entrained") to the stimulus rhythm. However, stimulus-aligned brain responses can also be explained as a sequence of evoked responses, which only appear regular due to the rhythmicity of the stimulus, without necessarily involving underlying neural oscillations. To distinguish evoked responses from true oscillatory activity, we tested whether rhythmic stimulation produces oscillatory responses which continue after the end of the stimulus. Such sustained effects provide evidence for true involvement of neural oscillations. In Experiment 1, we found that rhythmic intelligible, but not unintelligible speech produces oscillatory responses in magnetoencephalography (MEG) which outlast the stimulus at parietal sensors. In Experiment 2, we found that transcranial alternating current stimulation (tACS) leads to rhythmic fluctuations in speech perception outcomes after the end of electrical stimulation. We further report that the phase relation between electroencephalography (EEG) responses and rhythmic intelligible speech can predict the tACS phase that leads to most accurate speech perception. Together, we provide fundamental results for several lines of research, including neural entrainment and tACS, and reveal endogenous neural oscillations as a key underlying principle for speech perception.
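The logic of the sustained-effects test can be sketched numerically: a pure sequence of evoked responses ends when stimulation ends, whereas an entrained endogenous oscillator keeps ringing. The simulation below is an invented illustration (the rates, the response kernel, and the decay constant are arbitrary assumptions, not the paper's analysis); it only shows why post-offset rhythmic power separates the two accounts.

```python
import numpy as np

fs, dur, f = 100, 6.0, 2.0               # sampling rate (Hz), seconds, stimulus rate
t = np.arange(0, dur, 1 / fs)
stim_times = np.arange(0.0, 3.0, 1 / f)  # rhythmic stimulation for the first 3 s

def evoked(t):
    """Superposition of transient evoked responses; silent after offset."""
    kernel = lambda x: np.where(x >= 0, x * np.exp(-8 * x), 0.0)
    return sum(kernel(t - st) for st in stim_times)

def entrained(t, tau=1.5):
    """Endogenous oscillation aligned to the rhythm, decaying slowly after offset."""
    envelope = np.where(t < 3.0, 1.0, np.exp(-(t - 3.0) / tau))
    return envelope * np.sin(2 * np.pi * f * t)

for name, x in [("evoked model    ", evoked(t)), ("oscillator model", entrained(t))]:
    post = x[t >= 3.5]                   # analysis window after stimulation ends
    power = np.abs(np.fft.rfft(post))[int(f * len(post) / fs)]
    print(f"{name}: post-offset power at {f} Hz = {power:.2f}")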
Affiliation(s)
- Sander van Bree: MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, United Kingdom; Centre for Cognitive Neuroimaging, University of Glasgow, Glasgow, United Kingdom; School of Psychology and Centre for Human Brain Health, University of Birmingham, Birmingham, United Kingdom
- Ediz Sohoglu: MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, United Kingdom; School of Psychology, University of Sussex, Brighton, United Kingdom
- Matthew H. Davis: MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, United Kingdom
- Benedikt Zoefel: MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, United Kingdom; Centre de Recherche Cerveau et Cognition, CNRS, Toulouse, France; Université Toulouse III Paul Sabatier, Toulouse, France
4
Abstract
Human speech perception can be described as Bayesian perceptual inference, but how are these Bayesian computations instantiated neurally? We used magnetoencephalographic recordings of brain responses to degraded spoken words and experimentally manipulated signal quality and prior knowledge. We first demonstrate that spectrotemporal modulations in speech are more strongly represented in neural responses than alternative speech representations (e.g., spectrogram or articulatory features). Critically, we found an interaction between speech signal quality and expectations from prior written text on the quality of neural representations: increased signal quality enhanced neural representations of speech that mismatched with prior expectations, but led to greater suppression of speech that matched prior expectations. This interaction is a unique neural signature of prediction error computations and is apparent in neural responses within 100 ms of speech input. Our findings contribute to the detailed specification of a computational model of speech perception based on predictive coding frameworks.
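Why is that interaction diagnostic of prediction error rather than of a generally sharpened signal? A minimal simulation makes the point. Below, the "neural representation" is the residual after subtracting a text-based prediction from the sensory input; every number (feature count, noise level, clarity steps) is an assumption made up for illustration, not a parameter from the study.

```python
import numpy as np

rng = np.random.default_rng(1)
n_features = 50  # toy stand-in for spectrotemporal modulation channels

def representation_strength(clarity, prior_matches, n_sims=500):
    """Mean |correlation| between the error signal and the spoken word."""
    rs = []
    for _ in range(n_sims):
        speech = rng.standard_normal(n_features)  # the spoken word
        other = rng.standard_normal(n_features)   # a mismatching written word
        sensory = clarity * speech + 0.5 * rng.standard_normal(n_features)
        predicted = speech if prior_matches else other
        error = sensory - predicted               # prediction error signal
        rs.append(abs(np.corrcoef(error, speech)[0, 1]))
    return float(np.mean(rs))

for match in (True, False):
    row = [round(representation_strength(c, match), 2) for c in (0.2, 0.5, 0.8)]
    print("matching text:   " if match else "mismatching text:", row)
```

The error-based readout weakens with increasing clarity when the text matches but strengthens when it mismatches, reproducing the crossover pattern; a readout of the sensory input itself would instead improve with clarity in both conditions.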
Affiliation(s)
- Ediz Sohoglu: School of Psychology, University of Sussex, Brighton, United Kingdom
- Matthew H Davis: MRC Cognition and Brain Sciences Unit, Cambridge, United Kingdom
5
Abstract
What is the nature of the neural code by which the human brain represents spoken language? New research suggests that previous findings of a language-specific code in cortical responses to speech can be explained solely by simple acoustic features.
Affiliation(s)
- Ediz Sohoglu: MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge CB2 7EF, UK
6
MacGregor LJ, Rodd JM, Gilbert RA, Hauk O, Sohoglu E, Davis MH. The Neural Time Course of Semantic Ambiguity Resolution in Speech Comprehension. J Cogn Neurosci 2020; 32:403-425. PMID: 31682564. PMCID: PMC7116495. DOI: 10.1162/jocn_a_01493.
Abstract
Semantically ambiguous words challenge speech comprehension, particularly when listeners must select a less frequent (subordinate) meaning at disambiguation. Using combined magnetoencephalography (MEG) and EEG, we measured neural responses associated with distinct cognitive operations during semantic ambiguity resolution in spoken sentences: (i) initial activation and selection of meanings in response to an ambiguous word and (ii) sentence reinterpretation in response to subsequent disambiguation to a subordinate meaning. Ambiguous words elicited an increased neural response approximately 400-800 msec after their acoustic offset compared with unambiguous control words in left frontotemporal MEG sensors, corresponding to sources in bilateral frontotemporal brain regions. This response may reflect increased demands on processes by which multiple alternative meanings are activated and maintained until later selection. Disambiguating words heard after an ambiguous word were associated with marginally increased neural activity over bilateral temporal MEG sensors and a central cluster of EEG electrodes, which localized to similar bilateral frontal and left temporal regions. This later neural response may reflect effortful semantic integration or elicitation of prediction errors that guide reinterpretation of previously selected word meanings. Across participants, the amplitude of the ambiguity response showed a marginal positive correlation with comprehension scores, suggesting that sentence comprehension benefits from additional processing around the time of an ambiguous word. Better comprehenders may have increased availability of subordinate meanings, perhaps due to higher-quality lexical representations, as reflected in a positive correlation between vocabulary size and comprehension success.
Affiliation(s)
- Jennifer M. Rodd: Department of Experimental Psychology, University College London
- Olaf Hauk: MRC Cognition and Brain Sciences Unit, University of Cambridge
- Ediz Sohoglu: MRC Cognition and Brain Sciences Unit, University of Cambridge
7
Sohoglu E, Kumar S, Chait M, Griffiths TD. Multivoxel codes for representing and integrating acoustic features in human cortex. Neuroimage 2020; 217:116661. PMID: 32081785. PMCID: PMC7339141. DOI: 10.1016/j.neuroimage.2020.116661.
Abstract
Using fMRI and multivariate pattern analysis, we determined whether spectral and temporal acoustic features are represented by independent or integrated multivoxel codes in human cortex. Listeners heard band-pass noise varying in frequency (spectral) and amplitude-modulation (AM) rate (temporal) features. In the superior temporal plane, changes in multivoxel activity due to frequency were largely invariant with respect to AM rate (and vice versa), consistent with an independent representation. In contrast, in posterior parietal cortex, multivoxel representation was exclusively integrated and tuned to specific conjunctions of frequency and AM features (albeit weakly). Direct between-region comparisons show that whereas independent coding of frequency weakened with increasing levels of the hierarchy, such a progression for AM and integrated coding was less fine-grained and only evident in the higher hierarchical levels from non-core to parietal cortex (with AM coding weakening and integrated coding strengthening). Our findings support the notion that primary auditory cortex can represent spectral and temporal acoustic features in an independent fashion and suggest a role for parietal cortex in feature integration and the structuring of sensory input.
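The independent-versus-integrated distinction maps onto a simple cross-generalization test: train a decoder on one feature (frequency) at one level of the other feature (AM rate), then test it at the other level. The nearest-centroid sketch below is a schematic of that logic with invented patterns and noise, not the study's fMRI analysis.

```python
import numpy as np

rng = np.random.default_rng(2)
n_vox = 100

def make_code(independent):
    """Voxel patterns for the four conjunctions of 2 frequencies x 2 AM rates."""
    if independent:
        freq = rng.standard_normal((2, n_vox))  # one pattern per frequency
        am = rng.standard_normal((2, n_vox))    # one pattern per AM rate
        return {(f, a): freq[f] + am[a] for f in range(2) for a in range(2)}
    # integrated code: each conjunction gets its own unrelated pattern
    return {(f, a): rng.standard_normal(n_vox) for f in range(2) for a in range(2)}

def trials(code, cond, n=20, noise=1.0):
    """Noisy repetitions of the pattern for one stimulus condition."""
    return code[cond] + noise * rng.standard_normal((n, n_vox))

def cross_decode(code):
    """Train a frequency decoder at AM rate 0; test it at AM rate 1."""
    centroids = [trials(code, (f, 0)).mean(axis=0) for f in range(2)]
    correct = sum(
        int(np.argmin([np.linalg.norm(x - c) for c in centroids])) == f
        for f in range(2) for x in trials(code, (f, 1))
    )
    return correct / 40  # 2 frequencies x 20 test trials

print("independent code:", cross_decode(make_code(True)))   # well above chance
print("integrated code: ", cross_decode(make_code(False)))  # near 0.5 (chance)
```

An additive (independent) code lets the frequency decoder transfer across AM rates; a purely conjunctive code shares nothing across AM rates, so transfer falls to chance.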
Affiliation(s)
- Ediz Sohoglu: School of Psychology, University of Sussex, Brighton, BN1 9QH, United Kingdom
- Sukhbinder Kumar: Institute of Neurobiology, Medical School, Newcastle University, Newcastle upon Tyne, NE2 4HH, United Kingdom; Wellcome Trust Centre for Human Neuroimaging, University College London, London, WC1N 3BG, United Kingdom
- Maria Chait: Ear Institute, University College London, London, United Kingdom
- Timothy D Griffiths: Institute of Neurobiology, Medical School, Newcastle University, Newcastle upon Tyne, NE2 4HH, United Kingdom; Wellcome Trust Centre for Human Neuroimaging, University College London, London, WC1N 3BG, United Kingdom
8
Casaponsa A, Sohoglu E, Moore DR, Füllgrabe C, Molloy K, Amitay S. Does training with amplitude modulated tones affect tone-vocoded speech perception? PLoS One 2019; 14:e0226288. PMID: 31881550. PMCID: PMC6934405. DOI: 10.1371/journal.pone.0226288.
Abstract
Temporal-envelope cues are essential for successful speech perception. We asked here whether training on stimuli containing temporal-envelope cues without speech content can improve the perception of spectrally-degraded (vocoded) speech in which the temporal envelope (but not the temporal fine structure) is mainly preserved. Two groups of listeners were trained on different amplitude-modulation (AM) based tasks, either AM detection or AM-rate discrimination (21 blocks of 60 trials over two days, 1260 trials; frequency range: 4 Hz, 8 Hz, and 16 Hz), while an additional control group did not undertake any training. Consonant identification in vocoded vowel-consonant-vowel stimuli was tested before and after training on the AM tasks (or at an equivalent time interval for the control group). Following training, only the trained groups showed a significant improvement in the perception of vocoded speech, but the improvement did not significantly differ from that observed for controls. Thus, we do not find convincing evidence that this amount of training with temporal-envelope cues without speech content provides significant benefit for vocoded speech intelligibility. Alternative training regimens using vocoded speech along the linguistic hierarchy should be explored.
Affiliation(s)
- Aina Casaponsa: Medical Research Council Institute of Hearing Research, Nottingham, England, United Kingdom; Department of Linguistics and English Language, Lancaster University, Lancaster, England, United Kingdom
- Ediz Sohoglu: Medical Research Council Institute of Hearing Research, Nottingham, England, United Kingdom
- David R. Moore: Medical Research Council Institute of Hearing Research, Nottingham, England, United Kingdom
- Christian Füllgrabe: Medical Research Council Institute of Hearing Research, Nottingham, England, United Kingdom
- Katharine Molloy: Medical Research Council Institute of Hearing Research, Nottingham, England, United Kingdom
- Sygal Amitay: Medical Research Council Institute of Hearing Research, Nottingham, England, United Kingdom
9
Cope TE, Sohoglu E, Sedley W, Patterson K, Jones PS, Wiggins J, Dawson C, Grube M, Carlyon RP, Griffiths TD, Davis MH, Rowe JB. Evidence for causal top-down frontal contributions to predictive processes in speech perception. Nat Commun 2017; 8:2154. PMID: 29255275. PMCID: PMC5735133. DOI: 10.1038/s41467-017-01958-7.
Abstract
Perception relies on the integration of sensory information and prior expectations. Here we show that selective neurodegeneration of human frontal speech regions results in delayed reconciliation of predictions in temporal cortex. These temporal regions were not atrophic, displayed normal evoked magnetic and electrical power, and preserved neural sensitivity to manipulations of sensory detail. Frontal neurodegeneration does not prevent the perceptual effects of contextual information; instead, prior expectations are applied inflexibly. The precision of predictions correlates with beta power, in line with theoretical models of the neural instantiation of predictive coding. Fronto-temporal interactions are enhanced while participants reconcile prior predictions with degraded sensory signals. Excessively precise predictions can explain several challenging phenomena in frontal aphasias, including agrammatism and subjective difficulties with speech perception. This work demonstrates that higher-level frontal mechanisms for cognitive and behavioural flexibility make a causal functional contribution to the hierarchical generative models underlying speech perception.
Affiliation(s)
- Thomas E Cope: Department of Clinical Neurosciences, University of Cambridge, Cambridge, CB2 0SZ, UK
- E Sohoglu: Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, CB2 7EF, UK
- W Sedley: Institute of Neuroscience, Newcastle University, Newcastle, NE1 7RU, UK
- K Patterson: Department of Clinical Neurosciences, University of Cambridge, Cambridge, CB2 0SZ, UK; Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, CB2 7EF, UK
- P S Jones: Department of Clinical Neurosciences, University of Cambridge, Cambridge, CB2 0SZ, UK
- J Wiggins: Department of Clinical Neurosciences, University of Cambridge, Cambridge, CB2 0SZ, UK
- C Dawson: Department of Clinical Neurosciences, University of Cambridge, Cambridge, CB2 0SZ, UK
- M Grube: Institute of Neuroscience, Newcastle University, Newcastle, NE1 7RU, UK
- R P Carlyon: Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, CB2 7EF, UK
- T D Griffiths: Institute of Neuroscience, Newcastle University, Newcastle, NE1 7RU, UK
- Matthew H Davis: Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, CB2 7EF, UK
- James B Rowe: Department of Clinical Neurosciences, University of Cambridge, Cambridge, CB2 0SZ, UK; Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, CB2 7EF, UK
10
Cope T, Sohoglu E, Patterson K, Dawson C, Grube M, Sedley W, Davis M, Rowe J. MEG reveals speech processing delay in progressive non-fluent aphasia. J Neurol Neurosurg Psychiatry 2016. DOI: 10.1136/jnnp-2016-315106.165.
11
Abstract
We use psychophysics and MEG to test how sensitivity to input statistics facilitates auditory scene analysis (ASA). Human subjects listened to ‘scenes’ comprised of concurrent tone-pip streams (sources). On occasional trials a new source appeared partway through. Listeners were more accurate and quicker to detect source appearance in scenes comprised of temporally regular (REG), rather than random (RAND), sources. MEG in passive listeners, and in those actively detecting appearance events, revealed increased sustained activity in auditory and parietal cortex in REG relative to RAND scenes, emerging from ~400 ms after scene onset. Over and above this, source appearance in REG scenes was associated with increased responses relative to RAND scenes. The effect of temporal structure on appearance-evoked responses was delayed when listeners were focused on the scenes relative to when they listened passively, consistent with the notion that attention reduces ‘surprise’. Overall, the results implicate a mechanism that tracks the predictability of multiple concurrent sources to facilitate active and passive ASA.

Everyday environments like a busy street bombard our ears with information. Yet most of the time, the human brain quickly and effortlessly makes sense of this information in a process known as auditory scene analysis. According to one popular theory, the brain is particularly sensitive to regularly repeating features in sensory signals, and uses those regularities to guide scene analysis. Indeed, many biological sounds contain such regularities, like the pitter-patter of footsteps or the fluttering of bird wings. In most previous studies that investigated whether regularity guides auditory scene analysis in humans, listeners attended to one sound stream that repeated slowly. Thus, it was unclear how regularity might benefit scene analysis in more realistic settings that feature many sounds that quickly change over time. Sohoglu and Chait presented listeners with cluttered, artificial auditory scenes comprised of several sources of sound. If the scenes contained regularly repeating sound sources, the listeners were better able to detect new sounds that appeared partway through the scenes. This shows that auditory scene analysis benefits from sound regularity. To understand the neurobiological basis of this effect, Sohoglu and Chait also recorded the brain activity of the listeners using a non-invasive technique called magnetoencephalography. This activity increased when the sound scenes featured regularly repeating sounds. It therefore appears that the brain prioritized the repeating sounds, and this improved the ability of the listeners to detect new sound sources. When the listeners actively focused on listening to the regular sounds, their brain response to new sounds occurred later than seen in volunteers who were not actively listening to the scene. This was unexpected, as delayed brain responses are not usually associated with active focusing. However, this effect can be explained if active focusing increases the expectation of new sounds appearing, because previous research has shown that expectation reduces brain responses. The experiments performed by Sohoglu and Chait used a relatively simple form of sound regularity (tone pips repeating at equal time intervals). Future work will investigate more complex forms of regularity to understand the kinds of sensory patterns to which the brain is sensitive.

DOI: 10.7554/eLife.19113
Affiliation(s)
- Ediz Sohoglu: UCL Ear Institute, University College London, London, United Kingdom
- Maria Chait: UCL Ear Institute, University College London, London, United Kingdom
12
Abstract
Human perception is shaped by past experience on multiple timescales. Sudden and dramatic changes in perception occur when prior knowledge or expectations match stimulus content. These immediate effects contrast with the longer-term, more gradual improvements that are characteristic of perceptual learning. Despite extensive investigation of these two experience-dependent phenomena, there is considerable debate about whether they result from common or dissociable neural mechanisms. Here we test single- and dual-mechanism accounts of experience-dependent changes in perception using concurrent magnetoencephalographic and EEG recordings of neural responses evoked by degraded speech. When speech clarity was enhanced by prior knowledge obtained from matching text, we observed reduced neural activity in a peri-auditory region of the superior temporal gyrus (STG). Critically, longer-term improvements in the accuracy of speech recognition following perceptual learning resulted in reduced activity in a nearly identical STG region. Moreover, short-term neural changes caused by prior knowledge and longer-term neural changes arising from perceptual learning were correlated across subjects with the magnitude of learning-induced changes in recognition accuracy. These experience-dependent effects on neural processing could be dissociated from the neural effect of hearing physically clearer speech, which similarly enhanced perception but increased rather than decreased STG responses. Hence, the observed neural effects of prior knowledge and perceptual learning cannot be attributed to epiphenomenal changes in listening effort that accompany enhanced perception. Instead, our results support a predictive coding account of speech perception; computational simulations show how a single mechanism, minimization of prediction error, can drive immediate perceptual effects of prior knowledge and longer-term perceptual learning of degraded speech.
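The single-mechanism claim can be caricatured in a few lines of code: one quantity, the squared prediction error, is reduced immediately when a matching prior is supplied, and reduced gradually when the generative model is adapted by gradient descent across trials. This is a loose toy inspired by the predictive coding account described above; the feature dimensions, gains, learning rate, and the trial() helper are all invented, not the paper's simulations.

```python
import numpy as np

rng = np.random.default_rng(3)
d = 20
word = rng.standard_normal(d)                   # clean features of a spoken word

def degrade(x, quality=0.4):
    """Degraded input: attenuated signal plus noise."""
    return quality * x + 0.3 * rng.standard_normal(d)

# Listener's generative model: predicted speech = gain * stored template
template = word + 0.8 * rng.standard_normal(d)  # initially inaccurate template
gain = 0.1                                      # initially underweighted

def trial(text_prior, learn, lr=0.01):
    """One presentation: compute prediction error, optionally adapt the model."""
    global gain, template
    s = degrade(word)
    prediction = gain * template + (0.4 * word if text_prior else 0.0)
    error = s - prediction
    if learn:                                   # gradient descent on ||error||^2
        gain += lr * (error @ template)
        template += lr * gain * error
    return np.sum(error ** 2)

# Immediate effect: matching text reduces prediction error on the same trial
print("degraded word, no text  :", round(trial(False, learn=False), 2))
print("degraded word, with text:", round(trial(True, learn=False), 2))

# Longer-term effect: repeated exposure alone also reduces prediction error
errors = [trial(False, learn=True) for _ in range(150)]
print("after learning, no text :", round(float(np.mean(errors[-10:])), 2))
```

Both effects run through the same bottleneck, minimization of prediction error, which is the paper's argument for a single mechanism behind immediate prior-knowledge effects and gradual perceptual learning.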
Affiliation(s)
- Ediz Sohoglu: Medical Research Council Cognition and Brain Sciences Unit, Cambridge CB2 7EF, United Kingdom
- Matthew H Davis: Medical Research Council Cognition and Brain Sciences Unit, Cambridge CB2 7EF, United Kingdom
13
Abstract
Two key questions concerning change detection in crowded acoustic environments are the extent to which cortical processing is specialized for different forms of acoustic change and when in the time-course of cortical processing neural activity becomes predictive of behavioral outcomes. Here, we address these issues by using magnetoencephalography (MEG) to probe the cortical dynamics of change detection in ongoing acoustic scenes containing as many as ten concurrent sources. Each source was formed of a sequence of tone pips with a unique carrier frequency and temporal modulation pattern, designed to mimic the spectrotemporal structure of natural sounds. Our results show that listeners are more accurate and quicker to detect the appearance (than disappearance) of an auditory source in the ongoing scene. Underpinning this behavioral asymmetry are change-evoked responses differing not only in magnitude and latency, but also in their spatial patterns. We find that even the earliest (~50 ms) cortical response to change is predictive of behavioral outcomes (detection times), consistent with the hypothesized role of local neural transients in supporting change detection.
Affiliation(s)
- Ediz Sohoglu: UCL Ear Institute, 332 Gray's Inn Road, London WC1X 8EE, UK
- Maria Chait: UCL Ear Institute, 332 Gray's Inn Road, London WC1X 8EE, UK
14
Abstract
An unresolved question is how the reported clarity of degraded speech is enhanced when listeners have prior knowledge of speech content. One account of this phenomenon proposes top-down modulation of early acoustic processing by higher-level linguistic knowledge. Alternative, strictly bottom-up accounts argue that acoustic information and higher-level knowledge are combined at a late decision stage without modulating early acoustic processing. Here we tested top-down and bottom-up accounts using written text to manipulate listeners’ knowledge of speech content. The effect of written text on the reported clarity of noise-vocoded speech was most pronounced when text was presented before (rather than after) speech (Experiment 1). Fine-grained manipulation of the onset asynchrony between text and speech revealed that this effect declined when text was presented more than 120 ms after speech onset (Experiment 2). Finally, the influence of written text was found to arise from phonological (rather than lexical) correspondence between text and speech (Experiment 3). These results suggest that prior knowledge effects are time-limited by the duration of auditory echoic memory for degraded speech, consistent with top-down modulation of early acoustic processing by linguistic knowledge.
15
Molloy K, Moore DR, Sohoglu E, Amitay S. Less is more: latent learning is maximized by shorter training sessions in auditory perceptual learning. PLoS One 2012; 7:e36929. PMID: 22606309. PMCID: PMC3351401. DOI: 10.1371/journal.pone.0036929.
Abstract
Background: The time course and outcome of perceptual learning can be affected by the length and distribution of practice, but the training regimen parameters that govern these effects have received little systematic study in the auditory domain. We asked whether there was a minimum requirement on the number of trials within a training session for learning to occur, whether there was a maximum limit beyond which additional trials became ineffective, and whether multiple training sessions provided benefit over a single session.

Methodology/Principal Findings: We investigated the efficacy of different regimens that varied in the distribution of practice across training sessions and in the overall amount of practice received on a frequency discrimination task. While learning was relatively robust to variations in regimen, the group with the shortest training sessions (∼8 min) had significantly faster learning in early stages of training than groups with longer sessions. In later stages, the group with the longest training sessions (>1 hr) showed slower learning than the other groups, suggesting overtraining. Between-session improvements were inversely correlated with performance; they were largest at the start of training and reduced as training progressed. In a second experiment we found no additional longer-term improvement in performance, retention, or transfer of learning for a group that trained over 4 sessions (∼4 hr in total) relative to a group that trained for a single session (∼1 hr). However, the mechanisms of learning differed; the single-session group continued to improve in the days following cessation of training, whereas the multi-session group showed no further improvement once training had ceased.

Conclusions/Significance: Shorter training sessions were advantageous because they allowed for more latent, between-session and post-training learning to emerge. These findings suggest that efficient regimens should use short training sessions and optimized spacing between sessions.
Affiliation(s)
- Katharine Molloy: Medical Research Council Institute of Hearing Research, Nottingham, United Kingdom
- David R. Moore: Medical Research Council Institute of Hearing Research, Nottingham, United Kingdom
- Ediz Sohoglu: Medical Research Council Institute of Hearing Research, Nottingham, United Kingdom
- Sygal Amitay: Medical Research Council Institute of Hearing Research, Nottingham, United Kingdom