1. Crinnion AM, Luthra S, Gaston P, Magnuson JS. Resolving competing predictions in speech: How qualitatively different cues and cue reliability contribute to phoneme identification. Atten Percept Psychophys 2024;86:942-961. PMID: 38383914; PMCID: PMC11233028; DOI: 10.3758/s13414-024-02849-y
Abstract
Listeners have many sources of information available in interpreting speech. Numerous theoretical frameworks and paradigms have established that various constraints impact the processing of speech sounds, but it remains unclear how listeners might simultaneously consider multiple cues, especially those that differ qualitatively (i.e., with respect to timing and/or modality) or quantitatively (i.e., with respect to cue reliability). Here, we establish that cross-modal identity priming can influence the interpretation of ambiguous phonemes (Exp. 1, N = 40) and show that two qualitatively distinct cues - namely, cross-modal identity priming and auditory co-articulatory context - have additive effects on phoneme identification (Exp. 2, N = 40). However, we find no effect of quantitative variation in a cue - specifically, changes in the reliability of the priming cue did not influence phoneme identification (Exp. 3a, N = 40; Exp. 3b, N = 40). Overall, we find that qualitatively distinct cues can additively influence phoneme identification. While many existing theoretical frameworks address constraint integration to some degree, our results provide a step towards understanding how information that differs in both timing and modality is integrated in online speech perception.
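The additivity finding has a compact statistical reading: on the log-odds scale, each cue shifts phoneme responses independently, so no interaction term is needed. A minimal sketch of that logic, with simulated trial data and hypothetical variable names (this is an illustration of the analysis idea, not the authors' code):

```python
# Sketch: test whether two cues contribute additively (on log-odds) to
# phoneme identification by comparing additive vs interactive logistic models.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 2000
prime = rng.integers(0, 2, n)    # cross-modal identity prime (hypothetical coding)
context = rng.integers(0, 2, n)  # coarticulatory context (hypothetical coding)
# Simulate additive cue effects on the log-odds of one phoneme response
logit_p = -0.8 + 1.0 * prime + 0.9 * context
resp = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))

df = pd.DataFrame({"resp": resp, "prime": prime, "context": context})
additive = smf.logit("resp ~ prime + context", df).fit(disp=False)
interactive = smf.logit("resp ~ prime * context", df).fit(disp=False)
# A likelihood-ratio statistic for the interaction term probes non-additivity
lr = 2 * (interactive.llf - additive.llf)
print(additive.params, f"LR for interaction: {lr:.2f}")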
Affiliation(s)
- James S Magnuson
- University of Connecticut, Storrs, CT, USA
- BCBL. Basque Center on Cognition, Brain and Language, Donostia-San Sebastián, Spain
- Ikerbasque. Basque Foundation for Science, Bilbao, Spain

2. Tolkacheva V, Brownsett SLE, McMahon KL, de Zubicaray GI. Perceiving and misperceiving speech: lexical and sublexical processing in the superior temporal lobes. Cereb Cortex 2024;34:bhae087. PMID: 38494418; PMCID: PMC10944697; DOI: 10.1093/cercor/bhae087
Abstract
Listeners can use prior knowledge to predict the content of noisy speech signals, enhancing perception. However, this process can also elicit misperceptions. For the first time, we employed a prime-probe paradigm and transcranial magnetic stimulation to investigate causal roles for the left and right posterior superior temporal gyri (pSTG) in the perception and misperception of degraded speech. Listeners were presented with spectrotemporally degraded probe sentences preceded by a clear prime. To produce misperceptions, we created partially mismatched pseudo-sentence probes via homophonic nonword transformations (e.g. The little girl was excited to lose her first tooth-Tha fittle girmn wam expited du roos har derst cooth). Compared to a control site (vertex), inhibitory stimulation of the left pSTG selectively disrupted priming of real but not pseudo-sentences. Conversely, inhibitory stimulation of the right pSTG enhanced priming of misperceptions with pseudo-sentences, but did not influence perception of real sentences. These results indicate qualitatively different causal roles for the left and right pSTG in perceiving degraded speech, supporting bilateral models that propose engagement of the right pSTG in sublexical processing.
Affiliation(s)
- Valeriya Tolkacheva
- Queensland University of Technology, School of Psychology and Counselling, O Block, Kelvin Grove, Queensland, 4059, Australia
- Sonia L E Brownsett
- Queensland Aphasia Research Centre, School of Health and Rehabilitation Sciences, University of Queensland, Surgical Treatment and Rehabilitation Services, Herston, Queensland, 4006, Australia
- Centre of Research Excellence in Aphasia Recovery and Rehabilitation, La Trobe University, Melbourne, Health Sciences Building 1, 1 Kingsbury Drive, Bundoora, Victoria, 3086, Australia
- Katie L McMahon
- Herston Imaging Research Facility, Building 71/918, Royal Brisbane & Women’s Hospital, Herston, Queensland, 4006, Australia
- Queensland University of Technology, School of Clinical Sciences and Centre for Biomedical Technologies, 60 Musk Avenue, Kelvin Grove, Queensland, 4059, Australia
- Greig I de Zubicaray
- Queensland University of Technology, School of Psychology and Counselling, O Block, Kelvin Grove, Queensland, 4059, Australia

3. Cope TE, Sohoglu E, Peterson KA, Jones PS, Rua C, Passamonti L, Sedley W, Post B, Coebergh J, Butler CR, Garrard P, Abdel-Aziz K, Husain M, Griffiths TD, Patterson K, Davis MH, Rowe JB. Temporal lobe perceptual predictions for speech are instantiated in motor cortex and reconciled by inferior frontal cortex. Cell Rep 2023;42:112422. PMID: 37099422; DOI: 10.1016/j.celrep.2023.112422
Abstract
Humans use predictions to improve speech perception, especially in noisy environments. Here we use 7-T functional MRI (fMRI) to decode brain representations of written phonological predictions and degraded speech signals in healthy humans and people with selective frontal neurodegeneration (non-fluent variant primary progressive aphasia [nfvPPA]). Multivariate analyses of item-specific patterns of neural activation indicate dissimilar representations of verified and violated predictions in left inferior frontal gyrus, suggestive of processing by distinct neural populations. In contrast, precentral gyrus represents a combination of phonological information and weighted prediction error. In the presence of intact temporal cortex, frontal neurodegeneration results in inflexible predictions. This manifests neurally as a failure to suppress incorrect predictions in anterior superior temporal gyrus and reduced stability of phonological representations in precentral gyrus. We propose a tripartite speech perception network in which inferior frontal gyrus supports prediction reconciliation in echoic memory, and precentral gyrus invokes a motor model to instantiate and refine perceptual predictions for speech.
Affiliation(s)
- Thomas E Cope
- Department of Clinical Neurosciences, University of Cambridge, Cambridge CB2 0SZ, UK; Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, Cambridge CB2 7EF, UK; Cambridge University Hospitals NHS Trust, Cambridge CB2 0QQ, UK
- Ediz Sohoglu
- Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, Cambridge CB2 7EF, UK; School of Psychology, University of Sussex, Brighton BN1 9RH, UK
- Katie A Peterson
- Department of Clinical Neurosciences, University of Cambridge, Cambridge CB2 0SZ, UK; Department of Radiology, University of Cambridge, Cambridge CB2 0QQ, UK
- P Simon Jones
- Department of Clinical Neurosciences, University of Cambridge, Cambridge CB2 0SZ, UK
- Catarina Rua
- Department of Clinical Neurosciences, University of Cambridge, Cambridge CB2 0SZ, UK
- Luca Passamonti
- Department of Clinical Neurosciences, University of Cambridge, Cambridge CB2 0SZ, UK
- William Sedley
- Biosciences Institute, Newcastle University, Newcastle upon Tyne NE2 4HH, UK
- Brechtje Post
- Theoretical and Applied Linguistics, Faculty of Modern & Medieval Languages & Linguistics, University of Cambridge, Cambridge CB3 9DA, UK
- Jan Coebergh
- Ashford and St Peter's Hospital, Ashford TW15 3AA, UK; St George's Hospital, London SW17 0QT, UK
- Christopher R Butler
- Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford OX3 9DU, UK; Faculty of Medicine, Department of Brain Sciences, Imperial College London, London W12 0NN, UK
- Peter Garrard
- St George's Hospital, London SW17 0QT, UK; Molecular and Clinical Sciences Research Institute, St. George's, University of London, London SW17 0RE, UK
- Khaled Abdel-Aziz
- Ashford and St Peter's Hospital, Ashford TW15 3AA, UK; St George's Hospital, London SW17 0QT, UK
- Masud Husain
- Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford OX3 9DU, UK
- Timothy D Griffiths
- Biosciences Institute, Newcastle University, Newcastle upon Tyne NE2 4HH, UK
- Karalyn Patterson
- Department of Clinical Neurosciences, University of Cambridge, Cambridge CB2 0SZ, UK; Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, Cambridge CB2 7EF, UK
- Matthew H Davis
- Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, Cambridge CB2 7EF, UK
- James B Rowe
- Department of Clinical Neurosciences, University of Cambridge, Cambridge CB2 0SZ, UK; Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, Cambridge CB2 7EF, UK; Cambridge University Hospitals NHS Trust, Cambridge CB2 0QQ, UK

4. Rovetti J, Sumantry D, Russo FA. Exposure to nonnative-accented speech reduces listening effort and improves social judgments of the speaker. Sci Rep 2023;13:2808. PMID: 36797318; PMCID: PMC9935874; DOI: 10.1038/s41598-023-29082-1
Abstract
Prior research has revealed a native-accent advantage, whereby nonnative-accented speech is more difficult to process than native-accented speech. Nonnative-accented speakers also experience more negative social judgments. In the current study, we asked three questions. First, does exposure to nonnative-accented speech increase speech intelligibility or decrease listening effort, thereby narrowing the native-accent advantage? Second, does lower intelligibility or higher listening effort contribute to listeners' negative social judgments of speakers? Third and finally, does increased intelligibility or decreased listening effort with exposure to speech bring about more positive social judgments of speakers? To address these questions, normal-hearing adults listened to a block of English sentences spoken with a native accent and a block spoken with a nonnative accent. We found that once participants were accustomed to the task, intelligibility was greater for nonnative-accented speech and increased similarly with exposure for both accents. However, listening effort decreased only for nonnative-accented speech, soon reaching the level of native-accented speech. In addition, lower intelligibility and higher listening effort were associated with lower ratings of speaker warmth, speaker competence, and willingness to interact with the speaker. Finally, competence ratings increased over time to a similar extent for both accents, with this relationship fully mediated by intelligibility and listening effort. These results offer insight into how listeners process and judge unfamiliar speakers.
Affiliation(s)
- Joseph Rovetti
- Department of Psychology, Western University, London, ON N6A 3K7, Canada; Department of Psychology, Toronto Metropolitan University, Toronto, ON M5B 2K3, Canada
- David Sumantry
- Department of Psychology, Toronto Metropolitan University, Toronto, ON M5B 2K3, Canada
- Frank A. Russo
- Department of Psychology, Toronto Metropolitan University, Toronto, ON M5B 2K3, Canada

5. Wang H, Chen R, Yan Y, McGettigan C, Rosen S, Adank P. Perceptual Learning of Noise-Vocoded Speech Under Divided Attention. Trends Hear 2023;27:23312165231192297. PMID: 37547940; PMCID: PMC10408355; DOI: 10.1177/23312165231192297
Abstract
Speech perception performance for degraded speech can improve with practice or exposure. Such perceptual learning is thought to rely on attention, and theoretical accounts such as the predictive coding framework suggest a key role for attention in supporting learning. However, it is unclear whether speech perceptual learning requires undivided attention. We evaluated the role of divided attention in speech perceptual learning in two online experiments (N = 336). Experiment 1 tested the reliance of perceptual learning on undivided attention. Participants completed a speech recognition task in which they repeated forty noise-vocoded sentences in a between-group design. Participants performed the speech task alone or concurrently with a domain-general visual task (dual task) at one of three difficulty levels. We observed perceptual learning under divided attention for all four groups, moderated by dual-task difficulty. Listeners in the easy and intermediate visual conditions improved as much as the single-task group. Those who completed the most challenging visual task showed faster learning and achieved similar ending performance compared to the single-task group. Experiment 2 tested whether learning relies on domain-specific or domain-general processes. Participants completed a single speech task or performed this task together with a dual task aiming to recruit domain-specific (lexical or phonological) or domain-general (visual) processes. All secondary task conditions produced patterns and amounts of learning comparable to the single speech task. Our results demonstrate that the impact of divided attention on perceptual learning is not strictly dependent on domain-general or domain-specific processes and that speech perceptual learning persists under divided attention.
Affiliation(s)
- Han Wang
- Department of Speech, Hearing and Phonetic Sciences, University College London, London, UK
- Rongru Chen
- Department of Speech, Hearing and Phonetic Sciences, University College London, London, UK
- Yu Yan
- Department of Speech, Hearing and Phonetic Sciences, University College London, London, UK
- Carolyn McGettigan
- Department of Speech, Hearing and Phonetic Sciences, University College London, London, UK
- Stuart Rosen
- Department of Speech, Hearing and Phonetic Sciences, University College London, London, UK
- Patti Adank
- Department of Speech, Hearing and Phonetic Sciences, University College London, London, UK

6. MacGregor LJ, Gilbert RA, Balewski Z, Mitchell DJ, Erzinçlioğlu SW, Rodd JM, Duncan J, Fedorenko E, Davis MH. Causal Contributions of the Domain-General (Multiple Demand) and the Language-Selective Brain Networks to Perceptual and Semantic Challenges in Speech Comprehension. Neurobiol Lang 2022;3:665-698. PMID: 36742011; PMCID: PMC9893226; DOI: 10.1162/nol_a_00081
Abstract
Listening to spoken language engages domain-general multiple demand (MD; frontoparietal) regions of the human brain, in addition to domain-selective (frontotemporal) language regions, particularly when comprehension is challenging. However, there is limited evidence that the MD network makes a functional contribution to core aspects of understanding language. In a behavioural study of volunteers (n = 19) with chronic brain lesions, but without aphasia, we assessed the causal role of these networks in perceiving, comprehending, and adapting to spoken sentences made more challenging by acoustic-degradation or lexico-semantic ambiguity. We measured perception of and adaptation to acoustically degraded (noise-vocoded) sentences with a word report task before and after training. Participants with greater damage to MD but not language regions required more vocoder channels to achieve 50% word report, indicating impaired perception. Perception improved following training, reflecting adaptation to acoustic degradation, but adaptation was unrelated to lesion location or extent. Comprehension of spoken sentences with semantically ambiguous words was measured with a sentence coherence judgement task. Accuracy was high and unaffected by lesion location or extent. Adaptation to semantic ambiguity was measured in a subsequent word association task, which showed that availability of lower-frequency meanings of ambiguous words increased following their comprehension (word-meaning priming). Word-meaning priming was reduced for participants with greater damage to language but not MD regions. Language and MD networks make dissociable contributions to challenging speech comprehension: Using recent experience to update word meaning preferences depends on language-selective regions, whereas the domain-general MD network plays a causal role in reporting words from degraded speech.
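The "vocoder channels needed for 50% word report" measure is a psychometric threshold. A sketch of how such a threshold can be estimated, assuming a logistic psychometric function and using illustrative accuracy data (not the study's data or analysis code):

```python
# Sketch: fit a logistic psychometric function to word-report accuracy across
# vocoder-channel conditions and read off the 50% threshold.
import numpy as np
from scipy.optimize import curve_fit

def psychometric(x, threshold, slope):
    """Logistic function; `threshold` is the 50% point on the channel axis."""
    return 1.0 / (1.0 + np.exp(-slope * (x - threshold)))

channels = np.array([1, 2, 4, 8, 16])                    # channels per condition
word_report = np.array([0.02, 0.10, 0.45, 0.80, 0.95])   # proportion correct

(threshold, slope), _ = curve_fit(psychometric, channels, word_report, p0=[4.0, 1.0])
print(f"Estimated 50% word-report threshold: {threshold:.2f} channels")
```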
Affiliation(s)
- Lucy J. MacGregor
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, UK
- Rebecca A. Gilbert
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, UK
- Zuzanna Balewski
- Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA
- Daniel J. Mitchell
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, UK
- Jennifer M. Rodd
- Psychology and Language Sciences, University College London, London, UK
- John Duncan
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, UK
- Evelina Fedorenko
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA
- Program in Speech and Hearing Bioscience and Technology, Harvard University, Cambridge, MA
- Matthew H. Davis
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, UK

7. Fogerty D, Madorskiy R, Vickery B, Shafiro V. Recognition of Interrupted Speech, Text, and Text-Supplemented Speech by Older Adults: Effect of Interruption Rate. J Speech Lang Hear Res 2022;65:4404-4416. PMID: 36251884; PMCID: PMC9940893; DOI: 10.1044/2022_jslhr-22-00247
Abstract
PURPOSE Studies of speech and text interruption indicate that the interruption rate influences the perceptual information available, from whole words at slow rates to subphonemic cues at faster interruption rates. In young adults, the benefit obtained from text supplementation of speech may depend on the type of perceptual information available in either modality. Age commonly reduces temporal aspects of information processing, which may influence the benefit older adults obtain from text-supplemented speech across interruption rates. METHOD Older adults were tested unimodally and multimodally with spoken and printed sentences that were interrupted by silence or white space at various rates. RESULTS Results demonstrate U-shaped performance-rate functions for all modality conditions, with minimal performance around interruption rates of 2-4 Hz. Comparison to previous studies with younger adults indicates overall poorer recognition for interrupted materials by the older adults. However, as a group, older adults can integrate information between the two modalities to a similar degree as younger adults. Individual differences in multimodal integration were noted. CONCLUSION Overall, these results indicate that older adults, while demonstrating poorer overall performance in comparison to younger adults, successfully combine distributed partial information across speech and text modalities to facilitate sentence recognition.
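Periodic interruption of this kind can be sketched as square-wave gating of the waveform, with the interruption rate determining how much of each word survives. A hedged illustration (rate, duty cycle, and the stand-in signal are illustrative, not the study's exact materials):

```python
# Sketch: gate a speech waveform with silence at a given interruption rate.
import numpy as np

def interrupt(signal: np.ndarray, fs: int, rate_hz: float, duty: float = 0.5) -> np.ndarray:
    """Replace alternating segments with silence using a square wave at `rate_hz`.

    `duty` is the proportion of each cycle in which speech is preserved.
    """
    t = np.arange(len(signal)) / fs
    gate = (t * rate_hz) % 1.0 < duty   # True during the "speech on" phase
    return signal * gate

fs = 16000
speech = np.random.randn(fs * 2)                   # stand-in for a 2-s sentence
interrupted = interrupt(speech, fs, rate_hz=2.0)   # 2-Hz interruption, 50% duty
```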
Affiliation(s)
- Daniel Fogerty
- Department of Speech and Hearing Science, University of Illinois at Urbana-Champaign, Champaign, IL
- Rachel Madorskiy
- Department of Speech, Language, Hearing, and Occupational Sciences, University of Montana, Missoula, MT
- Blythe Vickery
- Department of Communication Sciences and Disorders, University of South Carolina, Columbia, SC
- Valeriy Shafiro
- Department of Communication Disorders and Sciences, Rush University Medical Center, Chicago, IL

8. Cooking through perceptual disfluencies: The effects of auditory and visual distortions on predicted and actual memory performance. Mem Cognit 2022;51:862-874. PMID: 36376621; DOI: 10.3758/s13421-022-01370-7
Abstract
The current study investigated the joint contribution of visual and auditory disfluencies, or distortions, to actual and predicted memory performance with naturalistic, multi-modal materials through three experiments. In Experiments 1 and 2, participants watched food recipe clips containing visual and auditory information that were either fully intact or else distorted in one or both of the two modalities. They were asked to remember these for a later memory test and made memory predictions after each clip. Participants produced lower memory predictions for distorted auditory and visual information than intact ones. However, these perceptual distortions revealed no actual memory differences across encoding conditions, expanding the metacognitive illusion of perceptual disfluency for static, single-word materials to naturalistic, dynamic, multi-modal materials. Experiment 3 provided naïve participants with a hypothetical scenario about the experimental paradigm used in Experiment 1, revealing lower memory predictions for distorted than intact information in both modalities. Theoretically, these results imply that both in-the-moment experiences and a priori beliefs may contribute to the perceptual disfluency illusion. From an applied perspective, the study suggests that when audio-visual distortions occur, individuals might use this information to predict their memory performance, even when it does not factor into actual memory performance.

9. Aller M, Økland HS, MacGregor LJ, Blank H, Davis MH. Differential Auditory and Visual Phase-Locking Are Observed during Audio-Visual Benefit and Silent Lip-Reading for Speech Perception. J Neurosci 2022;42:6108-6120. PMID: 35760528; PMCID: PMC9351641; DOI: 10.1523/jneurosci.2476-21.2022
Abstract
Speech perception in noisy environments is enhanced by seeing facial movements of communication partners. However, the neural mechanisms by which audio and visual speech are combined are not fully understood. We explored phase-locking to auditory and visual signals in MEG recordings from 14 human participants (6 females, 8 males) who reported words from single spoken sentences. We manipulated the acoustic clarity and visual speech signals such that critical speech information was present in auditory, visual, or both modalities. MEG coherence analysis revealed that both auditory and visual speech envelopes (auditory amplitude modulations and lip aperture changes) were phase-locked to 2-6 Hz brain responses in auditory and visual cortex, consistent with entrainment to syllable-rate components. Partial coherence analysis was used to separate neural responses to correlated audio-visual signals and showed non-zero phase-locking to the auditory envelope in occipital cortex during audio-visual (AV) speech. Furthermore, phase-locking to auditory signals in visual cortex was enhanced for AV speech compared with audio-only speech that was matched for intelligibility. Conversely, auditory regions of the superior temporal gyrus did not show above-chance partial coherence with visual speech signals during AV conditions but did show partial coherence in visual-only conditions. Hence, visual speech enabled stronger phase-locking to auditory signals in visual areas, whereas phase-locking of visual speech in auditory regions only occurred during silent lip-reading. Differences in these cross-modal interactions between auditory and visual speech signals are interpreted in line with cross-modal predictive mechanisms during speech perception. SIGNIFICANCE STATEMENT: Verbal communication in noisy environments is challenging, especially for hearing-impaired individuals. Seeing facial movements of communication partners improves speech perception when auditory signals are degraded or absent. The neural mechanisms supporting lip-reading or audio-visual benefit are not fully understood. Using MEG recordings and partial coherence analysis, we show that speech information is used differently in brain regions that respond to auditory and visual speech. While visual areas use visual speech to improve phase-locking to auditory speech signals, auditory areas do not show phase-locking to visual speech unless auditory speech is absent and visual speech is used to substitute for missing auditory signals. These findings highlight brain processes that combine visual and auditory signals to support speech understanding.
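Coherence quantifies frequency-specific phase-locking between a brain signal and a stimulus signal; partial coherence removes the linear contribution of a third, correlated signal (here, the other modality's speech signal). A sketch of the computation with illustrative stand-in signals (not the study's MEG pipeline):

```python
# Sketch: coherence between a neural time course and an audio envelope,
# partialling out a correlated visual signal (e.g. lip aperture).
import numpy as np
from scipy.signal import csd, welch

def partial_coherence(x, y, z, fs, nperseg=1024):
    """Coherence between x and y after removing the linear contribution of z."""
    f, Sxy = csd(x, y, fs=fs, nperseg=nperseg)
    _, Sxz = csd(x, z, fs=fs, nperseg=nperseg)
    _, Szy = csd(z, y, fs=fs, nperseg=nperseg)
    _, Sxx = welch(x, fs=fs, nperseg=nperseg)
    _, Syy = welch(y, fs=fs, nperseg=nperseg)
    _, Szz = welch(z, fs=fs, nperseg=nperseg)
    Sxy_z = Sxy - Sxz * Szy / Szz                 # residual cross-spectrum
    Sxx_z = Sxx - np.abs(Sxz) ** 2 / Szz          # residual auto-spectra
    Syy_z = Syy - np.abs(Szy) ** 2 / Szz
    return f, np.abs(Sxy_z) ** 2 / (Sxx_z * Syy_z)

fs = 250
meg = np.random.randn(fs * 60)            # stand-in sensor time course
audio_env = np.random.randn(fs * 60)      # auditory amplitude envelope
lip_aperture = np.random.randn(fs * 60)   # visual speech signal
f, pcoh = partial_coherence(meg, audio_env, lip_aperture, fs)
```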
Affiliation(s)
- Máté Aller
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, CB2 7EF, United Kingdom
- Heidi Solberg Økland
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, CB2 7EF, United Kingdom
- Lucy J MacGregor
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, CB2 7EF, United Kingdom
- Helen Blank
- University Medical Center Hamburg-Eppendorf, Hamburg, 20246, Germany
- Matthew H Davis
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, CB2 7EF, United Kingdom

10. Sherafati A, Dwyer N, Bajracharya A, Hassanpour MS, Eggebrecht AT, Firszt JB, Culver JP, Peelle JE. Prefrontal cortex supports speech perception in listeners with cochlear implants. eLife 2022;11:e75323. PMID: 35666138; PMCID: PMC9225001; DOI: 10.7554/elife.75323
Abstract
Cochlear implants are neuroprosthetic devices that can restore hearing in people with severe to profound hearing loss by electrically stimulating the auditory nerve. Because of physical limitations on the precision of this stimulation, the acoustic information delivered by a cochlear implant does not convey the same level of acoustic detail as that conveyed by normal hearing. As a result, speech understanding in listeners with cochlear implants is typically poorer and more effortful than in listeners with normal hearing. The brain networks supporting speech understanding in listeners with cochlear implants are not well understood, partly due to difficulties obtaining functional neuroimaging data in this population. In the current study, we assessed the brain regions supporting spoken word understanding in adult listeners with right unilateral cochlear implants (n=20) and matched controls (n=18) using high-density diffuse optical tomography (HD-DOT), a quiet and non-invasive imaging modality with spatial resolution comparable to that of functional MRI. We found that while listening to spoken words in quiet, listeners with cochlear implants showed greater activity in the left prefrontal cortex than listeners with normal hearing, specifically in a region engaged in a separate spatial working memory task. These results suggest that listeners with cochlear implants require greater cognitive processing during speech understanding than listeners with normal hearing, supported by compensatory recruitment of the left prefrontal cortex.
Affiliation(s)
- Arefeh Sherafati
- Department of Radiology, Washington University in St. Louis, St. Louis, United States
- Noel Dwyer
- Department of Otolaryngology, Washington University in St. Louis, St. Louis, United States
- Aahana Bajracharya
- Department of Otolaryngology, Washington University in St. Louis, St. Louis, United States
- Adam T Eggebrecht
- Department of Radiology, Washington University in St. Louis, St. Louis, United States
- Department of Electrical & Systems Engineering, Washington University in St. Louis, St. Louis, United States
- Department of Biomedical Engineering, Washington University in St. Louis, St. Louis, United States
- Division of Biology and Biomedical Sciences, Washington University in St. Louis, St. Louis, United States
- Jill B Firszt
- Department of Otolaryngology, Washington University in St. Louis, St. Louis, United States
- Joseph P Culver
- Department of Radiology, Washington University in St. Louis, St. Louis, United States
- Department of Biomedical Engineering, Washington University in St. Louis, St. Louis, United States
- Division of Biology and Biomedical Sciences, Washington University in St. Louis, St. Louis, United States
- Department of Physics, Washington University in St. Louis, St. Louis, United States
- Jonathan E Peelle
- Department of Otolaryngology, Washington University in St. Louis, St. Louis, United States

11. Tamati TN, Sevich VA, Clausing EM, Moberly AC. Lexical Effects on the Perceived Clarity of Noise-Vocoded Speech in Younger and Older Listeners. Front Psychol 2022;13:837644. PMID: 35432072; PMCID: PMC9010567; DOI: 10.3389/fpsyg.2022.837644
Abstract
When listening to degraded speech, such as speech delivered by a cochlear implant (CI), listeners make use of top-down linguistic knowledge to facilitate speech recognition. Lexical knowledge supports speech recognition and enhances the perceived clarity of speech. Yet, the extent to which lexical knowledge can be used to effectively compensate for degraded input may depend on the degree of degradation and the listener's age. The current study investigated lexical effects in the compensation for speech that was degraded via noise-vocoding in younger and older listeners. In an online experiment, younger and older normal-hearing (NH) listeners rated the clarity of noise-vocoded sentences on a scale from 1 ("very unclear") to 7 ("completely clear"). Lexical information was provided by matching text primes and the lexical content of the target utterance. Half of the sentences were preceded by a matching text prime, while half were preceded by a non-matching prime. Each sentence also consisted of three key words of high or low lexical frequency and neighborhood density. Sentences were processed to simulate CI hearing, using an eight-channel noise vocoder with varying filter slopes. Results showed that lexical information impacted the perceived clarity of noise-vocoded speech. Noise-vocoded speech was perceived as clearer when preceded by a matching prime, and when sentences included key words with high lexical frequency and low neighborhood density. However, the strength of the lexical effects depended on the level of degradation. Matching text primes had a greater impact for speech with poorer spectral resolution, but lexical content had a smaller impact for speech with poorer spectral resolution. Finally, lexical information appeared to benefit both younger and older listeners. Findings demonstrate that lexical knowledge can be employed by younger and older listeners in cognitive compensation during the processing of noise-vocoded speech. However, lexical content may not be as reliable when the signal is highly degraded. Clinical implications are that for adult CI users, lexical knowledge might be used to compensate for the degraded speech signal, regardless of age, but some CI users may be hindered by a relatively poor signal.
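Noise vocoding, used here to simulate CI hearing, divides speech into frequency bands, extracts each band's amplitude envelope, and uses the envelopes to modulate band-limited noise before recombining the bands. A minimal sketch (the filter orders, band edges, and envelope cutoff are illustrative assumptions, not the study's exact vocoder settings):

```python
# Sketch: an eight-channel noise vocoder.
import numpy as np
from scipy.signal import butter, sosfiltfilt

def noise_vocode(x, fs, n_channels=8, lo=100.0, hi=7000.0, env_cutoff=30.0):
    edges = np.geomspace(lo, hi, n_channels + 1)   # log-spaced band edges
    env_sos = butter(2, env_cutoff, btype="low", fs=fs, output="sos")
    noise = np.random.randn(len(x))
    out = np.zeros_like(x)
    for i in range(n_channels):
        band_sos = butter(4, [edges[i], edges[i + 1]], btype="band", fs=fs, output="sos")
        band = sosfiltfilt(band_sos, x)                               # analysis band
        env = np.clip(sosfiltfilt(env_sos, np.abs(band)), 0.0, None)  # smoothed envelope
        carrier = sosfiltfilt(band_sos, noise)                        # band-limited noise
        out += env * carrier
    return out

fs = 16000
speech = np.random.randn(fs)     # stand-in for a speech waveform
vocoded = noise_vocode(speech, fs)
```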
Affiliation(s)
- Terrin N. Tamati
- Department of Otolaryngology – Head and Neck Surgery, The Ohio State University Wexner Medical Center, Columbus, OH, United States
- Department of Otorhinolaryngology/Head and Neck Surgery, University Medical Center Groningen, University of Groningen, Groningen, Netherlands
- Victoria A. Sevich
- Department of Speech and Hearing Science, The Ohio State University, Columbus, OH, United States
- Emily M. Clausing
- Department of Otolaryngology – Head and Neck Surgery, The Ohio State University Wexner Medical Center, Columbus, OH, United States
- Aaron C. Moberly
- Department of Otolaryngology – Head and Neck Surgery, The Ohio State University Wexner Medical Center, Columbus, OH, United States

12. Corcoran AW, Perera R, Koroma M, Kouider S, Hohwy J, Andrillon T. Expectations boost the reconstruction of auditory features from electrophysiological responses to noisy speech. Cereb Cortex 2022;33:691-708. PMID: 35253871; PMCID: PMC9890472; DOI: 10.1093/cercor/bhac094
Abstract
Online speech processing imposes significant computational demands on the listening brain, the underlying mechanisms of which remain poorly understood. Here, we exploit the perceptual "pop-out" phenomenon (i.e. the dramatic improvement of speech intelligibility after receiving information about speech content) to investigate the neurophysiological effects of prior expectations on degraded speech comprehension. We recorded electroencephalography (EEG) and pupillometry from 21 adults while they rated the clarity of noise-vocoded and sine-wave synthesized sentences. Pop-out was reliably elicited following visual presentation of the corresponding written sentence, but not following incongruent or neutral text. Pop-out was associated with improved reconstruction of the acoustic stimulus envelope from low-frequency EEG activity, implying that improvements in perceptual clarity were mediated via top-down signals that enhanced the quality of cortical speech representations. Spectral analysis further revealed that pop-out was accompanied by a reduction in theta-band power, consistent with predictive coding accounts of acoustic filling-in and incremental sentence processing. Moreover, delta-band power, alpha-band power, and pupil diameter were all increased following the provision of any written sentence information, irrespective of content. Together, these findings reveal distinctive profiles of neurophysiological activity that differentiate the content-specific processes associated with degraded speech comprehension from the context-specific processes invoked under adverse listening conditions.
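Envelope reconstruction of the sort reported here is typically a backward model: a regularized linear mapping from time-lagged multichannel EEG onto the acoustic envelope, scored by the correlation between reconstructed and actual envelopes. A sketch under those assumptions (dimensions and regularization are illustrative; this is not the authors' decoder):

```python
# Sketch: ridge-regression stimulus (envelope) reconstruction from EEG.
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view
from sklearn.linear_model import Ridge

fs, n_ch, n_lags = 64, 32, 16             # 64-Hz EEG, 32 channels, 250-ms window
eeg = np.random.randn(fs * 120, n_ch)     # stand-in continuous EEG
envelope = np.random.randn(fs * 120)      # stand-in speech envelope

# Lagged design matrix: each row holds the EEG within a short time window
X = sliding_window_view(eeg, n_lags, axis=0).reshape(-1, n_ch * n_lags)
y = envelope[: X.shape[0]]

half = len(y) // 2                        # simple train/test split
model = Ridge(alpha=1e3).fit(X[:half], y[:half])
recon = model.predict(X[half:])
r = np.corrcoef(recon, y[half:])[0, 1]    # reconstruction accuracy
print(f"envelope reconstruction r = {r:.3f}")
```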
Affiliation(s)
- Andrew W Corcoran
- Corresponding author: Room E672, 20 Chancellors Walk, Clayton, VIC 3800, Australia
- Ricardo Perera
- Cognition & Philosophy Laboratory, School of Philosophical, Historical, and International Studies, Monash University, Melbourne, VIC 3800, Australia
- Matthieu Koroma
- Brain and Consciousness Group (ENS, EHESS, CNRS), Département d’Études Cognitives, École Normale Supérieure-PSL Research University, Paris 75005, France
- Sid Kouider
- Brain and Consciousness Group (ENS, EHESS, CNRS), Département d’Études Cognitives, École Normale Supérieure-PSL Research University, Paris 75005, France
- Jakob Hohwy
- Cognition & Philosophy Laboratory, School of Philosophical, Historical, and International Studies, Monash University, Melbourne, VIC 3800, Australia
- Monash Centre for Consciousness & Contemplative Studies, Monash University, Melbourne, VIC 3800, Australia
- Thomas Andrillon
- Monash Centre for Consciousness & Contemplative Studies, Monash University, Melbourne, VIC 3800, Australia
- Paris Brain Institute, Sorbonne Université, Inserm-CNRS, Paris 75013, France

13. Trotter AS, Banks B, Adank P. The Relevance of the Availability of Visual Speech Cues During Adaptation to Noise-Vocoded Speech. J Speech Lang Hear Res 2021;64:2513-2528. PMID: 34161748; DOI: 10.1044/2021_jslhr-20-00575
Abstract
Purpose This study first aimed to establish whether viewing specific parts of the speaker's face (eyes or mouth), compared to viewing the whole face, affected adaptation to distorted noise-vocoded sentences. Second, this study also aimed to replicate results on processing of distorted speech from lab-based experiments in an online setup. Method We monitored recognition accuracy online while participants were listening to noise-vocoded sentences. We first established whether participants were able to perceive and adapt to audiovisual four-band noise-vocoded sentences when the entire moving face was visible (AV Full). Four further groups were then tested: one in which participants viewed the moving lower part of the speaker's face (AV Mouth), one in which they saw only the moving upper part of the face (AV Eyes), one in which they could see neither the moving lower nor the upper face (AV Blocked), and one in which they saw an image of a still face (AV Still). Results Participants repeated around 40% of the key words correctly and adapted during the experiment, but only when the moving mouth was visible. In contrast, performance was at floor level, and no adaptation took place, in conditions in which the moving mouth was occluded. Conclusions The results show the importance of being able to observe relevant visual speech information from the speaker's mouth region, but not the eyes/upper face region, when listening and adapting to distorted sentences online. Second, the results also demonstrate that it is feasible to run speech perception and adaptation studies online, but that not all findings reported for lab studies replicate. Supplemental Material https://doi.org/10.23641/asha.14810523.
Affiliation(s)
- Antony S Trotter
- Speech, Hearing and Phonetic Sciences, University College London, United Kingdom
- Briony Banks
- Department of Psychology, Lancaster University, United Kingdom
- Patti Adank
- Speech, Hearing and Phonetic Sciences, University College London, United Kingdom

14. Text Captioning Buffers Against the Effects of Background Noise and Hearing Loss on Memory for Speech. Ear Hear 2021;43:115-127. PMID: 34260436; DOI: 10.1097/aud.0000000000001079
Abstract
OBJECTIVE Everyday speech understanding frequently occurs in perceptually demanding environments, for example, due to background noise and normal age-related hearing loss. The resulting degraded speech signals increase listening effort, which gives rise to negative downstream effects on subsequent memory and comprehension, even when speech is intelligible. In two experiments, we explored whether the presentation of realistic assistive text captioned speech offsets the negative effects of background noise and hearing impairment on multiple measures of speech memory. DESIGN In Experiment 1, young normal-hearing adults (N = 48) listened to sentences for immediate recall and delayed recognition memory. Speech was presented in quiet or in two levels of background noise. Sentences were either presented as speech only or as text captioned speech. Thus, the experiment followed a 2 (caption vs no caption) × 3 (no noise, +7 dB signal-to-noise ratio, +3 dB signal-to-noise ratio) within-subjects design. In Experiment 2, a group of older adults (age range: 61 to 80, N = 31), with varying levels of hearing acuity completed the same experimental task as in Experiment 1. For both experiments, immediate recall, recognition memory accuracy, and recognition memory confidence were analyzed via general(ized) linear mixed-effects models. In addition, we examined individual differences as a function of hearing acuity in Experiment 2. RESULTS In Experiment 1, we found that the presentation of realistic text-captioned speech in young normal-hearing listeners showed improved immediate recall and delayed recognition memory accuracy and confidence compared with speech alone. Moreover, text captions attenuated the negative effects of background noise on all speech memory outcomes. In Experiment 2, we replicated the same pattern of results in a sample of older adults with varying levels of hearing acuity. Moreover, we showed that the negative effects of hearing loss on speech memory in older adulthood were attenuated by the presentation of text captions. CONCLUSIONS Collectively, these findings strongly suggest that the simultaneous presentation of text can offset the negative effects of effortful listening on speech memory. Critically, captioning benefits extended from immediate word recall to long-term sentence recognition memory, a benefit that was observed not only for older adults with hearing loss but also young normal-hearing listeners. These findings suggest that the text captioning benefit to memory is robust and has potentially wide applications for supporting speech listening in acoustically challenging environments.
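The signal-to-noise ratios above are set by scaling the noise relative to the speech before mixing. A sketch of that standard step (the signals here are illustrative stand-ins):

```python
# Sketch: mix speech and noise at a target SNR in dB.
import numpy as np

def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Return speech + noise, with noise scaled to the requested SNR."""
    speech_rms = np.sqrt(np.mean(speech ** 2))
    noise_rms = np.sqrt(np.mean(noise ** 2))
    target_noise_rms = speech_rms / (10 ** (snr_db / 20))
    return speech + noise * (target_noise_rms / noise_rms)

fs = 16000
speech = np.random.randn(fs * 2)   # stand-in sentence
noise = np.random.randn(fs * 2)    # stand-in background noise
for snr in (7, 3):                 # the two noise levels used in the experiments
    mixture = mix_at_snr(speech, noise, snr)
```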

15. Zhong L, Noud BP, Pruitt H, Marcrum SC, Picou EM. Effects of text supplementation on speech intelligibility for listeners with normal and impaired hearing: a systematic review with implications for telecommunication. Int J Audiol 2021;61:1-11. PMID: 34154488; DOI: 10.1080/14992027.2021.1937346
Abstract
OBJECTIVE Telecommunication can be difficult in the presence of noise or hearing loss. The purpose of this study was to systematically review evidence regarding the effects of text supplementation (e.g. captions, subtitles) of auditory or auditory-visual signals on speech intelligibility for listeners with normal or impaired hearing. DESIGN Three databases were searched. Articles were evaluated for inclusion based on the Population Intervention Comparison Outcome framework. The Effective Public Health Practice Project instrument was used to evaluate the quality of the identified articles. STUDY SAMPLE After duplicates were removed, the titles and abstracts of 2,019 articles were screened. Forty-six full texts were reviewed; ten met inclusion criteria. RESULTS The quality of all ten articles was moderate or strong. The articles demonstrated that text added to auditory (or auditory-visual) signals improved speech intelligibility and that the benefits were largest when auditory signal integrity was low, accuracy of the text was high, and the auditory signal and text were synchronous. Age and hearing loss did not affect benefits from the addition of text. CONCLUSIONS Although only based on ten studies, these data support the use of text as a supplement during telecommunication, such as while watching television or during telehealth appointments.
Affiliation(s)
- Ling Zhong
- Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, TN, USA
- Brianne P Noud
- Department of Audiology, Center for Hearing and Speech, St. Louis, MO, USA
- Harriet Pruitt
- Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, TN, USA
- Department of Speech-Language Pathology, Advanced Therapy Solutions, Clarksville, TN, USA
- Steven C Marcrum
- Department of Otolaryngology, University Hospital Regensburg, Regensburg, Germany
- Erin M Picou
- Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, TN, USA

16. Jiang J, Benhamou E, Waters S, Johnson JCS, Volkmer A, Weil RS, Marshall CR, Warren JD, Hardy CJD. Processing of Degraded Speech in Brain Disorders. Brain Sci 2021;11:394. PMID: 33804653; PMCID: PMC8003678; DOI: 10.3390/brainsci11030394
Abstract
The speech we hear every day is typically "degraded" by competing sounds and the idiosyncratic vocal characteristics of individual speakers. While the comprehension of "degraded" speech is normally automatic, it depends on dynamic and adaptive processing across distributed neural networks. This presents the brain with an immense computational challenge, making degraded speech processing vulnerable to a range of brain disorders. Therefore, it is likely to be a sensitive marker of neural circuit dysfunction and an index of retained neural plasticity. Considering experimental methods for studying degraded speech and factors that affect its processing in healthy individuals, we review the evidence for altered degraded speech processing in major neurodegenerative diseases, traumatic brain injury and stroke. We develop a predictive coding framework for understanding deficits of degraded speech processing in these disorders, focussing on the "language-led dementias"-the primary progressive aphasias. We conclude by considering prospects for using degraded speech as a probe of language network pathophysiology, a diagnostic tool and a target for therapeutic intervention.
Affiliation(s)
- Jessica Jiang
- Dementia Research Centre, Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, London WC1N 3BG, UK
- Elia Benhamou
- Dementia Research Centre, Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, London WC1N 3BG, UK
- Sheena Waters
- Preventive Neurology Unit, Wolfson Institute of Preventive Medicine, Queen Mary University of London, London EC1M 6BQ, UK
- Jeremy C. S. Johnson
- Dementia Research Centre, Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, London WC1N 3BG, UK
- Anna Volkmer
- Division of Psychology and Language Sciences, University College London, London WC1H 0AP, UK
- Rimona S. Weil
- Dementia Research Centre, Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, London WC1N 3BG, UK
- Charles R. Marshall
- Dementia Research Centre, Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, London WC1N 3BG, UK
- Preventive Neurology Unit, Wolfson Institute of Preventive Medicine, Queen Mary University of London, London EC1M 6BQ, UK
- Jason D. Warren
- Dementia Research Centre, Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, London WC1N 3BG, UK
- Chris J. D. Hardy
- Dementia Research Centre, Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, London WC1N 3BG, UK

17. Asilador A, Llano DA. Top-Down Inference in the Auditory System: Potential Roles for Corticofugal Projections. Front Neural Circuits 2021;14:615259. PMID: 33551756; PMCID: PMC7862336; DOI: 10.3389/fncir.2020.615259
Abstract
It has become widely accepted that humans use contextual information to infer the meaning of ambiguous acoustic signals. In speech, for example, high-level semantic, syntactic, or lexical information shape our understanding of a phoneme buried in noise. Most current theories to explain this phenomenon rely on hierarchical predictive coding models involving a set of Bayesian priors emanating from high-level brain regions (e.g., prefrontal cortex) that are used to influence processing at lower-levels of the cortical sensory hierarchy (e.g., auditory cortex). As such, virtually all proposed models to explain top-down facilitation are focused on intracortical connections, and consequently, subcortical nuclei have scarcely been discussed in this context. However, subcortical auditory nuclei receive massive, heterogeneous, and cascading descending projections at every level of the sensory hierarchy, and activation of these systems has been shown to improve speech recognition. It is not yet clear whether or how top-down modulation to resolve ambiguous sounds calls upon these corticofugal projections. Here, we review the literature on top-down modulation in the auditory system, primarily focused on humans and cortical imaging/recording methods, and attempt to relate these findings to a growing animal literature, which has primarily been focused on corticofugal projections. We argue that corticofugal pathways contain the requisite circuitry to implement predictive coding mechanisms to facilitate perception of complex sounds and that top-down modulation at early (i.e., subcortical) stages of processing complement modulation at later (i.e., cortical) stages of processing. Finally, we suggest experimental approaches for future studies on this topic.
Affiliation(s)
- Alexander Asilador
- Neuroscience Program, The University of Illinois at Urbana-Champaign, Champaign, IL, United States
- Beckman Institute for Advanced Science and Technology, Urbana, IL, United States
- Daniel A. Llano
- Neuroscience Program, The University of Illinois at Urbana-Champaign, Champaign, IL, United States
- Beckman Institute for Advanced Science and Technology, Urbana, IL, United States
- Molecular and Integrative Physiology, The University of Illinois at Urbana-Champaign, Champaign, IL, United States

18. Chavant M, Hervais-Adelman A, Macherey O. Perceptual Learning of Vocoded Speech With and Without Contralateral Hearing: Implications for Cochlear Implant Rehabilitation. J Speech Lang Hear Res 2021;64:196-205. PMID: 33267729; DOI: 10.1044/2020_jslhr-20-00385
Abstract
Purpose An increasing number of individuals with residual or even normal contralateral hearing are being considered for cochlear implantation. It remains unknown whether the presence of contralateral hearing is beneficial or detrimental to their perceptual learning of cochlear implant (CI)-processed speech. The aim of this experiment was to provide a first insight into this question using acoustic simulations of CI processing. Method Sixty normal-hearing listeners took part in an auditory perceptual learning experiment. Each subject was randomly assigned to one of three groups of 20 referred to as NORMAL, LOWPASS, and NOTHING. The experiment consisted of two test phases separated by a training phase. In the test phases, all subjects were tested on recognition of monosyllabic words passed through a six-channel "PSHC" vocoder presented to a single ear. In the training phase, which consisted of listening to a 25-min audio book, all subjects were also presented with the same vocoded speech in one ear but the signal they received in their other ear differed across groups. The NORMAL group was presented with the unprocessed speech signal, the LOWPASS group with a low-pass filtered version of the speech signal, and the NOTHING group with no sound at all. Results The improvement in speech scores following training was significantly smaller for the NORMAL than for the LOWPASS and NOTHING groups. Conclusions This study suggests that the presentation of normal speech in the contralateral ear reduces or slows down perceptual learning of vocoded speech but that an unintelligible low-pass filtered contralateral signal does not have this effect. Potential implications for the rehabilitation of CI patients with partial or full contralateral hearing are discussed.
Affiliation(s)
- Martin Chavant
- Aix-Marseille University, Centre National de la Recherche Scientifique, Centrale Marseille, Laboratoire de Mécanique et d'Acoustique, France
- Olivier Macherey
- Aix-Marseille University, Centre National de la Recherche Scientifique, Centrale Marseille, Laboratoire de Mécanique et d'Acoustique, France

19. Gawęda Ł, Moritz S. The role of expectancies and emotional load in false auditory perceptions among patients with schizophrenia spectrum disorders. Eur Arch Psychiatry Clin Neurosci 2021;271:713-722. PMID: 31493150; PMCID: PMC8119254; DOI: 10.1007/s00406-019-01065-2
Abstract
Cognitive models suggest that top-down and emotional processes increase false perceptions in schizophrenia spectrum disorders (SSD). However, little is still known about the interaction of these processes in false auditory perceptions. The present study aimed at investigating the specific as well as joint impacts of expectancies and emotional load on false auditory perceptions in SSD. Thirty-three patients with SSD and 33 matched healthy controls were assessed with a false perception task. Participants were asked to detect a target stimulus (a word) in a white noise background (the word was present in 60% of the cases and absent in 40%). Conditions varied in terms of the level of expectancy (1. no cue prior to the stimulus, 2. semantic priming, 3. semantic priming accompanied by a video of a man's mouth spelling out the word). The words used were neutral or emotionally negative. Symptom severity was assessed with the Positive and Negative Syndrome Scale. Higher expectancy significantly increased the likelihood of false auditory perceptions only among the patients with SSD (the group x expectancy condition interaction was significant), which was unrelated to general cognitive performance. Emotional load had no impact on false auditory perceptions in either group. Patients made more false auditory perceptions with high confidence than controls did. False auditory perceptions were significantly correlated with the severity of positive symptoms and disorganization, but not with other dimensions. Perception in SSD seems to be susceptible to top-down processes, increasing the likelihood of high-confidence false auditory perceptions.
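With targets present on 60% of trials and absent on 40%, false perceptions of this kind can be summarized in signal-detection terms, separating perceptual sensitivity from a liberal response bias. A sketch with illustrative counts (not the study's data):

```python
# Sketch: hits and false alarms yield sensitivity (d') and criterion.
from scipy.stats import norm

n_present, n_absent = 60, 40      # trial counts per the task design
hits, false_alarms = 48, 10       # illustrative response counts

hit_rate = hits / n_present
fa_rate = false_alarms / n_absent
d_prime = norm.ppf(hit_rate) - norm.ppf(fa_rate)
criterion = -0.5 * (norm.ppf(hit_rate) + norm.ppf(fa_rate))
print(f"d' = {d_prime:.2f}, criterion = {criterion:.2f}")
```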
Affiliation(s)
- Łukasz Gawęda
- Psychopathology and Early Intervention Lab II, Department of Psychiatry, The Medical University of Warsaw, Ul. Kondratowicza 8, 03-242 Warsaw, Poland
- Department of Psychiatry and Psychotherapy, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Steffen Moritz
- Department of Psychiatry and Psychotherapy, University Medical Center Hamburg-Eppendorf, Hamburg, Germany

20
Sohoglu E, Davis MH. Rapid computations of spectrotemporal prediction error support perception of degraded speech. eLife 2020; 9:e58077. [PMID: 33147138 PMCID: PMC7641582 DOI: 10.7554/elife.58077] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/20/2020] [Accepted: 10/19/2020] [Indexed: 12/15/2022] Open
Abstract
Human speech perception can be described as Bayesian perceptual inference, but how are these Bayesian computations instantiated neurally? We used magnetoencephalographic recordings of brain responses to degraded spoken words and experimentally manipulated signal quality and prior knowledge. We first demonstrate that spectrotemporal modulations in speech are more strongly represented in neural responses than alternative speech representations (e.g., spectrogram or articulatory features). Critically, we found an interaction between speech signal quality and expectations from prior written text on the quality of neural representations; increased signal quality enhanced neural representations of speech that mismatched prior expectations, but led to greater suppression of speech that matched prior expectations. This interaction is a unique neural signature of prediction error computations and is apparent in neural responses within 100 ms of speech input. Our findings contribute to the detailed specification of a computational model of speech perception based on predictive coding frameworks.
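The reported interaction is easiest to see in a toy computation. The sketch below (not the authors' model; all numbers are invented) treats a prediction error as the difference between a degraded sensory vector and a prior, and shows that raising signal quality shrinks the error when the prior matches the input but enlarges it when the prior mismatches.

```python
# Toy illustration (not the paper's model) of why prediction-error coding
# predicts an interaction between signal quality and prior knowledge.
import numpy as np

rng = np.random.default_rng(0)
speech = rng.random(16)                   # "true" spectrotemporal features
other  = rng.random(16)                   # features of a different word

def pe_magnitude(quality, prior):
    sensory = quality * speech + (1 - quality) * 0.5   # degraded input
    return np.linalg.norm(sensory - prior)             # prediction error

for q in (0.2, 0.8):
    match    = pe_magnitude(q, prior=speech)  # prior matches the input
    mismatch = pe_magnitude(q, prior=other)   # prior mismatches the input
    print(f"quality={q}: PE match={match:.2f}, PE mismatch={mismatch:.2f}")
# Increasing quality shrinks PE for matching priors (suppression) but
# typically grows PE for mismatching priors -- the interaction above.
```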
Affiliation(s)
- Ediz Sohoglu
- School of Psychology, University of Sussex, Brighton, United Kingdom
- Matthew H Davis
- MRC Cognition and Brain Sciences Unit, Cambridge, United Kingdom

21
Banellis L, Sokoliuk R, Wild CJ, Bowman H, Cruse D. Event-related potentials reflect prediction errors and pop-out during comprehension of degraded speech. Neurosci Conscious 2020; 2020:niaa022. [PMID: 33133640 PMCID: PMC7585676 DOI: 10.1093/nc/niaa022] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/24/2020] [Revised: 07/08/2020] [Accepted: 08/06/2020] [Indexed: 11/20/2022] Open
Abstract
Comprehension of degraded speech requires higher-order expectations informed by prior knowledge. Accurate top-down expectations of incoming degraded speech cause a subjective semantic 'pop-out' or conscious breakthrough experience. Indeed, the same stimulus can be perceived as meaningless when no expectations are made in advance. We investigated the event-related potential (ERP) correlates of these top-down expectations, their error signals and the subjective pop-out experience in healthy participants. We manipulated expectations in a word-pair priming degraded (noise-vocoded) speech task and investigated the role of top-down expectation with a between-groups attention manipulation. Consistent with the role of expectations in comprehension, repetition priming significantly enhanced perceptual intelligibility of the noise-vocoded degraded targets for attentive participants. An early ERP was larger for mismatched (i.e. unexpected) targets than matched targets, indicative of an initial error signal not reliant on top-down expectations. Subsequently, a P3a-like ERP was larger to matched targets than mismatched targets only for attending participants-i.e. a pop-out effect-while a later ERP was larger for mismatched targets and did not significantly interact with attention. Rather than relying on complex post hoc interactions between prediction error and precision to explain this apredictive pattern, we consider our data to be consistent with prediction error minimization accounts for early stages of processing followed by Global Neuronal Workspace-like breakthrough and processing in service of task goals.
Affiliation(s)
- Leah Banellis
- School of Psychology and Centre for Human Brain Health, University of Birmingham, Edgbaston B15 2TT, UK
- Rodika Sokoliuk
- School of Psychology and Centre for Human Brain Health, University of Birmingham, Edgbaston B15 2TT, UK
- Conor J Wild
- Brain and Mind Institute, University of Western Ontario, London, ON N6A 3K7, Canada
- Howard Bowman
- School of Psychology and Centre for Human Brain Health, University of Birmingham, Edgbaston B15 2TT, UK
- School of Computing, University of Kent, Canterbury, Kent CT2 7NF, UK
- Damian Cruse
- School of Psychology and Centre for Human Brain Health, University of Birmingham, Edgbaston B15 2TT, UK

22
Lenc T, Keller PE, Varlet M, Nozaradan S. Neural and Behavioral Evidence for Frequency-Selective Context Effects in Rhythm Processing in Humans. Cereb Cortex Commun 2020; 1:tgaa037. [PMID: 34296106 PMCID: PMC8152888 DOI: 10.1093/texcom/tgaa037] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/30/2020] [Revised: 06/30/2020] [Accepted: 07/16/2020] [Indexed: 01/17/2023] Open
Abstract
When listening to music, people often perceive and move along with a periodic meter. However, the dynamics of the mapping between meter perception and the acoustic cues to meter periodicities in the sensory input remain largely unknown. To capture these dynamics, we recorded electroencephalography while nonmusician and musician participants listened to nonrepeating rhythmic sequences in which acoustic cues to meter frequencies either gradually decreased (from regular to degraded) or increased (from degraded to regular). The results revealed greater neural activity selectively elicited at meter frequencies when the sequence gradually changed from regular to degraded than in the opposite direction. Importantly, this effect was unlikely to arise from overall gain or low-level auditory processing, as revealed by physiological modeling. Moreover, the context effect was more pronounced in nonmusicians, who also demonstrated facilitated sensory-motor synchronization with the meter for sequences that started as regular. In contrast, musicians showed weaker effects of recent context in their neural responses and a robust ability to move along with the meter irrespective of stimulus degradation. Together, our results demonstrate that brain activity elicited by rhythm does not only reflect passive tracking of stimulus features, but represents continuous integration of sensory input with recent context.
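The neural measure in this literature is typically frequency tagging: the EEG amplitude spectrum is evaluated at meter-related frequencies after subtracting a local noise floor estimated from neighboring frequency bins. Below is a minimal sketch of that kind of analysis, with assumed parameters rather than the paper's exact pipeline.

```python
# Minimal frequency-tagging analysis (a common approach in this literature;
# a sketch, not the paper's exact pipeline): noise-corrected FFT amplitudes
# at meter-related frequencies.
import numpy as np

def noise_corrected_amplitudes(eeg, fs, target_freqs, skip=1, width=5):
    n = len(eeg)
    amps = np.abs(np.fft.rfft(eeg)) / n
    freqs = np.fft.rfftfreq(n, 1 / fs)
    out = []
    for f in target_freqs:
        i = np.argmin(np.abs(freqs - f))            # bin closest to target
        neighbors = np.r_[amps[i - skip - width:i - skip],
                          amps[i + skip + 1:i + skip + width + 1]]
        out.append(amps[i] - neighbors.mean())      # subtract local noise floor
    return np.array(out)

# e.g. meter-related frequencies for a 2.4 Hz beat: 0.6, 1.2, 2.4 Hz
# selectivity = noise_corrected_amplitudes(eeg, fs=512,
#                                          target_freqs=[0.6, 1.2, 2.4]).sum()
```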
Affiliation(s)
- Tomas Lenc
- MARCS Institute for Brain, Behaviour, and Development, Western Sydney University, Penrith, Sydney, NSW 2751, Australia
- Peter E Keller
- MARCS Institute for Brain, Behaviour, and Development, Western Sydney University, Penrith, Sydney, NSW 2751, Australia
- Manuel Varlet
- MARCS Institute for Brain, Behaviour, and Development, Western Sydney University, Penrith, Sydney, NSW 2751, Australia
- School of Psychology, Western Sydney University, Penrith, Sydney, NSW 2751, Australia
- Sylvie Nozaradan
- MARCS Institute for Brain, Behaviour, and Development, Western Sydney University, Penrith, Sydney, NSW 2751, Australia
- Institute of Neuroscience (IONS), Université Catholique de Louvain (UCL), Brussels 1200, Belgium
- International Laboratory for Brain, Music and Sound Research (BRAMS), Montreal QC H3C 3J7, Canada

23
Press C, Kok P, Yon D. The Perceptual Prediction Paradox. Trends Cogn Sci 2020; 24:13-24. [DOI: 10.1016/j.tics.2019.11.003] [Citation(s) in RCA: 64] [Impact Index Per Article: 16.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/16/2019] [Revised: 11/01/2019] [Accepted: 11/01/2019] [Indexed: 10/25/2022]
24
Casaponsa A, Sohoglu E, Moore DR, Füllgrabe C, Molloy K, Amitay S. Does training with amplitude modulated tones affect tone-vocoded speech perception? PLoS One 2019; 14:e0226288. [PMID: 31881550 PMCID: PMC6934405 DOI: 10.1371/journal.pone.0226288] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/04/2019] [Accepted: 11/22/2019] [Indexed: 11/17/2022] Open
Abstract
Temporal-envelope cues are essential for successful speech perception. We asked here whether training on stimuli containing temporal-envelope cues without speech content can improve the perception of spectrally degraded (vocoded) speech in which the temporal envelope (but not the temporal fine structure) is mainly preserved. Two groups of listeners were trained on different amplitude-modulation (AM) based tasks, either AM detection or AM-rate discrimination (21 blocks of 60 trials over two days, 1,260 trials in total; modulation frequencies: 4 Hz, 8 Hz, and 16 Hz), while an additional control group did not undertake any training. Consonant identification in vocoded vowel-consonant-vowel stimuli was tested before and after training on the AM tasks (or at an equivalent time interval for the control group). Following training, only the trained groups showed a significant improvement in the perception of vocoded speech, but the improvement did not differ significantly from that observed for controls. Thus, we do not find convincing evidence that this amount of training with temporal-envelope cues without speech content provides a significant benefit for vocoded speech intelligibility. Alternative training regimens using vocoded speech along the linguistic hierarchy should be explored.
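A sinusoidally amplitude-modulated noise of the kind used in such AM tasks is straightforward to generate. The sketch below uses assumed parameters (carrier type, modulation depth, duration), not the study's exact stimuli.

```python
# Sketch of an amplitude-modulated training stimulus (assumed parameters):
# a noise carrier modulated at rate fm with depth m.
import numpy as np

def am_noise(duration, fs, fm, m):
    t = np.arange(int(duration * fs)) / fs
    carrier = np.random.randn(len(t))              # broadband noise carrier
    modulator = 1.0 + m * np.sin(2 * np.pi * fm * t)
    y = modulator * carrier
    return y / np.max(np.abs(y))

# AM-detection trials contrast am_noise(1.0, 44100, fm=8, m=0.5) with an
# unmodulated version (m=0); AM-rate discrimination contrasts two values of fm.
```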
Affiliation(s)
- Aina Casaponsa
- Medical Research Council Institute of Hearing Research, Nottingham, England, United Kingdom
- Department of Linguistics and English Language, Lancaster University, Lancaster, England, United Kingdom
- Ediz Sohoglu
- Medical Research Council Institute of Hearing Research, Nottingham, England, United Kingdom
- David R. Moore
- Medical Research Council Institute of Hearing Research, Nottingham, England, United Kingdom
- Christian Füllgrabe
- Medical Research Council Institute of Hearing Research, Nottingham, England, United Kingdom
- Katharine Molloy
- Medical Research Council Institute of Hearing Research, Nottingham, England, United Kingdom
- Sygal Amitay
- Medical Research Council Institute of Hearing Research, Nottingham, England, United Kingdom

25
Zoefel B, Allard I, Anil M, Davis MH. Perception of Rhythmic Speech Is Modulated by Focal Bilateral Transcranial Alternating Current Stimulation. J Cogn Neurosci 2019; 32:226-240. [PMID: 31659922 DOI: 10.1162/jocn_a_01490] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/19/2022]
Abstract
Several recent studies have used transcranial alternating current stimulation (tACS) to demonstrate a causal role of neural oscillatory activity in speech processing. In particular, it has been shown that the ability to understand speech in a multi-speaker scenario or background noise depends on the timing of speech presentation relative to simultaneously applied tACS. However, it is possible that tACS did not change actual speech perception but rather auditory stream segregation. In this study, we tested whether the phase relation between tACS and the rhythm of degraded words, presented in silence, modulates word report accuracy. We found strong evidence for a tACS-induced modulation of speech perception, but only if the stimulation was applied bilaterally using ring electrodes (not for unilateral left hemisphere stimulation with square electrodes). These results were only obtained when data were analyzed using a statistical approach that was identified as optimal in a previous simulation study. The effect was driven by a phasic disruption of word report scores. Our results suggest a causal role of neural entrainment for speech perception and emphasize the importance of optimizing stimulation protocols and statistical approaches for brain stimulation research.
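One simple way to quantify such phasic modulation (a generic measure, not necessarily the optimal statistic identified in the authors' earlier simulation study) is the amplitude of the one-cycle Fourier component of word report accuracy across tACS phase bins.

```python
# Amplitude of the 1-cycle-per-sequence Fourier component of accuracy
# across phase bins -- a sketch of a generic phase-dependence measure.
import numpy as np

def phasic_modulation(accuracy_by_phase_bin):
    a = np.asarray(accuracy_by_phase_bin, dtype=float)
    spectrum = np.fft.rfft(a - a.mean())
    return 2 * np.abs(spectrum[1]) / len(a)   # amplitude of 1-cycle component

acc = [0.62, 0.58, 0.51, 0.47, 0.49, 0.55, 0.60, 0.63]  # 8 phase bins (toy)
print(phasic_modulation(acc))  # larger values = stronger phase dependence
```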
26
Kommajosyula SP, Cai R, Bartlett E, Caspary DM. Top-down or bottom up: decreased stimulus salience increases responses to predictable stimuli of auditory thalamic neurons. J Physiol 2019; 597:2767-2784. [PMID: 30924931 DOI: 10.1113/jp277450] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/16/2018] [Accepted: 03/25/2019] [Indexed: 01/29/2023] Open
Abstract
KEY POINTS Temporal imprecision leads to deficits in the comprehension of signals in cluttered acoustic environments, and older adults have been shown to use cognitive resources to disambiguate these signals. To mimic ageing in young rats, we delivered temporally degraded sound signals, which led to temporally imprecise neural codes. Instead of adaptation to repeated stimuli, degraded signals produced a relative increase in firing rates, similar to that seen in aged rats. We interpret this increase with repetition as a repair mechanism by which higher-order structures strengthen the internal representations of degraded signals. ABSTRACT To better understand speech in challenging environments, older adults increasingly use top-down cognitive and contextual resources. The medial geniculate body (MGB) integrates ascending inputs with descending predictions to dynamically gate auditory representations based on salience and context. A previous MGB single-unit study found an increased preference for predictable sinusoidal amplitude-modulated (SAM) stimuli in aged rats relative to young rats. The results suggested that the age-degraded/jittered upstream acoustic code may engender an increased preference for predictable/repeating acoustic signals, possibly reflecting increased use of top-down resources. In the present study, we recorded from units in the young-adult MGB, comparing responses to standard SAM with those evoked by less salient (degraded) SAM stimuli. We hypothesized that degrading the SAM stimulus would simulate the degraded ascending acoustic code seen in the elderly, increasing the preference for predictable stimuli. Single units were recorded from clusters of advanceable tetrodes implanted above the MGB of young-adult awake rats. Less salient SAM significantly increased the preference for predictable stimuli, especially at higher modulation frequencies. Rather than adaptation, higher modulation frequencies elicited increased numbers of spikes with each successive trial/repeat of the less salient SAM. These findings are consistent with previous findings obtained in aged rats, suggesting that less salient acoustic signals engage the additional use of top-down resources, as reflected by an increased preference for repeating stimuli that enhance the representation of complex environmental/communication sounds.
Affiliation(s)
- Srinivasa P Kommajosyula
- Southern Illinois University School of Medicine, Department of Pharmacology, Springfield, IL, USA
- Rui Cai
- Southern Illinois University School of Medicine, Department of Pharmacology, Springfield, IL, USA
- Edward Bartlett
- Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
- Donald M Caspary
- Southern Illinois University School of Medicine, Department of Pharmacology, Springfield, IL, USA

27
Abstract
The effects of aging and age-related hearing loss on the ability to learn degraded speech are not well understood. This study was designed to compare the perceptual learning of time-compressed speech and its generalization to natural-fast speech across young adults with normal hearing, older adults with normal hearing, and older adults with age-related hearing loss. Early learning (following brief exposure to time-compressed speech) and later learning (following further training) were compared across groups. Age and age-related hearing loss were both associated with declines in early learning. Although the two groups of older adults improved during the training session, when compared to untrained control groups (matched for age and hearing), learning was weaker in older than in young adults. In particular, the transfer of learning to untrained time-compressed sentences was reduced in both groups of older adults. Transfer of learning to natural-fast speech occurred regardless of age and hearing, but it was limited to sentences encountered during training. Findings are discussed within the framework of dynamic models of speech perception and learning. Based on this framework, we tentatively suggest that age-related declines in learning may stem from age differences in the use of high- and low-level speech cues. These age differences result in weaker early learning in older adults, which may further contribute to the difficulty of perceiving speech in daily conversational settings in this population.
Affiliation(s)
- Maayan Manheim
- Department of Communication Sciences and Disorders, University of Haifa, Israel
- Limor Lavie
- Department of Communication Sciences and Disorders, University of Haifa, Israel
- Karen Banai
- Department of Communication Sciences and Disorders, University of Haifa, Israel

28

29
Maintaining information about speech input during accent adaptation. PLoS One 2018; 13:e0199358. [PMID: 30086140 PMCID: PMC6080756 DOI: 10.1371/journal.pone.0199358] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/28/2018] [Accepted: 06/06/2018] [Indexed: 11/19/2022] Open
Abstract
Speech understanding can be thought of as inferring progressively more abstract representations from a rapidly unfolding signal. One common view of this process holds that lower-level information is discarded as soon as higher-level units have been inferred. However, there is evidence that subcategorical information about speech percepts is not immediately discarded, but is maintained past word boundaries and integrated with subsequent input. Previous evidence for such subcategorical information maintenance has come from paradigms that lack many of the demands typical of everyday language use. We ask whether information maintenance is also possible under more typical constraints, and in particular whether it can facilitate accent adaptation. In a web-based paradigm, participants listened to isolated foreign-accented words in one of three conditions: subtitles were displayed concurrently with the speech, after speech offset, or not at all. The delay between speech offset and subtitle presentation was manipulated. In a subsequent test phase, participants transcribed novel words in the same accent without the aid of subtitles. We find that subtitles facilitate accent adaptation even when displayed with a 6-second delay. Listeners thus maintained subcategorical information long enough for it to benefit adaptation. We close by discussing what type of information listeners maintain: subcategorical phonetic information, or just uncertainty about speech categories.
30
Abstract
We conducted three experiments to test the fluency-misattribution account of auditory hindsight bias. According to this account, prior exposure to a clearly presented auditory stimulus produces fluent (improved) processing of a distorted version of that stimulus, which results in participants mistakenly rating that item as easy to identify. In all experiments, participants in an exposure phase heard clearly spoken words zero, one, three, or six times. In the test phase, we examined auditory hindsight bias by manipulating whether participants heard a clear version of a target word just prior to hearing the distorted version of that word. Participants then estimated the ability of naïve peers to identify the distorted word. Auditory hindsight bias and the number of priming presentations during the exposure phase interacted underadditively in their prediction of participants' estimates: When no clear version of the target word appeared prior to the distorted version of that word in the test phase, participants identified target words more often the more frequently they heard the clear word in the exposure phase. Conversely, hearing a clear version of the target word at test produced similar estimates, regardless of the number of times participants heard clear versions of those words during the exposure phase. As per Roberts and Sternberg's (Attention and Performance XIV, pp. 611-653, 1993) additive factors logic, this finding suggests that both auditory hindsight bias and repetition priming contribute to a common process, which we propose involves a misattribution of processing fluency. We conclude that misattribution of fluency accounts for auditory hindsight bias.
31
Abstract
OBJECTIVES It is well known from previous research that when listeners are told what they are about to hear before a degraded or partially masked auditory signal is presented, the speech signal "pops out" of the background and becomes considerably more intelligible. The goal of this research was to explore whether this priming effect is as strong in older adults as in younger adults. DESIGN Fifty-six adults (28 older and 28 younger) listened to "nonsense" sentences spoken by a female talker in the presence of a 2-talker speech masker (also female) or a fluctuating speech-like noise masker at 5 signal-to-noise ratios. Just before, or just after, the auditory signal was presented, a typed caption was displayed on a computer screen. The caption sentence was either identical to the auditory sentence or differed by one key word. The subjects' task was to decide whether the caption and auditory messages were the same or different. Discrimination performance was reported in d'. The strength of the pop-out perception was inferred from the improvement in performance that was expected from the caption-before order of presentation. A subset of 12 subjects from each group made confidence judgments as they gave their responses, and also completed several cognitive tests. RESULTS Data showed a clear order effect for both subject groups and both maskers, with better same-different discrimination performance for the caption-before condition than the caption-after condition. However, for the two-talker masker, the younger adults obtained a larger and more consistent benefit from the caption-before order than the older adults across signal-to-noise ratios. Especially at the poorer signal-to-noise ratios, older subjects showed little evidence that they experienced the pop-out effect that is presumed to make the discrimination task easier. On average, older subjects also appeared to approach the task differently, being more reluctant than younger subjects to report that the captions and auditory sentences were the same. Correlation analyses indicated a significant negative association between age and priming benefit in the two-talker masker and nonsignificant associations between priming benefit in this masker and either high-frequency hearing loss or performance on the cognitive tasks. CONCLUSIONS Previous studies have shown that older adults are at least as good, if not better, at exploiting context in speech recognition, as compared with younger adults. The current results are not in disagreement with those findings but suggest that, under some conditions, the automatic priming process that may contribute to benefits from context is not as strong in older as in younger adults.
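For reference, d' for a same-different task of this kind is the difference between the z-transformed hit and false-alarm rates. A minimal computation follows, with a standard correction for extreme proportions and invented trial counts.

```python
# d' for same-different discrimination: distance between z-transformed
# hit and false-alarm rates (standard signal detection theory).
from scipy.stats import norm

def d_prime(hits, misses, false_alarms, correct_rejections):
    # Log-linear correction (add 0.5 to each cell) avoids infinite z-scores
    h = (hits + 0.5) / (hits + misses + 1)
    fa = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    return norm.ppf(h) - norm.ppf(fa)

# Invented counts for illustration:
print(d_prime(hits=42, misses=8, false_alarms=12, correct_rejections=38))
```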
32
Neural Prediction Errors Distinguish Perception and Misperception of Speech. J Neurosci 2018; 38:6076-6089. [PMID: 29891730 DOI: 10.1523/jneurosci.3258-17.2018] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/13/2017] [Revised: 03/08/2018] [Accepted: 03/28/2018] [Indexed: 11/21/2022] Open
Abstract
Humans use prior expectations to improve perception, especially of sensory signals that are degraded or ambiguous. However, if sensory input deviates from prior expectations, then correct perception depends on adjusting or rejecting prior expectations. Failure to adjust or reject the prior leads to perceptual illusions, especially if there is partial overlap (and thus partial mismatch) between expectations and input. With speech, "slips of the ear" occur when expectations lead to misperception. For instance, an entomologist might be more susceptible to hear "The ants are my friends" for "The answer, my friend" (in the Bob Dylan song Blowin' in the Wind). Here, we contrast two mechanisms by which prior expectations may lead to misperception of degraded speech. First, clear representations of the common sounds in the prior and input (i.e., expected sounds) may lead to incorrect confirmation of the prior. Second, insufficient representations of sounds that deviate between prior and input (i.e., prediction errors) could lead to deception. We used crossmodal predictions from written words that partially match degraded speech to compare neural responses when male and female human listeners were deceived into accepting the prior or correctly rejected it. Combined behavioral and multivariate representational similarity analysis of fMRI data show that veridical perception of degraded speech is signaled by representations of prediction error in the left superior temporal sulcus. Instead of using top-down processes to support perception of expected sensory input, our findings suggest that the strength of neural prediction error representations distinguishes correct perception and misperception. SIGNIFICANCE STATEMENT Misperceiving spoken words is an everyday experience, with outcomes that range from shared amusement to serious miscommunication. For hearing-impaired individuals, frequent misperception can lead to social withdrawal and isolation, with severe consequences for wellbeing. In this work, we specify the neural mechanisms by which prior expectations, which are so often helpful for perception, can lead to misperception of degraded sensory signals. Most descriptive theories of illusory perception explain misperception as arising from a clear sensory representation of features or sounds that are in common between prior expectations and sensory input. Our work instead provides support for a complementary proposal: that misperception occurs when there is an insufficient sensory representation of the deviation between expectations and sensory signals.
33
Helfer KS, Freyman RL, Merchant GR. How repetition influences speech understanding by younger, middle-aged and older adults. Int J Audiol 2018; 57:695-702. [PMID: 29801416 DOI: 10.1080/14992027.2018.1475756] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/16/2022]
Abstract
OBJECTIVE To examine benefit from immediate repetition of a masked speech message in younger, middle-aged and older adults. DESIGN Participants listened to sentences in conditions where only the target message was repeated, and when both the target message and its accompanying masker (noise or speech) were repeated. In a follow-up experiment, the effect of repetition was evaluated using a square-wave modulated noise masker to compare benefit when listeners were exposed to the same glimpses of the target message during first and second presentation versus when the glimpses differed. STUDY SAMPLE Younger, middle-aged and older adults (n = 16/group) for the main experiment; 15 younger adults for the follow-up experiment. RESULTS Repetition benefit was larger when the target but not the masker was repeated for all groups. This was especially true for older adults, suggesting that these individuals may be more negatively affected when a background message is repeated. Data obtained using noise maskers suggest that it is slightly more beneficial when listeners hear different (versus identical) portions of speech between initial presentation and repetition. CONCLUSIONS Although subtle age-related differences were found in some conditions, results confirm that repetition is an effective repair strategy for listeners spanning the adult age range.
Affiliation(s)
- Karen S Helfer
- Department of Communication Disorders, University of Massachusetts Amherst, Amherst, MA, USA
- Richard L Freyman
- Department of Communication Disorders, University of Massachusetts Amherst, Amherst, MA, USA
- Gabrielle R Merchant
- Department of Communication Disorders, University of Massachusetts Amherst, Amherst, MA, USA

34
Keetels M, Bonte M, Vroomen J. A Selective Deficit in Phonetic Recalibration by Text in Developmental Dyslexia. Front Psychol 2018; 9:710. [PMID: 29867675 PMCID: PMC5962785 DOI: 10.3389/fpsyg.2018.00710] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/12/2018] [Accepted: 04/23/2018] [Indexed: 11/30/2022] Open
Abstract
Upon hearing an ambiguous speech sound, listeners may adjust their perceptual interpretation of the speech input in accordance with contextual information, like accompanying text or lipread speech (i.e., phonetic recalibration; Bertelson et al., 2003). As developmental dyslexia (DD) has been associated with reduced integration of text and speech sounds, we investigated whether this deficit becomes manifest when text is used to induce this type of audiovisual learning. Adults with DD and normal readers were exposed to ambiguous consonants halfway between /aba/ and /ada/ together with text or lipread speech. After this audiovisual exposure phase, they categorized auditory-only ambiguous test sounds. Results showed that individuals with DD, unlike normal readers, did not use text to recalibrate their phoneme categories, whereas their recalibration by lipread speech was spared. Individuals with DD demonstrated similar deficits when ambiguous vowels (halfway between /wIt/ and /wet/) were recalibrated by text. These findings indicate that DD is related to a specific letter-speech sound association deficit that extends over phoneme classes (vowels and consonants), but – as lipreading was spared – does not extend to a more general audio–visual integration deficit. In particular, these results highlight diminished reading-related audiovisual learning in addition to the commonly reported phonological problems in developmental dyslexia.
Affiliation(s)
- Mirjam Keetels
- Cognitive Neuropsychology Laboratory, Department of Cognitive Neuropsychology, Tilburg University, Tilburg, Netherlands
- Milene Bonte
- Maastricht Brain Imaging Center, Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, Netherlands
- Jean Vroomen
- Cognitive Neuropsychology Laboratory, Department of Cognitive Neuropsychology, Tilburg University, Tilburg, Netherlands

35
Stekelenburg JJ, Keetels M, Vroomen J. Multisensory integration of speech sounds with letters vs. visual speech: only visual speech induces the mismatch negativity. Eur J Neurosci 2018. [PMID: 29537657 PMCID: PMC5969231 DOI: 10.1111/ejn.13908] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Abstract
Numerous studies have demonstrated that the vision of lip movements can alter the perception of auditory speech syllables (McGurk effect). While there is ample evidence for integration of text and auditory speech, there are only a few studies on the orthographic equivalent of the McGurk effect. Here, we examined whether written text, like visual speech, can induce an illusory change in the perception of speech sounds on both the behavioural and neural levels. In a sound categorization task, we found that both text and visual speech changed the identity of speech sounds from an /aba/-/ada/ continuum, but the size of this audiovisual effect was considerably smaller for text than visual speech. To examine at which level in the information processing hierarchy these multisensory interactions occur, we recorded electroencephalography in an audiovisual mismatch negativity (MMN, a component of the event-related potential reflecting preattentive auditory change detection) paradigm in which deviant text or visual speech was used to induce an illusory change in a sequence of ambiguous sounds halfway between /aba/ and /ada/. We found that only deviant visual speech induced an MMN, but not deviant text, which induced a late P3-like positive potential. These results demonstrate that text has much weaker effects on sound processing than visual speech does, possibly because text has different biological roots than visual speech.
Affiliation(s)
- Jeroen J Stekelenburg
- Department of Cognitive Neuropsychology, Tilburg University, Warandelaan 2, PO Box 90153, 5000 LE Tilburg, the Netherlands
- Mirjam Keetels
- Department of Cognitive Neuropsychology, Tilburg University, Warandelaan 2, PO Box 90153, 5000 LE Tilburg, the Netherlands
- Jean Vroomen
- Department of Cognitive Neuropsychology, Tilburg University, Warandelaan 2, PO Box 90153, 5000 LE Tilburg, the Netherlands

36
Cope TE, Sohoglu E, Sedley W, Patterson K, Jones PS, Wiggins J, Dawson C, Grube M, Carlyon RP, Griffiths TD, Davis MH, Rowe JB. Evidence for causal top-down frontal contributions to predictive processes in speech perception. Nat Commun 2017; 8:2154. [PMID: 29255275 PMCID: PMC5735133 DOI: 10.1038/s41467-017-01958-7] [Citation(s) in RCA: 84] [Impact Index Per Article: 12.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/23/2017] [Accepted: 10/27/2017] [Indexed: 11/09/2022] Open
Abstract
Perception relies on the integration of sensory information and prior expectations. Here we show that selective neurodegeneration of human frontal speech regions results in delayed reconciliation of predictions in temporal cortex. These temporal regions were not atrophic, displayed normal evoked magnetic and electrical power, and preserved neural sensitivity to manipulations of sensory detail. Frontal neurodegeneration does not prevent the perceptual effects of contextual information; instead, prior expectations are applied inflexibly. The precision of predictions correlates with beta power, in line with theoretical models of the neural instantiation of predictive coding. Fronto-temporal interactions are enhanced while participants reconcile prior predictions with degraded sensory signals. Excessively precise predictions can explain several challenging phenomena in frontal aphasias, including agrammatism and subjective difficulties with speech perception. This work demonstrates that higher-level frontal mechanisms for cognitive and behavioural flexibility make a causal functional contribution to the hierarchical generative models underlying speech perception.
Affiliation(s)
- Thomas E Cope
- Department of Clinical Neurosciences, University of Cambridge, Cambridge, CB2 0SZ, UK
- E Sohoglu
- Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, CB2 7EF, UK
- W Sedley
- Institute of Neuroscience, Newcastle University, Newcastle, NE1 7RU, UK
- K Patterson
- Department of Clinical Neurosciences, University of Cambridge, Cambridge, CB2 0SZ, UK
- Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, CB2 7EF, UK
- P S Jones
- Department of Clinical Neurosciences, University of Cambridge, Cambridge, CB2 0SZ, UK
- J Wiggins
- Department of Clinical Neurosciences, University of Cambridge, Cambridge, CB2 0SZ, UK
- C Dawson
- Department of Clinical Neurosciences, University of Cambridge, Cambridge, CB2 0SZ, UK
- M Grube
- Institute of Neuroscience, Newcastle University, Newcastle, NE1 7RU, UK
- R P Carlyon
- Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, CB2 7EF, UK
- T D Griffiths
- Institute of Neuroscience, Newcastle University, Newcastle, NE1 7RU, UK
- Matthew H Davis
- Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, CB2 7EF, UK
- James B Rowe
- Department of Clinical Neurosciences, University of Cambridge, Cambridge, CB2 0SZ, UK
- Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, CB2 7EF, UK

37
Nagaraj NK, Magimairaj BM. Role of working memory and lexical knowledge in perceptual restoration of interrupted speech. J Acoust Soc Am 2017; 142:3756. [PMID: 29289104 DOI: 10.1121/1.5018429] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
The role of working memory (WM) capacity and lexical knowledge in perceptual restoration (PR) of missing speech was investigated using the interrupted speech perception paradigm. Speech identification ability, which indexed PR, was measured using low-context sentences periodically interrupted at 1.5 Hz. PR was measured for silent gated, low-frequency speech noise filled, and low-frequency fine-structure and envelope filled interrupted conditions. WM capacity was measured using verbal and visuospatial span tasks. Lexical knowledge was assessed using both receptive vocabulary and meaning from context tests. Results showed that PR was better for speech noise filled condition than other conditions tested. Both receptive vocabulary and verbal WM capacity explained unique variance in PR for the speech noise filled condition, but were unrelated to performance in the silent gated condition. It was only receptive vocabulary that uniquely predicted PR for fine-structure and envelope filled conditions. These findings suggest that the contribution of lexical knowledge and verbal WM during PR depends crucially on the information content that replaced the silent intervals. When perceptual continuity was partially restored by filler speech noise, both lexical knowledge and verbal WM capacity facilitated PR. Importantly, for fine-structure and envelope filled interrupted conditions, lexical knowledge was crucial for PR.
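The interruption manipulation itself is simple to sketch: gate the signal with a periodic square wave and either leave the gaps silent or fill them with low-pass noise. Apart from the 1.5 Hz gating rate given in the abstract, the details below (duty cycle, 500 Hz filler cutoff) are assumptions.

```python
# Sketch of the interrupted-speech manipulation (assumed details): gate the
# signal on/off at 1.5 Hz, leaving gaps silent or filling them with
# low-pass filtered noise.
import numpy as np
from scipy.signal import butter, sosfiltfilt

def interrupt(x, fs, rate=1.5, duty=0.5, filler=None):
    t = np.arange(len(x)) / fs
    gate = ((t * rate) % 1.0) < duty            # periodic on/off gating
    y = np.where(gate, x, 0.0)                  # silent-gated condition
    if filler == "lowpass_noise":
        sos = butter(4, 500, btype="low", fs=fs, output="sos")
        noise = sosfiltfilt(sos, np.random.randn(len(x))) * x.std()
        y = np.where(gate, x, noise)            # noise-filled condition
    return y
```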
Affiliation(s)
- Naveen K Nagaraj
- Cognitive Hearing Science Lab, University of Arkansas for Medical Sciences and University of Arkansas at Little Rock, Little Rock, Arkansas 72204, USA
- Beula M Magimairaj
- Cognition and Language Lab, Communication Sciences and Disorders, University of Central Arkansas, Conway, Arkansas 72035, USA

38
Wu C, Zheng Y, Li J, Wu H, She S, Liu S, Ning Y, Li L. Brain substrates underlying auditory speech priming in healthy listeners and listeners with schizophrenia. Psychol Med 2017; 47:837-852. [PMID: 27894376 DOI: 10.1017/s0033291716002816] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 12/19/2022]
Abstract
BACKGROUND Under 'cocktail party' listening conditions, healthy listeners and listeners with schizophrenia can use temporally pre-presented auditory speech-priming (ASP) stimuli to improve target-speech recognition, even though listeners with schizophrenia are more vulnerable to informational speech masking. METHOD Using functional magnetic resonance imaging, this study searched for both brain substrates underlying the unmasking effect of ASP in 16 healthy controls and 22 patients with schizophrenia, and brain substrates underlying schizophrenia-related speech-recognition deficits under speech-masking conditions. RESULTS In both controls and patients, introducing the ASP condition (against the auditory non-speech-priming condition) not only activated the left superior temporal gyrus (STG) and left posterior middle temporal gyrus (pMTG), but also enhanced functional connectivity of the left STG/pMTG with the left caudate. It also enhanced functional connectivity of the left STG/pMTG with the left pars triangularis of the inferior frontal gyrus (TriIFG) in controls and that with the left Rolandic operculum in patients. The strength of functional connectivity between the left STG and left TriIFG was correlated with target-speech recognition under the speech-masking condition in both controls and patients, but reduced in patients. CONCLUSIONS The left STG/pMTG and their ASP-related functional connectivity with both the left caudate and some frontal regions (the left TriIFG in healthy listeners and the left Rolandic operculum in listeners with schizophrenia) are involved in the unmasking effect of ASP, possibly through facilitating the following processes: masker-signal inhibition, target-speech encoding, and speech production. The schizophrenia-related reduction of functional connectivity between the left STG and left TriIFG augments the vulnerability of speech recognition to speech masking.
Affiliation(s)
- C Wu
- School of Psychological and Cognitive Sciences, and Beijing Key Laboratory of Behavior and Mental Health, Key Laboratory on Machine Perception (Ministry of Education), Peking University, Beijing, People's Republic of China
- Y Zheng
- The Affiliated Brain Hospital of Guangzhou Medical University, Guangzhou, People's Republic of China
- J Li
- The Affiliated Brain Hospital of Guangzhou Medical University, Guangzhou, People's Republic of China
- H Wu
- The Affiliated Brain Hospital of Guangzhou Medical University, Guangzhou, People's Republic of China
- S She
- The Affiliated Brain Hospital of Guangzhou Medical University, Guangzhou, People's Republic of China
- S Liu
- The Affiliated Brain Hospital of Guangzhou Medical University, Guangzhou, People's Republic of China
- Y Ning
- The Affiliated Brain Hospital of Guangzhou Medical University, Guangzhou, People's Republic of China
- L Li
- School of Psychological and Cognitive Sciences, and Beijing Key Laboratory of Behavior and Mental Health, Key Laboratory on Machine Perception (Ministry of Education), Peking University, Beijing, People's Republic of China

39
Blank H, Davis MH. Prediction Errors but Not Sharpened Signals Simulate Multivoxel fMRI Patterns during Speech Perception. PLoS Biol 2016; 14:e1002577. [PMID: 27846209 PMCID: PMC5112801 DOI: 10.1371/journal.pbio.1002577] [Citation(s) in RCA: 78] [Impact Index Per Article: 9.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/10/2016] [Accepted: 10/19/2016] [Indexed: 11/19/2022] Open
Abstract
Successful perception depends on combining sensory input with prior knowledge. However, the underlying mechanism by which these two sources of information are combined is unknown. In speech perception, as in other domains, two functionally distinct coding schemes have been proposed for how expectations influence representation of sensory evidence. Traditional models suggest that expected features of the speech input are enhanced or sharpened via interactive activation (Sharpened Signals). Conversely, Predictive Coding suggests that expected features are suppressed so that unexpected features of the speech input (Prediction Errors) are processed further. The present work is aimed at distinguishing between these two accounts of how prior knowledge influences speech perception. By combining behavioural, univariate, and multivariate fMRI measures of how sensory detail and prior expectations influence speech perception with computational modelling, we provide evidence in favour of Prediction Error computations. Increased sensory detail and informative expectations have additive behavioural and univariate neural effects because they both improve the accuracy of word report and reduce the BOLD signal in lateral temporal lobe regions. However, sensory detail and informative expectations have interacting effects on speech representations shown by multivariate fMRI in the posterior superior temporal sulcus. When prior knowledge was absent, increased sensory detail enhanced the amount of speech information measured in superior temporal multivoxel patterns, but with informative expectations, increased sensory detail reduced the amount of measured information. Computational simulations of Sharpened Signals and Prediction Errors during speech perception could both explain these behavioural and univariate fMRI observations. However, the multivariate fMRI observations were uniquely simulated by a Prediction Error and not a Sharpened Signal model. The interaction between prior expectation and sensory detail provides evidence for a Predictive Coding account of speech perception. Our work establishes methods that can be used to distinguish representations of Prediction Error and Sharpened Signals in other perceptual domains.
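The two coding schemes being contrasted can be caricatured in a few lines. In the toy sketch below (invented numbers, not the paper's simulations), sharpening multiplies sensory evidence by the prior so that expected features dominate, whereas prediction-error coding subtracts the prior so that unexpected features dominate.

```python
# Toy contrast (a sketch, not the paper's simulations) between the two
# coding schemes: sharpening boosts expected features, prediction-error
# coding passes forward only unexpected features.
import numpy as np

def sharpened(sensory, prior):
    s = sensory * prior                  # expected features enhanced
    return s / s.sum()

def prediction_error(sensory, prior):
    return sensory - prior               # expected features subtracted out

sensory = np.array([0.1, 0.6, 0.3])      # evidence over three speech features
prior   = np.array([0.2, 0.7, 0.1])      # expectation from matching text
print(sharpened(sensory, prior))         # peaks where input and prior agree
print(prediction_error(sensory, prior))  # peaks where they disagree
```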
Affiliation(s)
- Helen Blank
- MRC Cognition and Brain Sciences Unit, Cambridge, United Kingdom
- Matthew H. Davis
- MRC Cognition and Brain Sciences Unit, Cambridge, United Kingdom

40
Abstract
Listeners adjust their phonetic categories to cope with variations in the speech signal (phonetic recalibration). Previous studies have shown that lipread speech (and word knowledge) can adjust the perception of ambiguous speech and can induce phonetic adjustments (Bertelson, Vroomen, & de Gelder in Psychological Science, 14(6), 592–597, 2003; Norris, McQueen, & Cutler in Cognitive Psychology, 47(2), 204–238, 2003). We examined whether orthographic information (text) also can induce phonetic recalibration. Experiment 1 showed that after exposure to ambiguous speech sounds halfway between /b/ and /d/ that were combined with text (b or d) participants were more likely to categorize auditory-only test sounds in accordance with the exposed letters. Experiment 2 replicated this effect with a very short exposure phase. These results show that listeners adjust their phonetic boundaries in accordance with disambiguating orthographic information and that these adjustments show a rapid build-up.
41
Patro C, Mendel LL. Role of contextual cues on the perception of spectrally reduced interrupted speech. J Acoust Soc Am 2016; 140:1336. [PMID: 27586760 DOI: 10.1121/1.4961450] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
Understanding speech within an auditory scene is constantly challenged by interfering noise in suboptimal listening environments when noise hinders the continuity of the speech stream. In such instances, a typical auditory-cognitive system perceptually integrates available speech information and "fills in" missing information in the light of semantic context. However, individuals with cochlear implants (CIs) find it difficult and effortful to understand interrupted speech compared to their normal-hearing counterparts. This inefficiency in perceptual integration of speech could be attributed to further degradations in the spectral-temporal domain imposed by CIs, making it difficult to utilize the contextual evidence effectively. To address these issues, 20 normal-hearing adults listened to speech that was spectrally reduced and spectrally reduced interrupted in a manner similar to CI processing. The Revised Speech Perception in Noise test, which includes contextually rich and contextually poor sentences, was used to evaluate the influence of semantic context on speech perception. Results indicated that listeners benefited more from semantic context when they listened to spectrally reduced speech alone. For the spectrally reduced interrupted speech, contextual information was not as helpful under significant spectral reductions, but became beneficial as the spectral resolution improved. These results suggest that top-down processing facilitates speech perception up to a point, but fails to facilitate speech understanding when the speech signals are severely degraded.
Affiliation(s)
- Chhayakanta Patro
- School of Communication Sciences and Disorders, University of Memphis, 4055 North Park Loop, Memphis, Tennessee 38152, USA
- Lisa Lucks Mendel
- School of Communication Sciences and Disorders, University of Memphis, 4055 North Park Loop, Memphis, Tennessee 38152, USA

42
Beck Lidén C, Krüger O, Schwarz L, Erb M, Kardatzki B, Scheffler K, Ethofer T. Neurobiology of knowledge and misperception of lyrics. Neuroimage 2016; 134:12-21. [PMID: 27085504 DOI: 10.1016/j.neuroimage.2016.03.080] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/16/2015] [Revised: 03/30/2016] [Accepted: 03/31/2016] [Indexed: 10/21/2022] Open
Abstract
We conducted two functional magnetic resonance imaging (fMRI) experiments to investigate the neural underpinnings of knowledge and misperception of lyrics. In fMRI experiment 1, a linear relationship between familiarity with lyrics and activation was found in left-hemispheric speech-related as well as bilateral striatal areas, which is in line with previous research on the generation of lyrics. In fMRI experiment 2, we employed so-called Mondegreens and Soramimi to induce misperceptions of lyrics, revealing a bilateral network including middle temporal and inferior frontal areas as well as anterior cingulate cortex (ACC) and mediodorsal thalamus. ACC activation also correlated with the extent to which misperceptions were judged as amusing, corroborating previous neuroimaging results on the role of this area in mediating the pleasant experience of chills during music perception. Finally, we examined the areas engaged during misperception of lyrics using diffusion-weighted imaging (DWI) to determine their structural connectivity. These combined fMRI/DWI results could serve as a neurobiological model for future studies on other types of misunderstanding, events with potentially strong impact on our social lives.
Affiliation(s)
- Claudia Beck Lidén
- Department of Biomedical Magnetic Resonance, University of Tübingen, Otfried-Müller-Str. 51, 72076 Tübingen, Germany
- Oliver Krüger
- Department of Biomedical Magnetic Resonance, University of Tübingen, Otfried-Müller-Str. 51, 72076 Tübingen, Germany
- Lena Schwarz
- Department of Biomedical Magnetic Resonance, University of Tübingen, Otfried-Müller-Str. 51, 72076 Tübingen, Germany; University Clinic for Psychiatry and Psychotherapy, University of Tübingen, Calwer Str. 14, 72076 Tübingen, Germany
- Michael Erb
- Department of Biomedical Magnetic Resonance, University of Tübingen, Otfried-Müller-Str. 51, 72076 Tübingen, Germany
- Bernd Kardatzki
- Department of Biomedical Magnetic Resonance, University of Tübingen, Otfried-Müller-Str. 51, 72076 Tübingen, Germany
- Klaus Scheffler
- Department of Biomedical Magnetic Resonance, University of Tübingen, Otfried-Müller-Str. 51, 72076 Tübingen, Germany; Max-Planck-Institute for Biological Cybernetics, Spemannstraße 38-40, 72076 Tübingen, Germany
- Thomas Ethofer
- Department of Biomedical Magnetic Resonance, University of Tübingen, Otfried-Müller-Str. 51, 72076 Tübingen, Germany; University Clinic for Psychiatry and Psychotherapy, University of Tübingen, Calwer Str. 14, 72076 Tübingen, Germany; Max-Planck-Institute for Biological Cybernetics, Spemannstraße 38-40, 72076 Tübingen, Germany

43
Sohoglu E, Davis MH. Perceptual learning of degraded speech by minimizing prediction error. Proc Natl Acad Sci U S A 2016; 113:E1747-56. [PMID: 26957596 PMCID: PMC4812728 DOI: 10.1073/pnas.1523266113] [Citation(s) in RCA: 72] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
Abstract
Human perception is shaped by past experience on multiple timescales. Sudden and dramatic changes in perception occur when prior knowledge or expectations match stimulus content. These immediate effects contrast with the longer-term, more gradual improvements that are characteristic of perceptual learning. Despite extensive investigation of these two experience-dependent phenomena, there is considerable debate about whether they result from common or dissociable neural mechanisms. Here we test single- and dual-mechanism accounts of experience-dependent changes in perception using concurrent magnetoencephalographic and EEG recordings of neural responses evoked by degraded speech. When speech clarity was enhanced by prior knowledge obtained from matching text, we observed reduced neural activity in a peri-auditory region of the superior temporal gyrus (STG). Critically, longer-term improvements in the accuracy of speech recognition following perceptual learning resulted in reduced activity in a nearly identical STG region. Moreover, short-term neural changes caused by prior knowledge and longer-term neural changes arising from perceptual learning were correlated across subjects with the magnitude of learning-induced changes in recognition accuracy. These experience-dependent effects on neural processing could be dissociated from the neural effect of hearing physically clearer speech, which similarly enhanced perception but increased rather than decreased STG responses. Hence, the observed neural effects of prior knowledge and perceptual learning cannot be attributed to epiphenomenal changes in listening effort that accompany enhanced perception. Instead, our results support a predictive coding account of speech perception; computational simulations show how a single mechanism, minimization of prediction error, can drive immediate perceptual effects of prior knowledge and longer-term perceptual learning of degraded speech.
Affiliation(s)
- Ediz Sohoglu
- Medical Research Council Cognition and Brain Sciences Unit, Cambridge CB2 7EF, United Kingdom
- Matthew H Davis
- Medical Research Council Cognition and Brain Sciences Unit, Cambridge CB2 7EF, United Kingdom

44
Norris D, McQueen JM, Cutler A. Prediction, Bayesian inference and feedback in speech recognition. Lang Cogn Neurosci 2016; 31:4-18. [PMID: 26740960 PMCID: PMC4685608 DOI: 10.1080/23273798.2015.1081703] [Citation(s) in RCA: 60] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/18/2015] [Accepted: 08/05/2015] [Indexed: 05/19/2023]
Abstract
Speech perception involves prediction, but how is that prediction implemented? In cognitive models prediction has often been taken to imply that there is feedback of activation from lexical to pre-lexical processes as implemented in interactive-activation models (IAMs). We show that simple activation feedback does not actually improve speech recognition. However, other forms of feedback can be beneficial. In particular, feedback can enable the listener to adapt to changing input, and can potentially help the listener to recognise unusual input, or recognise speech in the presence of competing sounds. The common feature of these helpful forms of feedback is that they are all ways of optimising the performance of speech recognition using Bayesian inference. That is, listeners make predictions about speech because speech recognition is optimal in the sense captured in Bayesian models.
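The core claim, that prediction amounts to Bayesian inference rather than activation feedback, reduces to weighing likelihoods by priors. A toy example with made-up numbers:

```python
# Bayesian word recognition in one line: the posterior weighs the acoustic
# likelihood of each candidate word by its prior. Numbers are invented.
import numpy as np

words = ["speech", "beach"]
prior      = np.array([0.8, 0.2])   # e.g. contextual/lexical frequency prior
likelihood = np.array([0.3, 0.5])   # P(acoustic input | word)

posterior = prior * likelihood
posterior /= posterior.sum()
print(dict(zip(words, posterior.round(3))))
# Despite a better acoustic fit for "beach", the prior favours "speech":
# prediction as optimal inference rather than activation feedback.
```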
Collapse
Affiliation(s)
- Dennis Norris
- MRC Cognition and Brain Sciences Unit, Cambridge, UK
| | - James M. McQueen
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
- Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
| | - Anne Cutler
- Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
- MARCS Institute, University of Western Sydney, Penrith South, NSW 2751, Australia
| |
Collapse
|
45
|
Moulin A, Richard C. Lexical Influences on Spoken Spondaic Word Recognition in Hearing-Impaired Patients. Front Neurosci 2015; 9:476. [PMID: 26778945 PMCID: PMC4688363 DOI: 10.3389/fnins.2015.00476] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/29/2015] [Accepted: 11/26/2015] [Indexed: 11/13/2022] Open
Abstract
Top-down contextual influences play a major part in speech understanding, especially in hearing-impaired patients with deteriorated auditory input. Those influences are most obvious in difficult listening situations, such as listening to sentences in noise, but can also be observed at the word level under more favorable conditions, as in one of the most commonly used tasks in audiology, i.e., repeating isolated words in silence. This study aimed to explore the role of top-down contextual influences and their dependence on lexical factors and patient-specific factors using standard clinical linguistic material. Spondaic word perception was tested in 160 hearing-impaired patients aged 23-88 years with a four-frequency average pure-tone threshold ranging from 21 to 88 dB HL. Sixty spondaic words were randomly presented at a level adjusted to correspond to a speech perception score ranging between 40 and 70% of the performance intensity function obtained using monosyllabic words. Phoneme and whole-word recognition scores were used to calculate two context-influence indices (the j factor and the ratio of word scores to phonemic scores) and were correlated with linguistic factors, such as phonological neighborhood density and several indices of word occurrence frequency. Contextual influence was greater for spondaic words than in similar studies using monosyllabic words, with an overall j factor of 2.07 (SD = 0.5). For both indices, context use decreased with increasing hearing loss once the average hearing loss exceeded 55 dB HL. In right-handed patients, significantly greater context influence was observed for words presented in the right ear than for words presented in the left, especially in patients with many years of education. The correlations between raw word scores (and context-influence indices) and word occurrence frequencies showed a significant age-dependent effect: perception scores correlated more strongly with occurrence frequencies drawn from the years of the patients' youth, a "historic" word frequency effect. This effect was still observed for patients with few years of formal education, but for those patients, especially the younger ones, recent occurrence frequencies based on current word exposure had a stronger influence.
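The two context-influence indices lend themselves to a short worked example. Below is a minimal Python sketch of the j factor (Boothroyd & Nittrouer, 1988), which relates whole-word to phoneme recognition; the proportions used are invented for illustration, not data from this study.

# Context-influence indices from word and phoneme recognition scores.
#   j = log(p_word) / log(p_phoneme)
# If every phoneme had to be recognized independently, j would equal the
# number of phonemes per word; smaller j indicates top-down context use.
import math

def j_factor(p_word: float, p_phoneme: float) -> float:
    """j factor from proportions of words and phonemes correct."""
    return math.log(p_word) / math.log(p_phoneme)

p_phoneme = 0.70   # invented: proportion of phonemes repeated correctly
p_word = 0.48      # invented: proportion of whole spondees repeated correctly

print(f"j factor:            {j_factor(p_word, p_phoneme):.2f}")
print(f"word/phoneme ratio:  {p_word / p_phoneme:.2f}")
# These example values give j of about 2.06, close to the 2.07 reported
# here and far below the phoneme count of a spondee, consistent with
# substantial lexical context use.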
Collapse
Affiliation(s)
- Annie Moulin
- INSERM, U1028, Lyon Neuroscience Research Center, Brain Dynamics and Cognition Team, Lyon, France
- CNRS, UMR5292, Lyon Neuroscience Research Center, Brain Dynamics and Cognition Team, Lyon, France
- University of Lyon, Lyon, France
| | - Céline Richard
- Otorhinolaryngology Department, Vaudois University Hospital Center and University of Lausanne, Lausanne, Switzerland
- The Laboratory for Investigative Neurophysiology, Department of Radiology and Department of Clinical Neurosciences, Vaudois University Hospital Center and University of Lausanne, Lausanne, Switzerland
| |
Collapse
|
46
|
Freyman RL, Morse-Fortier C, Griffin AM. Temporal effects in priming of masked and degraded speech. J Acoust Soc Am 2015; 138:1418-1427. [PMID: 26428780 PMCID: PMC4567576 DOI: 10.1121/1.4927490] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/24/2015] [Revised: 07/13/2015] [Accepted: 07/14/2015] [Indexed: 06/05/2023]
Abstract
When listeners know the content of the message they are about to hear, the clarity of distorted or partially masked speech increases dramatically. The current experiments investigated this priming phenomenon quantitatively using a same-different task where a typed caption and auditory message either matched exactly or differed by one key word. Four conditions were tested with groups of normal-hearing listeners: (a) natural speech presented in two-talker babble in a non-spatial configuration, (b) same as (a) but with the masker time reversed, (c) same as (a) but with target-masker spatial separation, and (d) vocoded sentences presented in speech-spectrum noise. The primary manipulation was the timing of the caption relative to the auditory message, which varied in 20 steps with a resolution of 200 ms. Across all four conditions, optimal performance was achieved when the initiation of the text preceded the acoustic speech signal by at least 400 ms, driven mostly by a low number of "different" responses to Same stimuli. Performance was slightly poorer with simultaneous delivery and much poorer when the auditory signal preceded the caption. Because priming may be used to facilitate perceptual learning, identifying optimal temporal conditions for priming could help determine the best conditions for auditory training.
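A brief sketch may help clarify how performance in such a same-different task can be scored. The Python fragment below applies a simple signal-detection analysis, treating "different" as the signal response; the trial counts are invented, and a full differencing model of same-different judgments, which the original analysis may have used, would differ in detail.

# Signal-detection scoring of same-different responses (toy counts).
# "Different" responses on Different trials are hits; "different"
# responses on Same trials are false alarms.
from statistics import NormalDist

def dprime(hit_rate: float, fa_rate: float) -> float:
    """Sensitivity as z(hit rate) - z(false-alarm rate)."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)

hits = 46
different_trials = 50
false_alarms = 5
same_trials = 50

print(f"d' = {dprime(hits / different_trials, false_alarms / same_trials):.2f}")
# On the abstract's account, a text lead of at least 400 ms mainly lowers
# the false-alarm count (fewer "different" responses to Same stimuli),
# which raises sensitivity.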
Collapse
Affiliation(s)
- Richard L Freyman
- Department of Communication Disorders, University of Massachusetts, 358 North Pleasant Street, Amherst, Massachusetts 01003, USA
| | - Charlotte Morse-Fortier
- Department of Communication Disorders, University of Massachusetts, 358 North Pleasant Street, Amherst, Massachusetts 01003, USA
| | - Amanda M Griffin
- Department of Communication Disorders, University of Massachusetts, 358 North Pleasant Street, Amherst, Massachusetts 01003, USA
| |
Collapse
|
47
|
Van Engen KJ, Peelle JE. Listening effort and accented speech. Front Hum Neurosci 2014; 8:577. [PMID: 25140140 PMCID: PMC4122174 DOI: 10.3389/fnhum.2014.00577] [Citation(s) in RCA: 75] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/21/2014] [Accepted: 07/14/2014] [Indexed: 11/25/2022] Open
Affiliation(s)
| | - Jonathan E. Peelle
- Department of Otolaryngology, Washington University in St. Louis, St. Louis, MO, USA
| |
Collapse
|