1. Li Z, Zhang D. How does the human brain process noisy speech in real life? Insights from the second-person neuroscience perspective. Cogn Neurodyn 2024; 18:371-382. [PMID: 38699619; PMCID: PMC11061069; DOI: 10.1007/s11571-022-09924-w]
Abstract
Comprehending speech in the presence of background noise is of great importance for human life. Over the past decades, a large body of psychological, cognitive, and neuroscientific research has explored the neurocognitive mechanisms of speech-in-noise comprehension. However, because of the limited ecological validity of the speech stimuli and experimental paradigms, as well as inadequate attention to higher-order linguistic and extralinguistic processes, much remains unknown about how the brain processes noisy speech in real-life scenarios. A recently emerging approach, the second-person neuroscience approach, provides a novel conceptual framework. It measures the neural activities of both the speaker and the listener and estimates speaker-listener neural coupling, treating the speaker's production-related neural activity as a standardized reference. The second-person approach not only promotes the use of naturalistic speech but also allows free communication between speaker and listener, as in a close-to-life context. In this review, we first briefly survey previous findings on how the brain processes speech in noise; we then introduce the principles and advantages of the second-person neuroscience approach and discuss its implications for unraveling the linguistic and extralinguistic processes during speech-in-noise comprehension; finally, we conclude by raising several critical issues and calling for more research interest in the second-person approach, which would further extend present knowledge about how people comprehend speech in noise.
Affiliation(s)
- Zhuoran Li
- Department of Psychology, School of Social Sciences, Tsinghua University, Room 334, Mingzhai Building, Beijing 100084, China
- Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing 100084, China
- Dan Zhang
- Department of Psychology, School of Social Sciences, Tsinghua University, Room 334, Mingzhai Building, Beijing 100084, China
- Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing 100084, China
2. Hodson AJ, Shinn-Cunningham BG, Holt LL. Statistical learning across passive listening adjusts perceptual weights of speech input dimensions. Cognition 2023; 238:105473. [PMID: 37210878; DOI: 10.1016/j.cognition.2023.105473]
Abstract
Statistical learning across passive exposure has been theoretically situated with unsupervised learning. However, when input statistics accumulate over established representations - like speech syllables, for example - there is the possibility that prediction derived from activation of rich, existing representations may support error-driven learning. Here, across five experiments, we present evidence for error-driven learning across passive speech listening. Young adults passively listened to a string of eight beer-pier speech tokens with distributional regularities following either a canonical American-English acoustic dimension correlation or a correlation reversed to create an accent. A sequence-final test stimulus assayed the perceptual weight - the effectiveness - of the secondary dimension in signaling category membership as a function of preceding sequence regularities. Perceptual weight flexibly adjusted according to the passively experienced regularities even when the preceding regularities shifted on a trial-by-trial basis. The findings align with a theoretical view that activation of established internal representations can support learning across statistical regularities via error-driven learning. At the broadest level, this suggests that not all statistical learning need be unsupervised. Moreover, these findings help to account for how cognitive systems may accommodate competing demands for flexibility and stability: instead of overwriting existing representations when short-term input distributions depart from the norms, the mapping from input to category representations may be dynamically - and rapidly - adjusted via error-driven learning from predictions derived from internal representations.
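As an illustration of what a "perceptual weight" means operationally, the sketch below estimates relative weights of two acoustic dimensions from binary categorization responses via logistic regression, on simulated data; the dimension names, listener model, and all parameters are assumptions for illustration, not the study's analysis code.

```python
# Minimal sketch: estimating perceptual weights of two acoustic dimensions
# from binary categorization responses (all data simulated).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_trials = 500

# Standardized values of a primary and a secondary acoustic dimension per trial
primary = rng.normal(size=n_trials)    # e.g., a spectral dimension
secondary = rng.normal(size=n_trials)  # e.g., a durational dimension

# Simulated listener relying mostly on the primary dimension
p_beer = 1 / (1 + np.exp(-(2.0 * primary + 0.5 * secondary)))
responses = rng.binomial(1, p_beer)    # 1 = "beer", 0 = "pier"

X = np.column_stack([primary, secondary])
model = LogisticRegression().fit(X, responses)

# Normalized absolute coefficients serve as relative perceptual weights
w = np.abs(model.coef_[0]) / np.abs(model.coef_[0]).sum()
print(f"primary weight = {w[0]:.2f}, secondary weight = {w[1]:.2f}")
```

Down-weighting of the secondary dimension after "accented" exposure would then show up as a drop in its estimated weight.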
Affiliation(s)
- Alana J Hodson
- Department of Psychology, Carnegie Mellon University, Pittsburgh, PA, USA
- Lori L Holt
- Department of Psychology, Carnegie Mellon University, Pittsburgh, PA, USA; Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA
3. Xie X, Jaeger TF, Kurumada C. What we do (not) know about the mechanisms underlying adaptive speech perception: A computational framework and review. Cortex 2023; 166:377-424. [PMID: 37506665; DOI: 10.1016/j.cortex.2023.05.003]
Abstract
Speech from unfamiliar talkers can be difficult to comprehend initially. These difficulties tend to dissipate with exposure, sometimes within minutes or less. Adaptivity in response to unfamiliar input is now considered a fundamental property of speech perception, and research over the past two decades has made substantial progress in identifying its characteristics. The mechanisms underlying adaptive speech perception, however, remain unknown. Past work has attributed facilitatory effects of exposure to any one of three qualitatively different hypothesized mechanisms: (1) low-level, pre-linguistic, signal normalization, (2) changes in/selection of linguistic representations, or (3) changes in post-perceptual decision-making. Direct comparisons of these hypotheses, or combinations thereof, have been lacking. We describe a general computational framework for adaptive speech perception (ASP) that-for the first time-implements all three mechanisms. We demonstrate how the framework can be used to derive predictions for experiments on perception from the acoustic properties of the stimuli. Using this approach, we find that-at the level of data analysis presently employed by most studies in the field-the signature results of influential experimental paradigms do not distinguish between the three mechanisms. This highlights the need for a change in research practices, so that future experiments provide more informative results. We recommend specific changes to experimental paradigms and data analysis. All data and code for this study are shared via OSF, including the R markdown document that this article is generated from, and an R library that implements the models we present.
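Since the framework is described only verbally here, a toy illustration may help: the sketch below (with all parameters invented) shows how the three hypothesized mechanisms can each shift a categorization function in a similar way, which is why standard analyses struggle to tell them apart. This is not the authors' ASP implementation, which is published in R via OSF.

```python
# Toy model: three hypothesized adaptation mechanisms produce similar shifts
# in the categorization function (all parameters invented for illustration).
import numpy as np
from scipy.stats import norm

mu_b, mu_p, sd = -1.0, 1.0, 1.0  # assumed means/SD of two speech categories

def p_category_b(x, shift_input=0.0, shift_means=0.0, bias=0.0):
    """P(category 'b') under three adaptation 'knobs'."""
    x = np.asarray(x, dtype=float) - shift_input        # (1) signal normalization
    like_b = norm.pdf(x, mu_b + shift_means, sd)        # (2) representational change
    like_p = norm.pdf(x, mu_p, sd)
    post = like_b / (like_b + like_p)
    logit = np.log(post / (1 - post)) + bias            # (3) decision-level bias
    return 1 / (1 + np.exp(-logit))

xs = np.linspace(-3, 3, 7)
print("baseline      ", np.round(p_category_b(xs), 2))
for name, kw in [("normalization ", dict(shift_input=0.5)),
                 ("representation", dict(shift_means=0.5)),
                 ("decision bias ", dict(bias=0.5))]:
    # Each mechanism shifts responses toward 'b' in a similar way, so
    # signature categorization results alone cannot separate them.
    print(name, np.round(p_category_b(xs, **kw), 2))
```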
Affiliation(s)
- Xin Xie
- Language Science, University of California, Irvine, USA
- T Florian Jaeger
- Brain and Cognitive Sciences, University of Rochester, Rochester, NY, USA; Computer Science, University of Rochester, Rochester, NY, USA
- Chigusa Kurumada
- Brain and Cognitive Sciences, University of Rochester, Rochester, NY, USA
4. Jasmin K, Tierney A, Obasih C, Holt L. Short-term perceptual reweighting in suprasegmental categorization. Psychon Bull Rev 2023; 30:373-382. [PMID: 35915382; PMCID: PMC9971089; DOI: 10.3758/s13423-022-02146-5]
Abstract
Segmental speech units such as phonemes are described as multidimensional categories whose perception involves contributions from multiple acoustic input dimensions, and the relative perceptual weights of these dimensions respond dynamically to context. For example, when speech is altered to create an "accent" in which two acoustic dimensions are correlated in a manner opposite that of long-term experience, the dimension that carries less perceptual weight is down-weighted to contribute less in category decisions. It remains unclear, however, whether this short-term reweighting extends to perception of suprasegmental features that span multiple phonemes, syllables, or words, in part because it has remained debatable whether suprasegmental features are perceived categorically. Here, we investigated the relative contribution of two acoustic dimensions to word emphasis. Participants categorized instances of a two-word phrase pronounced with typical covariation of fundamental frequency (F0) and duration, and in the context of an artificial "accent" in which F0 and duration (established in prior research on English speech as "primary" and "secondary" dimensions, respectively) covaried atypically. When categorizing "accented" speech, listeners rapidly down-weighted the secondary dimension (duration). This result indicates that listeners continually track short-term regularities across speech input and dynamically adjust the weight of acoustic evidence for suprasegmental decisions. Thus, dimension-based statistical learning appears to be a widespread phenomenon in speech perception extending to both segmental and suprasegmental categorization.
Affiliation(s)
- Kyle Jasmin
- Department of Psychology, Wolfson Building, Royal Holloway, University of London, Egham, Surrey, TW20 0EX, UK
- Lori Holt
- Carnegie Mellon University, Pittsburgh, PA, USA
5. Pourhashemi F, Baart M, van Laarhoven T, Vroomen J. Want to quickly adapt to distorted speech and become a better listener? Read lips, not text. PLoS One 2022; 17:e0278986. [PMID: 36580461; PMCID: PMC9799298; DOI: 10.1371/journal.pone.0278986]
Abstract
When listening to distorted speech, does one become a better listener by looking at the face of the speaker or by reading subtitles that are presented along with the speech signal? We examined this question in two experiments in which we presented participants with spectrally distorted speech (4-channel noise-vocoded speech). During short training sessions, listeners received auditorily distorted words or pseudowords that were partially disambiguated by concurrently presented lipread information or text. After each training session, listeners were tested with new degraded auditory words. Learning effects (based on proportions of correctly identified words) were stronger if listeners had trained with words rather than with pseudowords (a lexical boost), and adding lipread information during training was more effective than adding text (a lipread boost). Moreover, the advantage of lipread speech over text training was also found when participants were tested more than a month later. The current results thus suggest that lipread speech may have surprisingly long-lasting effects on adaptation to distorted speech.
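For readers unfamiliar with the manipulation, a minimal noise-vocoder sketch is given below: split speech into frequency bands, extract each band's amplitude envelope, and use it to modulate band-limited noise. The band spacing, filter settings, and demo signal are generic assumptions, not the study's exact signal-processing parameters.

```python
# Minimal 4-channel noise vocoder: split speech into bands, extract each
# band's amplitude envelope, and use it to modulate band-limited noise.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(speech, fs, n_channels=4, f_lo=100.0, f_hi=7000.0):
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)   # assumed log spacing
    noise = np.random.default_rng(0).standard_normal(len(speech))
    out = np.zeros_like(speech)
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, speech)
        env = np.abs(hilbert(band))                    # band amplitude envelope
        out += env * sosfiltfilt(sos, noise)           # envelope-modulated noise
    return out / (np.max(np.abs(out)) + 1e-12)

# Demo on a synthetic amplitude-modulated tone (1 s at 16 kHz)
fs = 16000
t = np.arange(fs) / fs
speech = np.sin(2 * np.pi * 220 * t) * (0.5 + 0.5 * np.sin(2 * np.pi * 4 * t))
vocoded = noise_vocode(speech, fs)
```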
Affiliation(s)
- Faezeh Pourhashemi
- Dept. of Cognitive Neuropsychology, Tilburg University, Tilburg, The Netherlands
- Martijn Baart
- Dept. of Cognitive Neuropsychology, Tilburg University, Tilburg, The Netherlands
- BCBL, Basque Center on Cognition, Brain, and Language, Donostia, Spain
- Thijs van Laarhoven
- Dept. of Cognitive Neuropsychology, Tilburg University, Tilburg, The Netherlands
- Jean Vroomen
- Dept. of Cognitive Neuropsychology, Tilburg University, Tilburg, The Netherlands
6. MacGregor LJ, Gilbert RA, Balewski Z, Mitchell DJ, Erzinçlioğlu SW, Rodd JM, Duncan J, Fedorenko E, Davis MH. Causal contributions of the domain-general (multiple demand) and the language-selective brain networks to perceptual and semantic challenges in speech comprehension. Neurobiol Lang 2022; 3:665-698. [PMID: 36742011; PMCID: PMC9893226; DOI: 10.1162/nol_a_00081]
Abstract
Listening to spoken language engages domain-general multiple demand (MD; frontoparietal) regions of the human brain, in addition to domain-selective (frontotemporal) language regions, particularly when comprehension is challenging. However, there is limited evidence that the MD network makes a functional contribution to core aspects of understanding language. In a behavioural study of volunteers (n = 19) with chronic brain lesions, but without aphasia, we assessed the causal role of these networks in perceiving, comprehending, and adapting to spoken sentences made more challenging by acoustic-degradation or lexico-semantic ambiguity. We measured perception of and adaptation to acoustically degraded (noise-vocoded) sentences with a word report task before and after training. Participants with greater damage to MD but not language regions required more vocoder channels to achieve 50% word report, indicating impaired perception. Perception improved following training, reflecting adaptation to acoustic degradation, but adaptation was unrelated to lesion location or extent. Comprehension of spoken sentences with semantically ambiguous words was measured with a sentence coherence judgement task. Accuracy was high and unaffected by lesion location or extent. Adaptation to semantic ambiguity was measured in a subsequent word association task, which showed that availability of lower-frequency meanings of ambiguous words increased following their comprehension (word-meaning priming). Word-meaning priming was reduced for participants with greater damage to language but not MD regions. Language and MD networks make dissociable contributions to challenging speech comprehension: Using recent experience to update word meaning preferences depends on language-selective regions, whereas the domain-general MD network plays a causal role in reporting words from degraded speech.
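The "50% word report" criterion can be made concrete with a small sketch: fit a logistic psychometric function to word-report accuracy across vocoder channel counts and read off the threshold. The data points and parameterization below are invented for illustration, not taken from the study.

```python
# Sketch: estimate the number of vocoder channels needed for 50% word report
# by fitting a logistic psychometric function (data points invented).
import numpy as np
from scipy.optimize import curve_fit

def psychometric(channels, threshold, slope):
    """Logistic function of channel count; 'threshold' is the 50% point."""
    return 1 / (1 + np.exp(-slope * (channels - threshold)))

channels = np.array([1, 2, 4, 6, 8, 12, 16])
accuracy = np.array([0.02, 0.10, 0.35, 0.60, 0.78, 0.92, 0.97])  # invented

(thresh, slope), _ = curve_fit(psychometric, channels, accuracy, p0=[5.0, 1.0])
print(f"channels needed for 50% word report: {thresh:.1f}")
```

A higher fitted threshold then corresponds to poorer perception, as with the participants who had greater MD-region damage.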
Affiliation(s)
- Lucy J. MacGregor
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, UK
- Rebecca A. Gilbert
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, UK
- Zuzanna Balewski
- Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA
- Daniel J. Mitchell
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, UK
- Jennifer M. Rodd
- Psychology and Language Sciences, University College London, London, UK
- John Duncan
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, UK
- Evelina Fedorenko
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA
- Program in Speech and Hearing Bioscience and Technology, Harvard University, Cambridge, MA
- Matthew H. Davis
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, UK
7. Wu YC, Holt LL. Phonetic category activation predicts the direction and magnitude of perceptual adaptation to accented speech. J Exp Psychol Hum Percept Perform 2022; 48:913-925. [PMID: 35849375; PMCID: PMC10236200; DOI: 10.1037/xhp0001037]
Abstract
Unfamiliar accents can systematically shift speech acoustics away from community norms and reduce comprehension. Yet, limited exposure improves comprehension. This perceptual adaptation indicates that the mapping from acoustics to speech representations is dynamic, rather than fixed. But, what drives adjustments is debated. Supervised learning accounts posit that activation of an internal speech representation via disambiguating information generates predictions about patterns of speech input typically associated with the representation. When actual input mismatches predictions, the mapping is adjusted. We tested two hypotheses of this account across consonants and vowels as listeners categorized speech conveying an English-like acoustic regularity or an artificial accent. Across conditions, signal manipulations impacted which of two acoustic dimensions best conveyed category identity, and predicted which dimension would exhibit the effects of perceptual adaptation. Moreover, the strength of phonetic category activation, as estimated by categorization responses reliant on the dominant acoustic dimension, predicted the magnitude of adaptation observed across listeners. The results align with predictions of supervised learning accounts, suggesting that perceptual adaptation arises from speech category activation, corresponding predictions about the patterns of acoustic input that align with the category, and adjustments in subsequent speech perception when input mismatches these expectations.
8. Francis AL. Adding noise is a confounded nuisance. J Acoust Soc Am 2022; 152:1375. [PMID: 36182286; DOI: 10.1121/10.0013874]
Abstract
A wide variety of research and clinical assessments involve presenting speech stimuli in the presence of some kind of noise. Here, I selectively review two theoretical perspectives and discuss ways in which these perspectives may help researchers understand the consequences for listeners of adding noise to a speech signal. I argue that adding noise changes more about the listening task than merely making the signal more difficult to perceive. To fully understand the effects of an added noise on speech perception, we must consider not just how much the noise affects task difficulty, but also how it affects all of the systems involved in understanding speech: increasing message uncertainty, modifying attentional demand, altering affective response, and changing motivation to perform the task.
Affiliation(s)
- Alexander L Francis
- Department of Speech, Language, and Hearing Sciences, Purdue University, 715 Clinic Drive, West Lafayette, Indiana 47907, USA
9. Ivanov AZ, King AJ, Willmore BDB, Walker KMM, Harper NS. Cortical adaptation to sound reverberation. eLife 2022; 11:e75090. [PMID: 35617119; PMCID: PMC9213001; DOI: 10.7554/eLife.75090]
Abstract
In almost every natural environment, sounds are reflected by nearby objects, producing many delayed and distorted copies of the original sound, known as reverberation. Our brains usually cope well with reverberation, allowing us to recognize sound sources regardless of their environments. In contrast, reverberation can cause severe difficulties for speech recognition algorithms and hearing-impaired people. The present study examines how the auditory system copes with reverberation. We trained a linear model to recover a rich set of natural, anechoic sounds from their simulated reverberant counterparts. The model neurons achieved this by extending the inhibitory component of their receptive filters for more reverberant spaces, and did so in a frequency-dependent manner. These predicted effects were observed in the responses of auditory cortical neurons of ferrets in the same simulated reverberant environments. Together, these results suggest that auditory cortical neurons adapt to reverberation by adjusting their filtering properties in a manner consistent with dereverberation.
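As a rough illustration of the modelling idea, recovering anechoic sound from reverberant input with a linear model, the sketch below fits an FIR dereverberation filter by least squares on simulated signals. The toy impulse response, filter length, and estimation method are illustrative assumptions, not the paper's model.

```python
# Sketch: learn a linear (FIR) dereverberation filter by least squares,
# mapping a simulated reverberant signal back to its anechoic original.
import numpy as np

rng = np.random.default_rng(1)
n, taps = 5000, 32

anechoic = rng.standard_normal(n)                    # stand-in "clean" sound
room_ir = np.exp(-np.arange(64) / 10.0)              # toy decaying room impulse response
reverberant = np.convolve(anechoic, room_ir)[:n]

# Lagged design matrix of the reverberant signal (one column per filter tap)
X = np.column_stack([np.roll(reverberant, k) for k in range(taps)])
X[:taps] = 0.0                                       # drop wrapped-around samples
w, *_ = np.linalg.lstsq(X, anechoic, rcond=None)     # least-squares filter weights

recovered = X @ w
corr = np.corrcoef(recovered[taps:], anechoic[taps:])[0, 1]
print(f"correlation between recovered and anechoic signal: {corr:.2f}")
```

In the paper's terms, more reverberant rooms would require longer, more inhibitory filter components.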
Affiliation(s)
- Aleksandar Z Ivanov
- Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, United Kingdom
- Andrew J King
- Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, United Kingdom
- Ben D B Willmore
- Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, United Kingdom
- Kerry M M Walker
- Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, United Kingdom
- Nicol S Harper
- Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, United Kingdom
10. Neal K, McMahon CM, Hughes SE, Boisvert I. Listening-based communication ability in adults with hearing loss: A scoping review of existing measures. Front Psychol 2022; 13:786347. [PMID: 35360643; PMCID: PMC8960922; DOI: 10.3389/fpsyg.2022.786347]
Abstract
Introduction: Hearing loss in adults has a pervasive impact on health and well-being. Its effects on everyday listening and communication can directly influence participation across multiple spheres of life. These impacts, however, remain poorly assessed within clinical settings. While various tests and questionnaires that measure listening and communication abilities are available, there is a lack of consensus about which measures assess the factors most relevant to optimising auditory rehabilitation. This study aimed to map the measures used in published studies to evaluate the listening skills needed for oral communication in adults with hearing loss. Methods: A scoping review was conducted using systematic searches in Medline, EMBASE, Web of Science and Google Scholar to retrieve peer-reviewed articles that used one or more linguistic-based measures necessary for oral communication in adults with hearing loss. The range of measures identified and their frequency were charted in relation to auditory hierarchies, linguistic domains, health status domains, and associated neuropsychological and cognitive domains. Results: 9121 articles were identified, and 2579 articles reporting on 6714 discrete measures were included for further analysis. The predominant linguistic-based measure reported was word or sentence identification in quiet (65.9%). In contrast, discourse-based measures were used in 2.7% of the included articles. Of the included studies, 36.6% used a self-reported instrument purporting to measure listening for communication. Consistent with previous studies, a large number of self-reported measures were identified (n = 139), but 60.4% of these measures were used in only one study and 80.7% were cited five times or fewer. Discussion: Current measures used in published studies to assess listening abilities relevant to oral communication target a narrow set of domains. Concepts of communicative interaction have limited representation in current measurement. The lack of measurement consensus and the heterogeneity among the assessments limit comparisons across studies. Furthermore, the extracted measures rarely consider the broader linguistic, cognitive and interactive elements of communication. Consequently, existing measures may have limited clinical application for assessing the listening-related skills required for communication in daily life, as experienced by adults with hearing loss.
Affiliation(s)
- Katie Neal
- Department of Linguistics, Macquarie University, Sydney, NSW, Australia
- Catherine M McMahon
- Department of Linguistics, Macquarie University, Sydney, NSW, Australia; Hearing, Macquarie University, Sydney, NSW, Australia
- Sarah E Hughes
- Centre for Patient Reported Outcome Research, Institute of Applied Health Research, University of Birmingham, Birmingham, United Kingdom; National Institute of Health Research (NIHR), Applied Research Collaboration (ARC), West Midlands, United Kingdom; Faculty of Medicine, Health and Life Science, Swansea University, Swansea, United Kingdom
- Isabelle Boisvert
- Hearing, Macquarie University, Sydney, NSW, Australia; Sydney School of Health Sciences, Faculty of Medicine and Health, The University of Sydney, Sydney, NSW, Australia
11. Corcoran AW, Perera R, Koroma M, Kouider S, Hohwy J, Andrillon T. Expectations boost the reconstruction of auditory features from electrophysiological responses to noisy speech. Cereb Cortex 2022; 33:691-708. [PMID: 35253871; PMCID: PMC9890472; DOI: 10.1093/cercor/bhac094]
Abstract
Online speech processing imposes significant computational demands on the listening brain, the underlying mechanisms of which remain poorly understood. Here, we exploit the perceptual "pop-out" phenomenon (i.e. the dramatic improvement of speech intelligibility after receiving information about speech content) to investigate the neurophysiological effects of prior expectations on degraded speech comprehension. We recorded electroencephalography (EEG) and pupillometry from 21 adults while they rated the clarity of noise-vocoded and sine-wave synthesized sentences. Pop-out was reliably elicited following visual presentation of the corresponding written sentence, but not following incongruent or neutral text. Pop-out was associated with improved reconstruction of the acoustic stimulus envelope from low-frequency EEG activity, implying that improvements in perceptual clarity were mediated via top-down signals that enhanced the quality of cortical speech representations. Spectral analysis further revealed that pop-out was accompanied by a reduction in theta-band power, consistent with predictive coding accounts of acoustic filling-in and incremental sentence processing. Moreover, delta-band power, alpha-band power, and pupil diameter were all increased following the provision of any written sentence information, irrespective of content. Together, these findings reveal distinctive profiles of neurophysiological activity that differentiate the content-specific processes associated with degraded speech comprehension from the context-specific processes invoked under adverse listening conditions.
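Envelope reconstruction of this kind is typically done with a time-lagged linear backward model. The sketch below implements a ridge-regression decoder on simulated data; the lag range, ridge penalty, and simulated signals are assumptions rather than the study's actual pipeline.

```python
# Sketch: reconstruct a speech envelope from multichannel EEG with a
# time-lagged ridge-regression (backward) decoder, using simulated data.
import numpy as np

rng = np.random.default_rng(2)
n_samples, n_channels, max_lag = 4000, 32, 16

# Smooth, positive stand-in for a speech amplitude envelope
envelope = np.abs(np.convolve(rng.standard_normal(n_samples),
                              np.ones(50) / 50, mode="same"))

# Simulated EEG: each channel mixes the envelope plus noise
mixing = rng.standard_normal(n_channels)
eeg = envelope[:, None] * mixing + 0.5 * rng.standard_normal((n_samples, n_channels))

# Time-lagged design matrix: every channel at every lag is one predictor
X = np.column_stack([np.roll(eeg, lag, axis=0) for lag in range(max_lag)])
X[:max_lag] = 0.0  # discard wrap-around rows

lam = 1e2  # ridge penalty (assumed, not tuned)
w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ envelope)

recon = X @ w
r = np.corrcoef(recon[max_lag:], envelope[max_lag:])[0, 1]
print(f"envelope reconstruction accuracy (Pearson r): {r:.2f}")
```

In the study's logic, "pop-out" corresponds to higher reconstruction accuracy after congruent text, reflecting sharper cortical speech representations.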
Affiliation(s)
- Andrew W Corcoran
- Corresponding author: Room E672, 20 Chancellors Walk, Clayton, VIC 3800, Australia
- Ricardo Perera
- Cognition & Philosophy Laboratory, School of Philosophical, Historical, and International Studies, Monash University, Melbourne, VIC 3800, Australia
- Matthieu Koroma
- Brain and Consciousness Group (ENS, EHESS, CNRS), Département d'Études Cognitives, École Normale Supérieure-PSL Research University, Paris 75005, France
- Sid Kouider
- Brain and Consciousness Group (ENS, EHESS, CNRS), Département d'Études Cognitives, École Normale Supérieure-PSL Research University, Paris 75005, France
- Jakob Hohwy
- Cognition & Philosophy Laboratory, School of Philosophical, Historical, and International Studies, Monash University, Melbourne, VIC 3800, Australia; Monash Centre for Consciousness & Contemplative Studies, Monash University, Melbourne, VIC 3800, Australia
- Thomas Andrillon
- Monash Centre for Consciousness & Contemplative Studies, Monash University, Melbourne, VIC 3800, Australia; Paris Brain Institute, Sorbonne Université, Inserm-CNRS, Paris 75013, France
12. Non-sensory influences on auditory learning and plasticity. J Assoc Res Otolaryngol 2022; 23:151-166. [PMID: 35235100; PMCID: PMC8964851; DOI: 10.1007/s10162-022-00837-3]
Abstract
Distinguishing between regular and irregular heartbeats, conversing with speakers of different accents, and tuning a guitar: all rely on some form of auditory learning. What drives these experience-dependent changes? A growing body of evidence suggests an important role for non-sensory influences, including reward, task engagement, and social or linguistic context. This review is a collection of contributions that highlight how these non-sensory factors shape auditory plasticity and learning at the molecular, physiological, and behavioral level. We begin by presenting evidence that reward signals from the dopaminergic midbrain act on cortico-subcortical networks to shape sound-evoked responses of auditory cortical neurons, facilitate auditory category learning, and modulate the long-term storage of new words and their meanings. We then discuss the role of task engagement in auditory perceptual learning and suggest that plasticity in top-down cortical networks mediates learning-related improvements in auditory cortical and perceptual sensitivity. Finally, we present data that illustrates how social experience impacts sound-evoked activity in the auditory midbrain and forebrain and how the linguistic environment rapidly shapes speech perception. These findings, which are derived from both human and animal models, suggest that non-sensory influences are important regulators of auditory learning and plasticity and are often implemented by shared neural substrates. Application of these principles could improve clinical training strategies and inform the development of treatments that enhance auditory learning in individuals with communication disorders.
13
Abstract
The human brain exhibits the remarkable ability to categorize speech sounds into distinct, meaningful percepts, even in challenging tasks like learning non-native speech categories in adulthood and hearing speech in noisy listening conditions. In these scenarios, there is substantial variability in perception and behavior, both across individual listeners and individual trials. While there has been extensive work characterizing stimulus-related and contextual factors that contribute to variability, recent advances in neuroscience are beginning to shed light on another potential source of variability that has not been explored in speech processing. Specifically, there are task-independent, moment-to-moment variations in neural activity in broadly-distributed cortical and subcortical networks that affect how a stimulus is perceived on a trial-by-trial basis. In this review, we discuss factors that affect speech sound learning and moment-to-moment variability in perception, particularly arousal states—neurotransmitter-dependent modulations of cortical activity. We propose that a more complete model of speech perception and learning should incorporate subcortically-mediated arousal states that alter behavior in ways that are distinct from, yet complementary to, top-down cognitive modulations. Finally, we discuss a novel neuromodulation technique, transcutaneous auricular vagus nerve stimulation (taVNS), which is particularly well-suited to investigating causal relationships between arousal mechanisms and performance in a variety of perceptual tasks. Together, these approaches provide novel testable hypotheses for explaining variability in classically challenging tasks, including non-native speech sound learning.
14. Bhandari P, Demberg V, Kray J. Semantic predictability facilitates comprehension of degraded speech in a graded manner. Front Psychol 2021; 12:714485. [PMID: 34566795; PMCID: PMC8459870; DOI: 10.3389/fpsyg.2021.714485]
Abstract
Previous studies have shown that at moderate levels of spectral degradation, semantic predictability facilitates language comprehension. It has been argued that when speech is degraded, listeners have narrowed expectations about sentence endings; i.e., semantic prediction may be limited to only the most highly predictable sentence completions. The main objectives of this study were to (i) examine whether listeners form narrowed expectations or whether they form predictions across a wide range of probable sentence endings, (ii) assess whether the facilitatory effect of semantic predictability is modulated by perceptual adaptation to degraded speech, and (iii) use and establish a sensitive metric for the measurement of language comprehension. To this end, we created 360 German Subject-Verb-Object sentences that varied in the semantic predictability of a sentence-final target word in a graded manner (high, medium, and low) and in the level of spectral degradation (1, 4, 6, and 8 channels of noise-vocoding). These sentences were presented auditorily to two groups: one group (n = 48) performed a listening task in an unpredictable channel context in which the degraded speech levels were randomized, while the other group (n = 50) performed the task in a predictable channel context in which the degraded speech levels were blocked. The results showed that at 4 channels of noise-vocoding, response accuracy was higher for high-predictability sentences than for medium-predictability sentences, which in turn was higher than for low-predictability sentences. This suggests that, in contrast to the narrowed expectations view, comprehension of moderately degraded speech is facilitated in a graded manner across low-, medium-, and high-predictability sentences; listeners probabilistically preactivate upcoming words from a wide semantic space, not limited to only highly probable sentence endings. Additionally, in both channel contexts we did not observe learning effects; i.e., response accuracy did not increase over the course of the experiment, and response accuracy was higher in the predictable than in the unpredictable channel context. We speculate from these observations that when there is no trial-by-trial variation in the level of speech degradation, listeners adapt to speech quality over a long timescale; however, when there is trial-by-trial variation in a high-level semantic feature (e.g., sentence predictability), listeners do not adapt to a low-level perceptual property (e.g., speech quality) over a short timescale.
Affiliation(s)
- Pratik Bhandari
- Department of Psychology, Saarland University, Saarbrücken, Germany
- Department of Language Science and Technology, Saarland University, Saarbrücken, Germany
- Vera Demberg
- Department of Language Science and Technology, Saarland University, Saarbrücken, Germany
- Department of Computer Science, Saarland University, Saarbrücken, Germany
- Jutta Kray
- Department of Psychology, Saarland University, Saarbrücken, Germany
15. Zenke JK, Rahman S, Guo Q, Leung AWS, Gomaa NA. Central processing in tinnitus: fMRI study outlining patterns of activation using an auditory discrimination task in normal versus tinnitus patients. Otol Neurotol 2021; 42:e1170-e1180. [PMID: 34086638; DOI: 10.1097/mao.0000000000003194]
Abstract
OBJECTIVE: To elucidate differences in brain activity between patients with tinnitus and controls. STUDY DESIGN: Cross-sectional cohort study. SETTING: Outpatient otolaryngology clinic. PATIENTS: Three cohorts: 8 controls, 12 with subjective idiopathic tinnitus (tinnitus without hearing loss), and 12 with both tinnitus and hearing loss. INTERVENTION: An auditory oddball identification task performed in the fMRI scanner. MAIN OUTCOME MEASURES: Task performance and Tinnitus Handicap Inventory (THI) scores were recorded. Brain activation maps were generated comparing deviant and standard tones as well as rest. One-way and two-way T-contrasts were generated, in addition to multiple regression modeling, which identified brain regions significantly predicting tinnitus, disease severity, duration, and task performance. RESULTS: Task performance worsened in tinnitus patients with increased auditory workload, in the form of additional hearing loss. THI score and grade correlated with false alarms. The limbic system, Heschl's gyrus, angular gyrus, and cerebellum have a significant effect both on brain behavior in patients with tinnitus and on the predictability of tinnitus and its behavioral implications. CONCLUSION: Increased auditory workload resulted in poorer task performance. Moreover, it is possible to predict auditory task performance in patients with tinnitus from the activity of specific regions of interest. Heschl's gyrus, angular gyrus, cerebellar, and limbic system activity are important contributors to the neurological activity associated with tinnitus. Finally, predictive modeling may inform further research on tinnitus treatment.
Affiliation(s)
- Julianna K Zenke
- Division of Otolaryngology Head and Neck Surgery, Department of Surgery
- Qi Guo
- Faculty of Medicine and Dentistry
- Ada W S Leung
- Neuroscience and Mental Health Institute
- Department of Occupational Therapy, Faculty of Rehabilitation Medicine, University of Alberta, Edmonton, Alberta, Canada
- Nahla A Gomaa
- Division of Otolaryngology Head and Neck Surgery, Department of Surgery
- Faculty of Medicine and Dentistry
16. Li Z, Li J, Hong B, Nolte G, Engel AK, Zhang D. Speaker-listener neural coupling reveals an adaptive mechanism for speech comprehension in a noisy environment. Cereb Cortex 2021; 31:4719-4729. [PMID: 33969389; DOI: 10.1093/cercor/bhab118]
Abstract
Comprehending speech in noise is an essential cognitive skill for verbal communication. However, it remains unclear how our brain adapts to the noisy environment to achieve comprehension. The present study investigated the neural mechanisms of speech comprehension in noise using a functional near-infrared spectroscopy-based inter-brain approach. A group of speakers was invited to tell real-life stories. The recorded speech audio was mixed with meaningless white noise at four signal-to-noise levels and then played to listeners. Results showed that speaker-listener neural coupling at the listener's left inferior frontal gyrus (IFG), that is, the sensorimotor system, and at the right middle temporal gyrus (MTG) and angular gyrus (AG), that is, the auditory system, was significantly higher in the listening conditions than in the baseline. More importantly, the correlation between the neural coupling of the listener's left IFG and comprehension performance gradually became more positive with increasing noise level, indicating an adaptive role of the sensorimotor system in noisy speech comprehension; in contrast, behavioral correlations for the coupling of the listener's right MTG and AG were obtained only in mild noise conditions, indicating a different and less robust mechanism. In sum, speaker-listener coupling analysis provides added value and new insight into the neural mechanisms of speech-in-noise comprehension.
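As a simplified illustration of speaker-listener neural coupling, the sketch below correlates a listener's simulated signal with time-shifted versions of the speaker's. Published fNIRS hyperscanning analyses typically use more sophisticated measures (e.g., wavelet transform coherence), so treat this lagged correlation, with invented parameters, as an assumed stand-in.

```python
# Sketch: lagged correlation between speaker and listener neural time series
# as a simple speaker-listener coupling index (simulated signals).
import numpy as np

rng = np.random.default_rng(3)
fs, n = 10, 2000                      # 10 Hz sampling, 200 s of "recording"

speaker = rng.standard_normal(n)
true_lag = 30                         # listener trails speaker by 3 s
listener = 0.4 * np.roll(speaker, true_lag) + rng.standard_normal(n)

def lagged_coupling(spk, lis, max_lag):
    """Correlation between listener and speaker shifted by 0..max_lag samples."""
    return [np.corrcoef(np.roll(spk, k)[max_lag:], lis[max_lag:])[0, 1]
            for k in range(max_lag + 1)]

coupling = lagged_coupling(speaker, listener, max_lag=60)
best = int(np.argmax(coupling))
print(f"peak coupling r = {coupling[best]:.2f} at speaker-to-listener lag {best / fs:.1f} s")
```

Comparing such coupling values across noise levels is the kind of contrast the study draws between listening conditions.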
Affiliation(s)
- Zhuoran Li
- Department of Psychology, School of Social Sciences, Tsinghua University, Beijing 100084, China; Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing 100084, China
- Jiawei Li
- Department of Psychology, School of Social Sciences, Tsinghua University, Beijing 100084, China; Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing 100084, China
- Bo Hong
- Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing 100084, China; Department of Biomedical Engineering, School of Medicine, Tsinghua University, Beijing 100084, China
- Guido Nolte
- Department of Neurophysiology and Pathophysiology, University Medical Center Hamburg Eppendorf, Hamburg 20246, Germany
- Andreas K Engel
- Department of Neurophysiology and Pathophysiology, University Medical Center Hamburg Eppendorf, Hamburg 20246, Germany
- Dan Zhang
- Department of Psychology, School of Social Sciences, Tsinghua University, Beijing 100084, China; Tsinghua Laboratory of Brain and Intelligence, Tsinghua University, Beijing 100084, China
17. Borrie SA, Lansford KL. A perceptual learning approach for dysarthria remediation: An updated review. J Speech Lang Hear Res 2021; 64:3060-3073. [PMID: 34289312; PMCID: PMC8740677; DOI: 10.1044/2021_jslhr-21-00012]
Abstract
Purpose Early studies of perceptual learning of dysarthric speech, those summarized in Borrie, McAuliffe, and Liss (2012), yielded preliminary evidence that listeners could learn to better understand the speech of a person with dysarthria, revealing a potentially promising avenue for future intelligibility interventions. Since then, a programmatic body of research grounded in models of perceptual processing has unfolded. The current review provides an updated account of the state of the evidence in this area and offers direction for moving this work toward clinical implementation. Method The studies that have investigated perceptual learning of dysarthric speech (N = 24) are summarized and synthesized first according to the proposed learning source and then by highlighting the parameters that appear to mediate learning, culminating with additional learning outcomes. Results The recent literature has established strong empirical evidence of intelligibility improvements following familiarization with dysarthric speech and a theoretical account of the mechanisms that facilitate improved processing of the neurologically degraded acoustic signal. Conclusions There are no existing intelligibility interventions for individuals with dysarthria who cannot behaviorally modify their speech. However, there is now robust support for the development of an approach that shifts the weight of behavioral change from speaker to listener, exploiting perceptual learning to ease the intelligibility burden of dysarthria. To move this work from bench to bedside, recommendations for translational studies that establish best practices and candidacy for listener-targeted dysarthria remediation, perceptual training, are provided.
Affiliation(s)
- Stephanie A. Borrie
- Department of Communicative Disorders and Deaf Education, Utah State University, Logan
- Kaitlin L. Lansford
- Department of Communication Science and Disorders, Florida State University, Tallahassee
18. Guediche S, de Bruin A, Caballero-Gaudes C, Baart M, Samuel AG. Second-language word recognition in noise: Interdependent neuromodulatory effects of semantic context and crosslinguistic interactions driven by word form similarity. Neuroimage 2021; 237:118168. [PMID: 34000398; DOI: 10.1016/j.neuroimage.2021.118168]
Abstract
Spoken language comprehension is a fundamental component of our cognitive skills. We are quite proficient at deciphering words from the auditory input despite the fact that the speech we hear is often masked by noise such as background babble originating from talkers other than the one we are attending to. To perceive spoken language as intended, we rely on prior linguistic knowledge and context. Prior knowledge includes all sounds and words that are familiar to a listener and depends on linguistic experience. For bilinguals, the phonetic and lexical repertoire encompasses two languages, and the degree of overlap between word forms across languages affects the degree to which they influence one another during auditory word recognition. To support spoken word recognition, listeners often rely on semantic information (i.e., the words we hear are usually related in a meaningful way). Although the number of multilinguals across the globe is increasing, little is known about how crosslinguistic effects (i.e., word overlap) interact with semantic context and affect the flexible neural systems that support accurate word recognition. The current multi-echo functional magnetic resonance imaging (fMRI) study addresses this question by examining how prime-target word pair semantic relationships interact with the target word's form similarity (cognate status) to the translation equivalent in the dominant language (L1) during accurate word recognition of a non-dominant (L2) language. We tested 26 early-proficient Spanish-Basque (L1-L2) bilinguals. When L2 targets matching L1 translation-equivalent phonological word forms were preceded by unrelated semantic contexts that drive lexical competition, a flexible language control (fronto-parietal-subcortical) network was upregulated, whereas when they were preceded by related semantic contexts that reduce lexical competition, it was downregulated. We conclude that an interplay between semantic and crosslinguistic effects regulates flexible control mechanisms of speech processing to facilitate L2 word recognition in noise.
Affiliation(s)
- Sara Guediche
- Basque Center on Cognition, Brain and Language, Donostia-San Sebastian 20009, Spain
- Martijn Baart
- Basque Center on Cognition, Brain and Language, Donostia-San Sebastian 20009, Spain; Department of Cognitive Neuropsychology, Tilburg University, P.O. Box 90153, 5000 LE Tilburg, the Netherlands
- Arthur G Samuel
- Basque Center on Cognition, Brain and Language, Donostia-San Sebastian 20009, Spain; Stony Brook University, NY 11794-2500, United States; Ikerbasque Foundation, Spain
19. Zhang X, Wu YC, Holt LL. The learning signal in perceptual tuning of speech: Bottom-up versus top-down information. Cogn Sci 2021; 45:e12947. [PMID: 33682208; DOI: 10.1111/cogs.12947]
Abstract
Cognitive systems face a tension between stability and plasticity. The maintenance of long-term representations that reflect the global regularities of the environment is often at odds with pressure to flexibly adjust to short-term input regularities that may deviate from the norm. This tension is abundantly clear in speech communication when talkers with accents or dialects produce input that deviates from a listener's language community norms. Prior research demonstrates that when bottom-up acoustic information or top-down word knowledge is available to disambiguate speech input, there is short-term adaptive plasticity such that subsequent speech perception is shifted even in the absence of the disambiguating information. Although such effects are well-documented, it is not yet known whether bottom-up and top-down resolution of ambiguity may operate through common processes, or how these information sources may interact in guiding the adaptive plasticity of speech perception. The present study investigates the joint contributions of bottom-up information from the acoustic signal and top-down information from lexical knowledge in the adaptive plasticity of speech categorization according to short-term input regularities. The results implicate speech category activation, whether from top-down or bottom-up sources, in driving rapid adjustment of listeners' reliance on acoustic dimensions in speech categorization. Broadly, this pattern of perception is consistent with dynamic mapping of input to category representations that is flexibly tuned according to interactive processing accommodating both lexical knowledge and idiosyncrasies of the acoustic input.
Affiliation(s)
- Xujin Zhang
- Department of Psychology, Carnegie Mellon University
- Lori L Holt
- Department of Psychology, Carnegie Mellon University
20
Abstract
OBJECTIVE Acoustic distortions to the speech signal impair spoken language recognition, but healthy listeners exhibit adaptive plasticity consistent with rapid adjustments in how the distorted speech input maps to speech representations, perhaps through engagement of supervised error-driven learning. This puts adaptive plasticity in speech perception in an interesting position with regard to developmental dyslexia inasmuch as dyslexia impacts speech processing and may involve dysfunction in neurobiological systems hypothesized to be involved in adaptive plasticity. METHOD Here, we examined typical young adult listeners (N = 17), and those with dyslexia (N = 16), as they reported the identity of native-language monosyllabic spoken words to which signal processing had been applied to create a systematic acoustic distortion. During training, all participants experienced incremental signal distortion increases to mildly distorted speech along with orthographic and auditory feedback indicating word identity following response across a brief, 250-trial training block. During pretest and posttest phases, no feedback was provided to participants. RESULTS Word recognition across severely distorted speech was poor at pretest and equivalent across groups. Training led to improved word recognition for the most severely distorted speech at posttest, with evidence that adaptive plasticity generalized to support recognition of new tokens not previously experienced under distortion. However, training-related recognition gains for listeners with dyslexia were significantly less robust than for control listeners. CONCLUSIONS Less efficient adaptive plasticity to speech distortions may impact the ability of individuals with dyslexia to deal with variability arising from sources like acoustic noise and foreign-accented speech.
21. Rysop AU, Schmitt LM, Obleser J, Hartwigsen G. Neural modelling of the semantic predictability gain under challenging listening conditions. Hum Brain Mapp 2020; 42:110-127. [PMID: 32959939; PMCID: PMC7721236; DOI: 10.1002/hbm.25208]
Abstract
When speech intelligibility is reduced, listeners exploit constraints posed by semantic context to facilitate comprehension. The left angular gyrus (AG) has been argued to drive this semantic predictability gain. Taking a network perspective, we ask how the connectivity within language-specific and domain-general networks flexibly adapts to the predictability and intelligibility of speech. During continuous functional magnetic resonance imaging (fMRI), participants repeated sentences, which varied in semantic predictability of the final word and in acoustic intelligibility. At the neural level, highly predictable sentences led to stronger activation of left-hemispheric semantic regions including subregions of the AG (PGa, PGp) and posterior middle temporal gyrus when speech became more intelligible. The behavioural predictability gain of single participants mapped onto the same regions but was complemented by increased activity in frontal and medial regions. Effective connectivity from PGa to PGp increased for more intelligible sentences. In contrast, inhibitory influence from pre-supplementary motor area to left insula was strongest when predictability and intelligibility of sentences were either lowest or highest. This interactive effect was negatively correlated with the behavioural predictability gain. Together, these results suggest that successful comprehension in noisy listening conditions relies on an interplay of semantic regions and concurrent inhibition of cognitive control regions when semantic cues are available.
Affiliation(s)
- Anna Uta Rysop
- Lise Meitner Research Group Cognition and Plasticity, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Lea-Maria Schmitt
- Department of Psychology, University of Lübeck, Lübeck, Germany; Center of Brain, Behavior and Metabolism (CBBM), University of Lübeck, Lübeck, Germany
- Jonas Obleser
- Department of Psychology, University of Lübeck, Lübeck, Germany; Center of Brain, Behavior and Metabolism (CBBM), University of Lübeck, Lübeck, Germany
- Gesa Hartwigsen
- Lise Meitner Research Group Cognition and Plasticity, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
22. Mahmud MS, Ahmed F, Al-Fahad R, Moinuddin KA, Yeasin M, Alain C, Bidelman GM. Decoding hearing-related changes in older adults' spatiotemporal neural processing of speech using machine learning. Front Neurosci 2020; 14:748. [PMID: 32765215; PMCID: PMC7378401; DOI: 10.3389/fnins.2020.00748]
Abstract
Speech perception in noisy environments depends on complex interactions between sensory and cognitive systems. In older adults, such interactions may be affected, especially in those individuals who have more severe age-related hearing loss. Using a data-driven approach, we assessed the temporal (when in time) and spatial (where in the brain) characteristics of cortical speech-evoked responses that distinguish older adults with or without mild hearing loss. We performed source analyses to estimate cortical surface signals from EEG recordings during a phoneme discrimination task conducted under clear and noise-degraded conditions. We computed source-level ERPs (i.e., mean activation within each ROI) for each of the 68 ROIs of the Desikan-Killiany (DK) atlas, averaged over 100 randomly chosen trials without replacement, to form feature vectors. We adopted a multivariate feature selection method, called stability selection and control, to choose features that are consistent over a range of model parameters, and used a parameter-optimized support vector machine (SVM) as a classifier to investigate the time course and brain regions that segregate groups and speech clarity. For clear speech perception, whole-brain data revealed a classification accuracy of 81.50% [area under the curve (AUC) 80.73%; F1-score 82.00%], distinguishing groups within ∼60 ms after speech onset (i.e., as early as the P1 wave). We observed lower accuracy, 78.12% [AUC 77.64%; F1-score 78.00%], and delayed classification performance when speech was embedded in noise, with group segregation at 80 ms. Separate analysis using left-hemisphere (LH) and right-hemisphere (RH) regions showed that LH speech activity was better at distinguishing hearing groups than activity measured in the RH. Moreover, stability selection analysis identified 12 brain regions (among 1428 total spatiotemporal features from 68 regions) where source activity segregated groups with >80% accuracy (clear speech), whereas 16 regions were critical for noise-degraded speech to achieve a comparable level of group segregation (78.7% accuracy). Our results identify critical time courses and brain regions that distinguish mild hearing loss from normal hearing in older adults and confirm a larger number of active areas, particularly in the RH, when processing noise-degraded speech information.
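The decoding step can be sketched with standard tools: train an SVM on source-level feature vectors and report accuracy, AUC, and F1. Everything below (simulated features, RBF kernel, train/test split) is an illustrative assumption standing in for the paper's stability-selection pipeline, not its actual code.

```python
# Sketch: classify hearing groups from (simulated) source-level ERP features
# with a support vector machine, reporting accuracy, AUC, and F1.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, roc_auc_score, f1_score

rng = np.random.default_rng(4)
n_observations, n_features = 400, 68             # e.g., mean activation per ROI

X = rng.standard_normal((n_observations, n_features))
y = rng.integers(0, 2, n_observations)           # 0 = normal hearing, 1 = mild loss
X[y == 1, :12] += 0.8                            # group difference in a few ROIs

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, probability=True))
clf.fit(X_tr, y_tr)

y_hat = clf.predict(X_te)
y_prob = clf.predict_proba(X_te)[:, 1]
print(f"accuracy {accuracy_score(y_te, y_hat):.2%}, "
      f"AUC {roc_auc_score(y_te, y_prob):.2%}, F1 {f1_score(y_te, y_hat):.2%}")
```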
Collapse
Affiliation(s)
- Md Sultan Mahmud
- Department of Electrical and Computer Engineering, The University of Memphis, Memphis, TN, United States
| | - Faruk Ahmed
- Department of Electrical and Computer Engineering, The University of Memphis, Memphis, TN, United States
| | - Rakib Al-Fahad
- Department of Electrical and Computer Engineering, The University of Memphis, Memphis, TN, United States
| | - Kazi Ashraf Moinuddin
- Department of Electrical and Computer Engineering, The University of Memphis, Memphis, TN, United States
| | - Mohammed Yeasin
- Department of Electrical and Computer Engineering, The University of Memphis, Memphis, TN, United States
| | - Claude Alain
- Rotman Research Institute-Baycrest Centre for Geriatric Care, Toronto, ON, Canada; Department of Psychology, University of Toronto, Toronto, ON, Canada; Institute of Medical Sciences, University of Toronto, Toronto, ON, Canada
| | - Gavin M Bidelman
- Institute for Intelligent Systems, University of Memphis, Memphis, TN, United States; School of Communication Sciences and Disorders, University of Memphis, Memphis, TN, United States; Department of Anatomy and Neurobiology, University of Tennessee Health Science Center, Memphis, TN, United States
| |
Collapse
|
23
|
Ullas S, Hausfeld L, Cutler A, Eisner F, Formisano E. Neural Correlates of Phonetic Adaptation as Induced by Lexical and Audiovisual Context. J Cogn Neurosci 2020; 32:2145-2158. [PMID: 32662723 DOI: 10.1162/jocn_a_01608] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]
Abstract
When speech perception is difficult, one way listeners adjust is by reconfiguring phoneme category boundaries, drawing on contextual information. Both lexical knowledge and lipreading cues are used in this way, but it remains unknown whether these two differing forms of perceptual learning are similar at a neural level. This study compared phoneme boundary adjustments driven by lexical or audiovisual cues, using ultra-high-field 7-T fMRI. During imaging, participants heard exposure stimuli and test stimuli. Exposure stimuli for lexical retuning were audio recordings of words, and those for audiovisual recalibration were audio-video recordings of lip movements during utterances of pseudowords. Test stimuli were ambiguous phonetic strings presented without context, and listeners reported what phoneme they heard. Reports reflected phoneme biases in preceding exposure blocks (e.g., more reported /p/ after /p/-biased exposure). Analysis of corresponding brain responses indicated that both forms of cue use were associated with a network of activity across the temporal cortex, plus parietal, insula, and motor areas. Audiovisual recalibration also elicited significant occipital cortex activity despite the lack of visual stimuli. Activity levels in several ROIs also covaried with strength of audiovisual recalibration, with greater activity accompanying larger recalibration shifts. Similar activation patterns appeared for lexical retuning, but here, no significant ROIs were identified. Audiovisual and lexical forms of perceptual learning thus induce largely similar brain response patterns. However, audiovisual recalibration involves additional visual cortex contributions, suggesting that previously acquired visual information (on lip movements) is retrieved and deployed to disambiguate auditory perception.
Collapse
Affiliation(s)
- Shruti Ullas
- Maastricht University; Maastricht Brain Imaging Centre
| | - Lars Hausfeld
- Maastricht University; Maastricht Brain Imaging Centre
| | | | | | - Elia Formisano
- Maastricht University; Maastricht Brain Imaging Centre; Maastricht Centre for Systems Biology
| |
Collapse
|
24
|
Rotman T, Lavie L, Banai K. Rapid Perceptual Learning: A Potential Source of Individual Differences in Speech Perception Under Adverse Conditions? Trends Hear 2020; 24:2331216520930541. [PMID: 32552477 PMCID: PMC7303778 DOI: 10.1177/2331216520930541] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022] Open
Abstract
Challenging listening situations (e.g., when speech is rapid or noisy) result in substantial individual differences in speech perception. We propose that rapid auditory perceptual learning is one of the factors contributing to those individual differences. To explore this proposal, we assessed rapid perceptual learning of time-compressed speech in young adults with normal hearing and in older adults with age-related hearing loss. We also assessed the contribution of this learning as well as that of hearing and cognition (vocabulary, working memory, and selective attention) to the recognition of natural-fast speech (NFS; both groups) and speech in noise (younger adults). In young adults, rapid learning and vocabulary were significant predictors of NFS and speech in noise recognition. In older adults, hearing thresholds, vocabulary, and rapid learning were significant predictors of NFS recognition. In both groups, models that included learning fitted the speech data better than models that did not include learning. Therefore, under adverse conditions, rapid learning may be one of the skills listeners could employ to support speech recognition.
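The central inferential move here, that models including rapid learning fit the speech data better than models without it, amounts to a nested-model comparison. A minimal sketch with invented variable names and simulated data (not the authors' analysis) is shown below.

```python
# Does a rapid-learning measure predict natural-fast speech (NFS)
# recognition beyond hearing and vocabulary? Compare nested regressions.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 60
df = pd.DataFrame({
    "nfs": rng.normal(size=n),        # NFS recognition score (placeholder)
    "hearing": rng.normal(size=n),    # hearing threshold (placeholder)
    "vocab": rng.normal(size=n),      # vocabulary score (placeholder)
    "learning": rng.normal(size=n),   # rapid-learning gain (placeholder)
})

base = smf.ols("nfs ~ hearing + vocab", data=df).fit()
full = smf.ols("nfs ~ hearing + vocab + learning", data=df).fit()

# A lower AIC for the full model (and a significant F-test) would indicate
# that rapid learning improves the fit, as reported in the abstract.
print(base.aic, full.aic)
print(full.compare_f_test(base))      # (F statistic, p value, df difference)
```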
Collapse
Affiliation(s)
- Tali Rotman
- Department of Communication Sciences and Disorders, University of Haifa
| | - Limor Lavie
- Department of Communication Sciences and Disorders, University of Haifa
| | - Karen Banai
- Department of Communication Sciences and Disorders, University of Haifa
| |
Collapse
|
25
|
Wiener S, Lee CY. Multi-Talker Speech Promotes Greater Knowledge-Based Spoken Mandarin Word Recognition in First and Second Language Listeners. Front Psychol 2020; 11:214. [PMID: 32161560 PMCID: PMC7052525 DOI: 10.3389/fpsyg.2020.00214] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/30/2019] [Accepted: 01/30/2020] [Indexed: 11/15/2022] Open
Abstract
Spoken word recognition involves a perceptual tradeoff between the reliance on the incoming acoustic signal and knowledge about likely sound categories and their co-occurrences as words. This study examined how adult second language (L2) learners navigate between acoustic-based and knowledge-based spoken word recognition when listening to highly variable, multi-talker truncated speech, and whether this perceptual tradeoff changes as L2 listeners gradually become more proficient in their L2 after multiple months of structured classroom learning. First language (L1) Mandarin Chinese listeners and L1 English-L2 Mandarin adult listeners took part in a gating experiment. The L2 listeners were tested twice - once at the start of their intermediate/advanced L2 language class and again 2 months later. L1 listeners were only tested once. Participants were asked to identify syllable-tone words that varied in syllable token frequency (high/low according to a spoken word corpus) and syllable-conditioned tonal probability (most probable/least probable in speech given the syllable). The stimuli were recorded by 16 different talkers and presented at eight gates ranging from onset-only (gate 1) through onset +40 ms increments (gates 2 through 7) to the full word (gate 8). Mixed-effects regression modeling was used to compare performance to our previous study which used single-talker stimuli (Wiener et al., 2019). The results indicated that multi-talker speech caused both L1 and L2 listeners to rely more heavily on knowledge-based processing of tone. L1 listeners were able to draw on distributional knowledge of syllable-tone probabilities in early gates and switch to predominantly acoustic-based processing when more of the signal was available. In contrast, L2 listeners, with their limited experience with talker range normalization, were less able to effectively transition from probability-based to acoustic-based processing. Moreover, for the L2 listeners, the reliance on such distributional information for spoken word recognition appeared to be conditioned by the nature of the acoustic signal. Single-talker speech did not result in the same pattern of probability-based tone processing, suggesting that knowledge-based processing of L2 speech may only occur under certain acoustic conditions, such as multi-talker speech.
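Syllable-conditioned tonal probability, the key knowledge-based variable in this design, is simply a conditional probability estimated from corpus counts. The toy sketch below illustrates the computation; the counts are invented for the example.

```python
# P(tone | syllable) from co-occurrence counts in a (toy) spoken-word corpus.
from collections import Counter

corpus = [("ma", 1), ("ma", 1), ("ma", 3), ("shi", 4), ("shi", 4), ("shi", 2)]
pair_counts = Counter(corpus)
syllable_counts = Counter(syllable for syllable, _ in corpus)

def tonal_probability(syllable: str, tone: int) -> float:
    """Syllable-conditioned tonal probability estimated from the corpus."""
    return pair_counts[(syllable, tone)] / syllable_counts[syllable]

print(tonal_probability("ma", 1))   # 2/3: tone 1 is the most probable tone for "ma"
```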
Collapse
Affiliation(s)
- Seth Wiener
- Language Acquisition, Processing and Pedagogy Lab, Department of Modern Languages, Carnegie Mellon University, Pittsburgh, PA, United States
| | - Chao-Yang Lee
- Speech Processing Lab, Communication Sciences and Disorders, Ohio University, Athens, OH, United States
| |
Collapse
|
26
|
Soyman E, Vicario DS. Rapid and long-lasting improvements in neural discrimination of acoustic signals with passive familiarization. PLoS One 2019; 14:e0221819. [PMID: 31465431 PMCID: PMC6715244 DOI: 10.1371/journal.pone.0221819] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/09/2019] [Accepted: 07/29/2019] [Indexed: 12/16/2022] Open
Abstract
Sensory representations in the adult brain must undergo dynamic changes to adapt to the complexity of the external world. This study investigated how passive exposure to novel sounds modifies neural representations to facilitate recognition and discrimination, using the zebra finch model organism. The neural responses in an auditory structure in the zebra finch brain, Caudal Medial Nidopallium (NCM), undergo a long-term form of adaptation with repeated stimulus presentation, providing an excellent substrate to probe the neural underpinnings of adaptive sensory representations. In Experiment 1, electrophysiological activity in NCM was recorded under passive listening conditions as novel natural vocalizations were familiarized through playback. Neural decoding of stimuli using the temporal profiles of both single-unit and multi-unit responses improved dramatically during the first few stimulus presentations. During subsequent encounters, these signals were recognized after hearing fewer initial acoustic features. Remarkably, the accuracy of neural decoding was higher when different stimuli were heard in separate blocks compared to when they were presented randomly in a shuffled sequence. NCM neurons with narrow spike waveforms generally yielded higher neural decoding accuracy than wide spike neurons, but the rate at which these accuracies improved with passive exposure was comparable between the two neuron types. Experiment 2 supported and extended these findings by showing that the rapid gains in neural decoding of novel vocalizations with passive familiarization were long-lasting, maintained for 20 hours after the initial encounter, in multi-unit responses. Taken together, these findings provide valuable insights into the mechanisms by which the nervous system dynamically modulates sensory representations to improve discrimination of novel complex signals over short and long timescales. Similar mechanisms may also be engaged during processing of human speech signals, and thus may have potential translational relevance for elucidating the neural basis of speech comprehension difficulties.
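Neural decoding from the temporal profiles of responses is typically done by comparing a held-out trial against per-stimulus response templates. The sketch below shows one common template-matching scheme on random placeholder data; it is illustrative only, not the authors' decoder.

```python
# Template-matching decoder: classify which stimulus evoked a held-out trial
# by correlating its temporal response profile with each stimulus's mean.
import numpy as np

rng = np.random.default_rng(2)
n_stimuli, n_trials, n_bins = 8, 20, 100
responses = rng.poisson(3.0, (n_stimuli, n_trials, n_bins)).astype(float)

correct = 0
for s in range(n_stimuli):
    for t in range(n_trials):
        trial = responses[s, t]
        templates = responses.mean(axis=1).copy()
        # Leave the held-out trial out of its own stimulus's template.
        templates[s] = np.delete(responses[s], t, axis=0).mean(axis=0)
        corrs = [np.corrcoef(trial, template)[0, 1] for template in templates]
        correct += int(np.argmax(corrs) == s)

print("decoding accuracy:", correct / (n_stimuli * n_trials))  # ~chance here
```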
Collapse
Affiliation(s)
- Efe Soyman
- Department of Psychology, Rutgers, the State University of New Jersey, New Brunswick, New Jersey, United States of America
| | - David S. Vicario
- Department of Psychology, Rutgers, the State University of New Jersey, New Brunswick, New Jersey, United States of America
| |
Collapse
|
27
|
Alfandari D, Vriend C, Heslenfeld DJ, Versfeld NJ, Kramer SE, Zekveld AA. Brain Volume Differences Associated With Hearing Impairment in Adults. Trends Hear 2019; 22:2331216518763689. [PMID: 29557274 PMCID: PMC5863860 DOI: 10.1177/2331216518763689] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022] Open
Abstract
Speech comprehension depends on the successful operation of a network of brain regions. Processing of degraded speech is associated with different patterns of brain activity in comparison with that of high-quality speech. In this exploratory study, we studied whether processing degraded auditory input in daily life because of hearing impairment is associated with differences in brain volume. We compared T1-weighted structural magnetic resonance images of 17 hearing-impaired (HI) adults with those of 17 normal-hearing (NH) controls using a voxel-based morphometry analysis. HI adults were individually matched with NH adults based on age and educational level. Gray and white matter brain volumes were compared between the groups by region-of-interest analyses in structures associated with speech processing, and by whole-brain analyses. The results suggest increased gray matter volume in the right angular gyrus and decreased white matter volume in the left fusiform gyrus in HI listeners as compared with NH ones. In the HI group, there was a significant correlation between hearing acuity and cluster volume of the gray matter cluster in the right angular gyrus. This correlation supports the link between partial hearing loss and altered brain volume. The alterations in volume may reflect the operation of compensatory mechanisms that are related to decoding meaning from degraded auditory input.
Collapse
Affiliation(s)
- Defne Alfandari
- Department of Otolaryngology-Head and Neck Surgery, Section Ear & Hearing, VU University Medical Center, Amsterdam, the Netherlands; Amsterdam Public Health Research Institute, VU University Medical Center, the Netherlands
| | - Chris Vriend
- Department of Anatomy & Neurosciences, VU University Medical Center, Amsterdam, the Netherlands; Department of Psychiatry, VU University Medical Center, Amsterdam, the Netherlands; Amsterdam Neuroscience, Amsterdam, the Netherlands
| | - Dirk J Heslenfeld
- Department of Psychology, VU University, Amsterdam, the Netherlands
| | - Niek J Versfeld
- Department of Otolaryngology-Head and Neck Surgery, Section Ear & Hearing, VU University Medical Center, Amsterdam, the Netherlands; Amsterdam Public Health Research Institute, VU University Medical Center, the Netherlands
| | - Sophia E Kramer
- Department of Otolaryngology-Head and Neck Surgery, Section Ear & Hearing, VU University Medical Center, Amsterdam, the Netherlands; Amsterdam Public Health Research Institute, VU University Medical Center, the Netherlands
| | - Adriana A Zekveld
- Department of Otolaryngology-Head and Neck Surgery, Section Ear & Hearing, VU University Medical Center, Amsterdam, the Netherlands; Amsterdam Public Health Research Institute, VU University Medical Center, the Netherlands; Department of Behavioural Sciences and Learning, Linnaeus Centre HEAD, The Swedish Institute for Disability Research, Linköping University, Sweden
| |
Collapse
|
28
|
Abstract
The effects of aging and age-related hearing loss on the ability to learn degraded speech are not well understood. This study was designed to compare the perceptual learning of time-compressed speech and its generalization to natural-fast speech across young adults with normal hearing, older adults with normal hearing, and older adults with age-related hearing loss. Early learning (following brief exposure to time-compressed speech) and later learning (following further training) were compared across groups. Age and age-related hearing loss were both associated with declines in early learning. Although the two groups of older adults improved during the training session, when compared to untrained control groups (matched for age and hearing), learning was weaker in older than in young adults. In particular, the transfer of learning to untrained time-compressed sentences was reduced in both groups of older adults. Transfer of learning to natural-fast speech occurred regardless of age and hearing, but it was limited to sentences encountered during training. Findings are discussed within the framework of dynamic models of speech perception and learning. Based on this framework, we tentatively suggest that age-related declines in learning may stem from age differences in the use of high- and low-level speech cues. These age differences result in weaker early learning in older adults, which may further contribute to the difficulty of perceiving speech in daily conversational settings in this population.
Collapse
Affiliation(s)
- Maayan Manheim
- Department of Communication Sciences and Disorders, University of Haifa, Israel
| | - Limor Lavie
- Department of Communication Sciences and Disorders, University of Haifa, Israel
| | - Karen Banai
- Department of Communication Sciences and Disorders, University of Haifa, Israel
| |
Collapse
|
29
|
Zhang X, Holt LL. Simultaneous tracking of coevolving distributional regularities in speech. J Exp Psychol Hum Percept Perform 2018; 44:1760-1779. [PMID: 30272462 PMCID: PMC6205888 DOI: 10.1037/xhp0000569] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
Speech processing depends upon mapping variable acoustic speech input in a manner that reflects the long-term regularities of the native language. Yet, these mappings are flexible such that introduction of short-term distributional regularities in speech input, like those arising from foreign accents or talker idiosyncrasies, leads to rapid adjustments in the effectiveness of acoustic dimensions in signaling phonetic categories. The present experiments investigate whether the system is able to track simultaneous short-term distributional statistics present in speech input or if, instead, the global regularity jointly defined by these distributions dominates. Three experiments establish that adult listeners are able to track distinct simultaneously evolving regularities across time, given information to support the "binning" of acoustic instances. Both voice quality and visual information to indicate talker supported tracking of coevolving distributional regularities, even when the regularities are opposing and even when the acoustic speech tokens contributing to the distinct distributions are identical. This indicates that reweighting of perceptual dimensions in response to short-term regularities in speech input is not simply an accumulation of acoustic instances. Rather, the system is able to track multiple context-sensitive regularities simultaneously, with rapid context-dependent adaptive adjustments in how acoustic speech input maps to phonetic categories.
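Tracking two coevolving regularities requires the listener to "bin" tokens by context and maintain a separate distributional estimate per bin. A minimal sketch of that bookkeeping, with hypothetical talkers and cue values, follows.

```python
# Maintain separate running cue distributions per context (here, per talker)
# rather than one global distribution, using Welford's incremental algorithm.
from collections import defaultdict

class RunningStats:
    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0
    def update(self, x: float):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
    @property
    def variance(self) -> float:
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

per_talker = defaultdict(RunningStats)
stream = [("talker_A", 120.0), ("talker_B", 180.0), ("talker_A", 125.0)]
for talker, f0 in stream:            # f0: an acoustic cue value in Hz
    per_talker[talker].update(f0)    # each context accrues its own statistics

print({t: (s.mean, round(s.variance, 2)) for t, s in per_talker.items()})
```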
Collapse
Affiliation(s)
- Xujin Zhang
- Department of Psychology, Carnegie Mellon University
| | - Lori L Holt
- Department of Psychology, Carnegie Mellon University
| |
Collapse
|
30
|
Gabay Y, Karni A, Banai K. Learning to decipher time-compressed speech: Robust acquisition with a slight difficulty in generalization among young adults with developmental dyslexia. PLoS One 2018; 13:e0205110. [PMID: 30356320 PMCID: PMC6200219 DOI: 10.1371/journal.pone.0205110] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/26/2018] [Accepted: 09/19/2018] [Indexed: 01/24/2023] Open
Abstract
Learning to decipher acoustically distorted speech serves as a test case for the study of language-related skill acquisition in persons with developmental dyslexia (DD). Deciphering this type of input is rarely learned explicitly and does not yield conscious insights. Problems in implicit and procedural skill learning have been proposed as possible causes of DD. Here we examined the learning of time-compressed (accelerated) speech and its generalization to novel materials among young adults with DD compared to typical readers (TD). All participants completed a training session that involved judging the semantic plausibility of sentences, during which the level of time compression was changed using an adaptive (staircase) procedure according to each participant's performance. In the test phase, learning (tested on the same items) and generalization (tested on new items and on the same items spoken by a new speaker) were assessed. Both groups showed robust gains after training. Moreover, after training, the initial disadvantage of the DD group was no longer significant. After training, both groups experienced relative difficulties in deciphering learned tokens spoken by a different voice, though participants with DD were less able to generalize the gains to deciphering new tokens. Thus, DD individuals benefited from repeated experience with time-compressed speech no less than typical readers, but their evolving skill was apparently more dependent on the specific characteristics of the tokens. Atypical generalization, which indicates that perceptual learning is contingent on lower-level features of the input (though it does not necessarily point to impaired learning potential per se), may explain some of the contradictory findings in published studies of speech perception in DD.
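The adaptive (staircase) procedure mentioned above titrates time compression to each participant's performance. Below is a minimal one-up/one-down sketch; the rule, step size, and limits are simplifying assumptions, not the study's exact settings.

```python
# Staircase over compression ratio (fraction of original duration):
# harder (more compressed) after a correct response, easier after an error.
def staircase(responses, start=0.70, step=0.05, floor=0.20, ceiling=1.00):
    ratio, track = start, []
    for correct in responses:
        track.append(ratio)
        ratio += -step if correct else step
        ratio = min(max(ratio, floor), ceiling)   # keep within valid bounds
    return track

print(staircase([True, True, False, True, False, False]))
```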
Collapse
Affiliation(s)
- Yafit Gabay
- Department of Special Education, University of Haifa, Haifa, Israel
- Edmond J. Safra Brain Research Center for the Study of Learning Disabilities, Department of Learning Disabilities, University of Haifa, Haifa, Israel
| | - Avi Karni
- Edmond J. Safra Brain Research Center for the Study of Learning Disabilities, Department of Learning Disabilities, University of Haifa, Haifa, Israel
- Sagol Department of Neurobiology, University of Haifa, Haifa, Israel
| | - Karen Banai
- Department of Communications Sciences and Disorders, University of Haifa, Haifa, Israel
| |
Collapse
|
31
|
Borges AFT, Giraud AL, Mansvelder HD, Linkenkaer-Hansen K. Scale-Free Amplitude Modulation of Neuronal Oscillations Tracks Comprehension of Accelerated Speech. J Neurosci 2018; 38:710-722. [PMID: 29217685 PMCID: PMC6596185 DOI: 10.1523/jneurosci.1515-17.2017] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/01/2017] [Revised: 10/24/2017] [Accepted: 11/20/2017] [Indexed: 01/17/2023] Open
Abstract
Speech comprehension is preserved up to a threefold acceleration, but deteriorates rapidly at higher speeds. Current models posit that perceptual resilience to accelerated speech is limited by the brain's ability to parse speech into syllabic units using δ/θ oscillations. Here, we investigated whether the involvement of neuronal oscillations in processing accelerated speech also relates to their scale-free amplitude modulation as indexed by the strength of long-range temporal correlations (LRTC). We recorded MEG while 24 human subjects (12 females) listened to radio news uttered at different comprehensible rates, at a mostly unintelligible rate and at this same speed interleaved with silence gaps. δ, θ, and low-γ oscillations followed the nonlinear variation of comprehension, with LRTC rising only at the highest speed. In contrast, increasing the rate was associated with a monotonic increase in LRTC in high-γ activity. When intelligibility was restored with the insertion of silence gaps, LRTC in the δ, θ, and low-γ oscillations resumed the low levels observed for intelligible speech. Remarkably, the lower the individual subject scaling exponents of δ/θ oscillations, the greater the comprehension of the fastest speech rate. Moreover, the strength of LRTC of the speech envelope decreased at the maximal rate, suggesting an inverse relationship with the LRTC of brain dynamics when comprehension halts. Our findings show that scale-free amplitude modulation of cortical oscillations and speech signals are tightly coupled to speech uptake capacity.SIGNIFICANCE STATEMENT One may read this statement in 20-30 s, but reading it in less than five leaves us clueless. Our minds limit how much information we grasp in an instant. Understanding the neural constraints on our capacity for sensory uptake is a fundamental question in neuroscience. Here, MEG was used to investigate neuronal activity while subjects listened to radio news played faster and faster until becoming unintelligible. We found that speech comprehension is related to the scale-free dynamics of δ and θ bands, whereas this property in high-γ fluctuations mirrors speech rate. We propose that successful speech processing imposes constraints on the self-organization of synchronous cell assemblies and their scale-free dynamics adjusts to the temporal properties of spoken language.
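The strength of long-range temporal correlations (LRTC) is conventionally indexed by the scaling exponent from detrended fluctuation analysis (DFA). The sketch below is a generic textbook DFA, not the authors' exact pipeline; an exponent near 0.5 indicates uncorrelated noise, while values above 0.5 indicate LRTC.

```python
# Detrended fluctuation analysis: slope of log F(w) vs. log w.
import numpy as np

def dfa_exponent(signal, window_sizes):
    profile = np.cumsum(signal - np.mean(signal))   # integrated signal
    fluctuations = []
    for w in window_sizes:
        f2 = []
        for i in range(len(profile) // w):
            segment = profile[i * w:(i + 1) * w]
            t = np.arange(w)
            trend = np.polyval(np.polyfit(t, segment, 1), t)  # linear detrend
            f2.append(np.mean((segment - trend) ** 2))
        fluctuations.append(np.sqrt(np.mean(f2)))
    return np.polyfit(np.log(window_sizes), np.log(fluctuations), 1)[0]

rng = np.random.default_rng(3)
print(dfa_exponent(rng.standard_normal(10000), [16, 32, 64, 128, 256]))  # ~0.5
```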
Collapse
Affiliation(s)
- Ana Filipa Teixeira Borges
- Department of Integrative Neurophysiology, Center for Neurogenomics and Cognitive Research, VU University Amsterdam, Amsterdam, Netherlands
- Amsterdam Neuroscience, Amsterdam, Netherlands
| | - Anne-Lise Giraud
- Department of Neuroscience, University of Geneva, Biotech Campus, Geneva 1211, Switzerland
| | - Huibert D Mansvelder
- Department of Integrative Neurophysiology, Center for Neurogenomics and Cognitive Research, VU University Amsterdam, Amsterdam, Netherlands
- Amsterdam Neuroscience, Amsterdam, Netherlands
| | - Klaus Linkenkaer-Hansen
- Department of Integrative Neurophysiology, Center for Neurogenomics and Cognitive Research, VU University Amsterdam, Amsterdam, Netherlands
- Amsterdam Neuroscience, Amsterdam, Netherlands
| |
Collapse
|
32
|
Hakonen M, May PJC, Jääskeläinen IP, Jokinen E, Sams M, Tiitinen H. Predictive processing increases intelligibility of acoustically distorted speech: Behavioral and neural correlates. Brain Behav 2017; 7:e00789. [PMID: 28948083 PMCID: PMC5607552 DOI: 10.1002/brb3.789] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 03/16/2017] [Revised: 06/10/2017] [Accepted: 06/26/2017] [Indexed: 11/08/2022] Open
Abstract
INTRODUCTION We examined which brain areas are involved in the comprehension of acoustically distorted speech using an experimental paradigm where the same distorted sentence can be perceived at different levels of intelligibility. This change in intelligibility occurs via a single intervening presentation of the intact version of the sentence, and the effect lasts at least on the order of minutes. Since the acoustic structure of the distorted stimulus is kept fixed and only intelligibility is varied, this allows one to study brain activity related to speech comprehension specifically. METHODS In a functional magnetic resonance imaging (fMRI) experiment, a stimulus set contained a block of six distorted sentences. This was followed by the intact counterparts of the sentences, after which the sentences were presented in distorted form again. A total of 18 such sets were presented to 20 human subjects. RESULTS The blood oxygenation level dependent (BOLD) responses elicited by the distorted sentences that came after the disambiguating, intact sentences were contrasted with the responses to the sentences presented before disambiguation. This revealed increased activity in the bilateral frontal pole, the dorsal anterior cingulate/paracingulate cortex, and the right frontal operculum. Decreased BOLD responses were observed in the posterior insula, Heschl's gyrus, and the posterior superior temporal sulcus. CONCLUSIONS The brain areas that showed BOLD enhancement for increased sentence comprehension have been associated with executive functions and with the mapping of incoming sensory information to representations stored in episodic memory. Thus, the comprehension of acoustically distorted speech may be associated with the engagement of memory-related subsystems. Further, activity in the primary auditory cortex was modulated by prior experience, possibly in a predictive coding framework. Our results suggest that memory biases the perception of ambiguous sensory information toward interpretations that have the highest probability of being correct based on previous experience.
Collapse
Affiliation(s)
- Maria Hakonen
- Brain and Mind Laboratory, Department of Neuroscience and Biomedical Engineering (NBE), School of Science, Aalto University, Aalto, Finland
- Department of Physiology, Faculty of Medicine, University of Helsinki, Helsinki, Finland
| | - Patrick J. C. May
- Medical Research Council Institute of Hearing Research, School of Medicine, The University of Nottingham, Nottingham, UK
- Special Laboratory Non-Invasive Brain Imaging, Leibniz Institute for Neurobiology, Magdeburg, Germany
| | - Iiro P. Jääskeläinen
- Brain and Mind Laboratory, Department of Neuroscience and Biomedical Engineering (NBE), School of Science, Aalto University, Aalto, Finland
| | - Emma Jokinen
- Department of Signal Processing and Acoustics, School of Electrical Engineering, Aalto University, Aalto, Finland
| | - Mikko Sams
- Brain and Mind Laboratory, Department of Neuroscience and Biomedical Engineering (NBE), School of Science, Aalto University, Aalto, Finland
| | | |
Collapse
|
33
|
Borrie SA, Schäfer MCM. Effects of Lexical and Somatosensory Feedback on Long-Term Improvements in Intelligibility of Dysarthric Speech. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2017; 60:2151-2158. [PMID: 28687828 DOI: 10.1044/2017_jslhr-s-16-0411] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/29/2016] [Accepted: 02/09/2017] [Indexed: 06/07/2023]
Abstract
PURPOSE Intelligibility improvements immediately following perceptual training with dysarthric speech using lexical feedback are comparable to those observed when training uses somatosensory feedback (Borrie & Schäfer, 2015). In this study, we investigated whether these improvements in listener intelligibility of dysarthric speech, guided by lexical or somatosensory feedback, remain comparable and stable over the course of 1 month. METHOD Following an intelligibility pretest, 60 participants were trained with dysarthric speech stimuli under one of three conditions: lexical feedback, somatosensory feedback, or no training (control). Participants then completed a series of intelligibility posttests, which took place immediately (immediate posttest), 1 week (1-week posttest), and 1 month (1-month posttest) following training. RESULTS As in our previous study, intelligibility improvements at immediate posttest were equivalent between lexical and somatosensory feedback conditions. Condition differences, however, emerged over time. Improvements guided by lexical feedback deteriorated over the month, whereas those guided by somatosensory feedback remained robust. CONCLUSIONS Somatosensory feedback, internally generated by vocal imitation, may be required to effect long-term perceptual gain in processing dysarthric speech. Findings are discussed in relation to underlying learning mechanisms and offer insight into how externally and internally generated feedback may differentially affect perceptual learning of disordered speech.
Collapse
Affiliation(s)
- Stephanie A Borrie
- Department of Communicative Disorders and Deaf Education, Utah State University, Logan
| | - Martina C M Schäfer
- New Zealand Institute of Language, Brain and Behaviour, University of Canterbury, Christchurch
| |
Collapse
|
34
|
Giordano BL, Ince RAA, Gross J, Schyns PG, Panzeri S, Kayser C. Contributions of local speech encoding and functional connectivity to audio-visual speech perception. eLife 2017; 6. [PMID: 28590903 PMCID: PMC5462535 DOI: 10.7554/elife.24763] [Citation(s) in RCA: 43] [Impact Index Per Article: 6.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/31/2016] [Accepted: 05/07/2017] [Indexed: 11/13/2022] Open
Abstract
Seeing a speaker’s face enhances speech intelligibility in adverse environments. We investigated the underlying network mechanisms by quantifying local speech representations and directed connectivity in MEG data obtained while human participants listened to speech of varying acoustic SNR and visual context. During high acoustic SNR, speech encoding by temporally entrained brain activity was strong in temporal and inferior frontal cortex, while during low SNR, strong entrainment emerged in premotor and superior frontal cortex. These changes in local encoding were accompanied by changes in directed connectivity along the ventral stream and the auditory-premotor axis. Importantly, the behavioral benefit arising from seeing the speaker’s face was not predicted by changes in local encoding but rather by enhanced functional connectivity between temporal and inferior frontal cortex. Our results demonstrate a role of auditory-frontal interactions in visual speech representations and suggest that functional connectivity along the ventral pathway facilitates speech comprehension in multisensory environments. When listening to someone in a noisy environment, such as a cocktail party, we can understand the speaker more easily if we can also see his or her face. Movements of the lips and tongue convey additional information that helps the listener’s brain separate out syllables, words and sentences. However, exactly where in the brain this effect occurs and how it works remain unclear. To find out, Giordano et al. scanned the brains of healthy volunteers as they watched clips of people speaking. The clarity of the speech varied between clips. Furthermore, in some of the clips the lip movements of the speaker corresponded to the speech in question, whereas in others the lip movements were nonsense babble. As expected, the volunteers performed better on a word recognition task when the speech was clear and when the lip movements agreed with the spoken dialogue. Watching the video clips stimulated rhythmic activity in multiple regions of the volunteers’ brains, including areas that process sound and areas that plan movements. Speech is itself rhythmic, and the volunteers’ brain activity synchronized with the rhythms of the speech they were listening to. Seeing the speaker’s face increased this degree of synchrony. However, it also made it easier for sound-processing regions within the listeners’ brains to transfer information to one another. Notably, only the latter effect predicted improved performance on the word recognition task. This suggests that seeing a person’s face makes it easier to understand his or her speech by boosting communication between brain regions, rather than through effects on individual areas. Further work is required to determine where and how the brain encodes lip movements and speech sounds. The next challenge will be to identify where these two sets of information interact, and how the brain merges them together to generate the impression of specific words.
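Speech "entrainment" of the kind measured here is often first approximated as spectral coherence between the speech amplitude envelope and a neural signal in the delta/theta range. The sketch below uses synthetic signals and plain coherence; the study itself used information-theoretic and directed-connectivity measures, which this simple metric only approximates.

```python
# Coherence between a speech envelope and a (synthetic) neural channel.
import numpy as np
from scipy.signal import coherence, hilbert

fs = 200.0                                    # sampling rate in Hz
rng = np.random.default_rng(4)
speech = rng.standard_normal(int(60 * fs))    # stand-in for a speech waveform
envelope = np.abs(hilbert(speech))            # amplitude envelope
neural = 0.3 * envelope + rng.standard_normal(len(speech))  # tracks envelope

f, coh = coherence(envelope, neural, fs=fs, nperseg=1024)
band = (f >= 1) & (f <= 8)                    # delta/theta frequencies
print("mean 1-8 Hz coherence:", coh[band].mean())
```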
Collapse
Affiliation(s)
- Bruno L Giordano
- Institut de Neurosciences de la Timone UMR 7289, Aix Marseille Université - Centre National de la Recherche Scientifique, Marseille, France; Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
| | - Robin A A Ince
- Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
| | - Joachim Gross
- Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
| | - Philippe G Schyns
- Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
| | - Stefano Panzeri
- Neural Computation Laboratory, Center for Neuroscience and Cognitive Systems, Istituto Italiano di Tecnologia, Rovereto, Italy
| | - Christoph Kayser
- Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
| |
Collapse
|
35
|
Rogers JC, Davis MH. Inferior Frontal Cortex Contributions to the Recognition of Spoken Words and Their Constituent Speech Sounds. J Cogn Neurosci 2017; 29:919-936. [PMID: 28129061 DOI: 10.1162/jocn_a_01096] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/06/2023]
Abstract
Speech perception and comprehension are often challenged by the need to recognize speech sounds that are degraded or ambiguous. Here, we explore the cognitive and neural mechanisms involved in resolving ambiguity in the identity of speech sounds using syllables that contain ambiguous phonetic segments (e.g., intermediate sounds between /b/ and /g/ as in "blade" and "glade"). We used an audio-morphing procedure to create a large set of natural sounding minimal pairs that contain phonetically ambiguous onset or offset consonants (differing in place, manner, or voicing). These ambiguous segments occurred in different lexical contexts (i.e., in words or pseudowords, such as blade-glade or blem-glem) and in different phonological environments (i.e., with neighboring syllables that differed in lexical status, such as blouse-glouse). These stimuli allowed us to explore the impact of phonetic ambiguity on the speed and accuracy of lexical decision responses (Experiment 1), semantic categorization responses (Experiment 2), and the magnitude of BOLD fMRI responses during attentive comprehension (Experiment 3). For both behavioral and neural measures, observed effects of phonetic ambiguity were influenced by lexical context leading to slower responses and increased activity in the left inferior frontal gyrus for high-ambiguity syllables that distinguish pairs of words, but not for equivalent pseudowords. These findings suggest lexical involvement in the resolution of phonetic ambiguity. Implications for speech perception and the role of inferior frontal regions are discussed.
Collapse
Affiliation(s)
- Jack C Rogers
- MRC Cognition & Brain Sciences Unit, Cambridge, UK; University of Birmingham
| | | |
Collapse
|
36
|
Leonard MK, Baud MO, Sjerps MJ, Chang EF. Perceptual restoration of masked speech in human cortex. Nat Commun 2016; 7:13619. [PMID: 27996973 PMCID: PMC5187421 DOI: 10.1038/ncomms13619] [Citation(s) in RCA: 85] [Impact Index Per Article: 10.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/18/2016] [Accepted: 10/19/2016] [Indexed: 02/02/2023] Open
Abstract
Humans are adept at understanding speech despite the fact that our natural listening environment is often filled with interference. An example of this capacity is phoneme restoration, in which part of a word is completely replaced by noise, yet listeners report hearing the whole word. The neurological basis for this unconscious fill-in phenomenon is unknown, despite being a fundamental characteristic of human hearing. Here, using direct cortical recordings in humans, we demonstrate that missing speech is restored at the acoustic-phonetic level in bilateral auditory cortex, in real-time. This restoration is preceded by specific neural activity patterns in a separate language area, left frontal cortex, which predicts the word that participants later report hearing. These results demonstrate that during speech perception, missing acoustic content is synthesized online from the integration of incoming sensory cues and the internal neural dynamics that bias word-level expectation and prediction. We can often ‘fill in' missing or occluded sounds from a speech signal—an effect known as phoneme restoration. Leonard et al. found a real-time restoration of the missing sounds in the superior temporal auditory cortex in humans. Interestingly, neural activity in frontal regions prior to the stimulus can predict the word that the participant would later hear.
Collapse
Affiliation(s)
- Matthew K Leonard
- Department of Neurological Surgery, University of California, San Francisco, 675 Nelson Rising Lane, Room 535, San Francisco, California 94158, USA; Center for Integrative Neuroscience, University of California, San Francisco, 675 Nelson Rising Lane, Room 535, San Francisco, California 94158, USA
| | - Maxime O Baud
- Department of Neurology, University of California, San Francisco, 675 Nelson Rising Lane, Room 535, San Francisco, California 94158, USA
| | - Matthias J Sjerps
- Department of Linguistics, University of California, Berkeley, 1203 Dwinelle Hall #2650, Berkeley, California 94720-2650, USA; Neurobiology of Language Department, Donders Institute for Brain, Cognition and Behavior, Centre for Cognitive Neuroimaging, Radboud University, Kapittelweg 29, Nijmegen 6525 EN, The Netherlands
| | - Edward F Chang
- Department of Neurological Surgery, University of California, San Francisco, 675 Nelson Rising Lane, Room 535, San Francisco, California 94158, USA; Center for Integrative Neuroscience, University of California, San Francisco, 675 Nelson Rising Lane, Room 535, San Francisco, California 94158, USA; Department of Physiology, University of California, San Francisco, 675 Nelson Rising Lane, Room 535, San Francisco, California 94158, USA
| |
Collapse
|
37
|
Lehet M, Holt LL. Dimension-Based Statistical Learning Affects Both Speech Perception and Production. Cogn Sci 2016; 41 Suppl 4:885-912. [PMID: 27666146 DOI: 10.1111/cogs.12413] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/28/2015] [Revised: 04/04/2016] [Accepted: 04/29/2016] [Indexed: 11/29/2022]
Abstract
Multiple acoustic dimensions signal speech categories. However, dimensions vary in their informativeness; some are more diagnostic of category membership than others. Speech categorization reflects these dimensional regularities such that diagnostic dimensions carry more "perceptual weight" and more effectively signal category membership to native listeners. Yet perceptual weights are malleable. When short-term experience deviates from long-term language norms, such as in a foreign accent, the perceptual weight of acoustic dimensions in signaling speech category membership rapidly adjusts. The present study investigated whether rapid adjustments in listeners' perceptual weights in response to speech that deviates from the norms also affects listeners' own speech productions. In a word recognition task, the correlation between two acoustic dimensions signaling consonant categories, fundamental frequency (F0) and voice onset time (VOT), matched the correlation typical of English, and then shifted to an "artificial accent" that reversed the relationship, and then shifted back. Brief, incidental exposure to the artificial accent caused participants to down-weight perceptual reliance on F0, consistent with previous research. Throughout the task, participants were intermittently prompted with pictures to produce these same words. In the block in which listeners heard the artificial accent with a reversed F0 × VOT correlation, F0 was a less robust cue to voicing in listeners' own speech productions. The statistical regularities of short-term speech input affect both speech perception and production, as evidenced via shifts in how acoustic dimensions are weighted.
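The exposure manipulation here is a change in the sign of the correlation between two cue dimensions. A sketch of how such canonical versus "artificial accent" stimulus distributions could be generated is given below; the means, scalings, and correlation magnitudes are invented for illustration.

```python
# Draw (VOT, F0) pairs with a chosen correlation: positive for the canonical
# English pattern, negative for the reversed "artificial accent".
import numpy as np

def sample_stimuli(n, rho, rng):
    z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)
    vot = 40 + 25 * z[:, 0]    # voice onset time in ms (arbitrary scaling)
    f0 = 220 + 30 * z[:, 1]    # fundamental frequency in Hz (arbitrary scaling)
    return vot, f0

rng = np.random.default_rng(5)
vot_c, f0_c = sample_stimuli(100, +0.8, rng)   # canonical: cues covary
vot_a, f0_a = sample_stimuli(100, -0.8, rng)   # accent: correlation reversed
print(np.corrcoef(vot_c, f0_c)[0, 1], np.corrcoef(vot_a, f0_a)[0, 1])
```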
Collapse
Affiliation(s)
- Matthew Lehet
- Department of Psychology and the Center for Neural Basis of Cognition, Carnegie Mellon University
| | - Lori L Holt
- Department of Psychology and the Center for Neural Basis of Cognition, Carnegie Mellon University
| |
Collapse
|
38
|
Abstract
Listeners possess a remarkable ability to adapt to acoustic variability in the realization of speech sound categories (e.g., different accents). The current work tests whether non-native listeners adapt their use of acoustic cues in phonetic categorization when they are confronted with changes in the distribution of cues in the input, as native listeners do, and examines to what extent these adaptation patterns are influenced by individual cue-weighting strategies. In line with previous work, native English listeners, who use voice onset time (VOT) as a primary cue to the stop voicing contrast (e.g., 'pa' vs. 'ba'), adjusted their use of f0 (a secondary cue to the contrast) when confronted with a noncanonical "accent" in which the two cues gave conflicting information about category membership. Native Korean listeners' adaptation strategies, while variable, were predictable based on their initial cue weighting strategies. In particular, listeners who used f0 as the primary cue to category membership adjusted their use of VOT (their secondary cue) in response to the noncanonical accent, mirroring the native pattern of "downweighting" a secondary cue. Results suggest that non-native listeners show native-like sensitivity to distributional information in the input and use this information to adjust categorization, just as native listeners do, with the specific trajectory of category adaptation governed by initial cue-weighting strategies.
Collapse
|
39
|
Guediche S, Fiez JA, Holt LL. Adaptive plasticity in speech perception: Effects of external information and internal predictions. J Exp Psychol Hum Percept Perform 2016; 42:1048-59. [PMID: 26854531 PMCID: PMC4925215 DOI: 10.1037/xhp0000196] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
When listeners encounter speech under adverse listening conditions, adaptive adjustments in perception can improve comprehension over time. In some cases, these adaptive changes require the presence of external information that disambiguates the distorted speech signals, whereas in other cases mere exposure is sufficient. Both external (e.g., written feedback) and internal (e.g., prior word knowledge) sources of information can be used to generate predictions about the correct mapping of a distorted speech signal. We hypothesize that these predictions provide a basis for determining the discrepancy between the expected and actual speech signal that can be used to guide adaptive changes in perception. This study provides the first empirical investigation that manipulates external and internal factors through (a) the availability of explicit external disambiguating information via the presence or absence of postresponse orthographic information paired with a repetition of the degraded stimulus, and (b) the accuracy of internally generated predictions; an acoustic distortion is introduced either abruptly or incrementally. The results demonstrate that the impact of external information on adaptive plasticity is contingent upon whether the intelligibility of the stimuli permits accurate internally generated predictions during exposure. External information sources enhance adaptive plasticity only when input signals are severely degraded and cannot reliably access internal predictions. This is consistent with a computational framework for adaptive plasticity in which error-driven supervised learning relies on the ability to compute sensory prediction error signals from both internal and external sources of information.
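The error-driven learning framework invoked here reduces, in its simplest form, to a delta-rule update driven by the discrepancy between a predicted and a disambiguated signal. The toy sketch below illustrates that core idea only; it is not the authors' model.

```python
# One-parameter delta-rule update driven by sensory prediction error.
def adapt(weight, prediction, disambiguated, lr=0.1):
    prediction_error = disambiguated - prediction   # expected vs. actual
    return weight + lr * prediction_error

w = 0.0
for _ in range(10):
    # External (or internal) information supplies the disambiguated target.
    w = adapt(w, prediction=w, disambiguated=1.0)
print(round(w, 3))   # moves toward 1.0 as the prediction error shrinks
```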
Collapse
Affiliation(s)
- Sara Guediche
- Center for Neuroscience at the University of Pittsburgh, Pittsburgh, PA, USA
- Center for the Neural Basis of Cognition, Pittsburgh, PA, USA
| | - Julie A. Fiez
- Center for Neuroscience at the University of Pittsburgh, Pittsburgh, PA, USA
- Department of Psychology, University of Pittsburgh, Pittsburgh, PA, USA
- Center for the Neural Basis of Cognition, Pittsburgh, PA, USA
| | - Lori L. Holt
- Center for Neuroscience at the University of Pittsburgh, Pittsburgh, PA, USA
- Department of Psychology, Carnegie Mellon University, Pittsburgh, PA, USA
- Center for the Neural Basis of Cognition, Pittsburgh, PA, USA
| |
Collapse
|
40
|
Kleinschmidt DF, Jaeger TF. Re-examining selective adaptation: Fatiguing feature detectors, or distributional learning? Psychon Bull Rev 2016; 23:678-91. [PMID: 26438255 PMCID: PMC4821823 DOI: 10.3758/s13423-015-0943-z] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
When a listener hears many good examples of a /b/ in a row, they are less likely to classify other sounds on, e.g., a /b/-to-/d/ continuum as /b/. This phenomenon is known as selective adaptation and is a well-studied property of speech perception. Traditionally, selective adaptation is seen as a mechanistic property of the speech perception system, and attributed to fatigue in acoustic-phonetic feature detectors. However, recent developments in our understanding of non-linguistic sensory adaptation and higher-level adaptive plasticity in speech perception and language comprehension suggest that it is time to re-visit the phenomenon of selective adaptation. We argue that selective adaptation is better thought of as a computational property of the speech perception system. Drawing on a common thread in recent work on both non-linguistic sensory adaptation and plasticity in language comprehension, we furthermore propose that selective adaptation can be seen as a consequence of distributional learning across multiple levels of representation. This proposal opens up new questions for research on selective adaptation itself, and also suggests that selective adaptation can be an important bridge between work on adaptation in low-level sensory systems and the complicated plasticity of the adult language comprehension system.
Collapse
Affiliation(s)
- Dave F Kleinschmidt
- Department of Brain and Cognitive Sciences, University of Rochester, Rochester, NY, USA.
| | - T Florian Jaeger
- Departments of Brain and Cognitive Sciences, Computer Science, and Linguistics, University of Rochester, Rochester, NY, USA
| |
Collapse
|
41
|
Moberget T, Ivry RB. Cerebellar contributions to motor control and language comprehension: searching for common computational principles. Ann N Y Acad Sci 2016; 1369:154-71. [PMID: 27206249 PMCID: PMC5260470 DOI: 10.1111/nyas.13094] [Citation(s) in RCA: 58] [Impact Index Per Article: 7.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/25/2022]
Abstract
The past 25 years have seen the functional domain of the cerebellum extend beyond the realm of motor control, with considerable discussion of how this subcortical structure contributes to cognitive domains including attention, memory, and language. Drawing on evidence from neuroanatomy, physiology, neuropsychology, and computational work, sophisticated models have been developed to describe cerebellar function in sensorimotor control and learning. In contrast, mechanistic accounts of how the cerebellum contributes to cognition have remained elusive. Inspired by the homogeneous cerebellar microanatomy and a desire for parsimony, many researchers have sought to extend mechanistic ideas from motor control to cognition. One influential hypothesis centers on the idea that the cerebellum implements internal models, representations of the context-specific dynamics of an agent's interactions with the environment, enabling predictive control. We briefly review cerebellar anatomy and physiology, to review the internal model hypothesis as applied in the motor domain, before turning to extensions of these ideas in the linguistic domain, focusing on speech perception and semantic processing. While recent findings are consistent with this computational generalization, they also raise challenging questions regarding the nature of cerebellar learning, and may thus inspire revisions of our views on the role of the cerebellum in sensorimotor control.
Collapse
Affiliation(s)
- Torgeir Moberget
- Norwegian Centre for Mental Disorders Research (NORMENT), KG Jebsen Centre for Psychosis Research, Division of Mental Health and Addiction, Oslo University Hospital, Norway
| | - Richard B. Ivry
- Department of Psychology, and the Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, California
| |
Collapse
|
42
|
Distinct effects of memory retrieval and articulatory preparation when learning and accessing new word forms. PLoS One 2015; 10:e0126652. [PMID: 25961571 PMCID: PMC4427175 DOI: 10.1371/journal.pone.0126652] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/26/2014] [Accepted: 04/04/2015] [Indexed: 12/03/2022] Open
Abstract
Temporal and frontal activations have been implicated in learning of novel word forms, but their specific roles remain poorly understood. The present magnetoencephalography (MEG) study examines the roles of these areas in processing newly-established word form representations. The cortical effects related to acquiring new phonological word forms during incidental learning were localized. Participants listened to and repeated back new word form stimuli that adhered to native phonology (Finnish pseudowords) or were foreign (Korean words), with a subset of the stimuli recurring four times. Subsequently, a modified 1-back task and a recognition task addressed whether the activations modulated by learning were related to planning for overt articulation, while parametrically added noise probed reliance on developing memory representations during effortful perception. Learning resulted in decreased left superior temporal and increased bilateral frontal premotor activation for familiar compared to new items. The left temporal learning effect persisted in all tasks and was strongest when stimuli were embedded in intermediate noise. In the noisy conditions, native phonotactics evoked overall enhanced left temporal activation. In contrast, the frontal learning effects were present only in conditions requiring overt repetition and were more pronounced for the foreign language. The results indicate a functional dissociation between temporal and frontal activations in learning new phonological word forms: the left superior temporal responses reflect activation of newly-established word-form representations, also during degraded sensory input, whereas the frontal premotor effects are related to planning for articulation and are not preserved in noise.
Collapse
|
43
|
Kleinschmidt DF, Jaeger TF. Robust speech perception: recognize the familiar, generalize to the similar, and adapt to the novel. Psychol Rev 2015; 122:148-203. [PMID: 25844873 PMCID: PMC4744792 DOI: 10.1037/a0038695] [Citation(s) in RCA: 239] [Impact Index Per Article: 26.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
Successful speech perception requires that listeners map the acoustic signal to linguistic categories. These mappings are not only probabilistic, but change depending on the situation. For example, one talker's /p/ might be physically indistinguishable from another talker's /b/ (cf. lack of invariance). We characterize the computational problem posed by such a subjectively nonstationary world and propose that the speech perception system overcomes this challenge by (a) recognizing previously encountered situations, (b) generalizing to other situations based on previous similar experience, and (c) adapting to novel situations. We formalize this proposal in the ideal adapter framework: (a) to (c) can be understood as inference under uncertainty about the appropriate generative model for the current talker, thereby facilitating robust speech perception despite the lack of invariance. We focus on 2 critical aspects of the ideal adapter. First, in situations that clearly deviate from previous experience, listeners need to adapt. We develop a distributional (belief-updating) learning model of incremental adaptation. The model provides a good fit against known and novel phonetic adaptation data, including perceptual recalibration and selective adaptation. Second, robust speech recognition requires that listeners learn to represent the structured component of cross-situation variability in the speech signal. We discuss how these 2 aspects of the ideal adapter provide a unifying explanation for adaptation, talker-specificity, and generalization across talkers and groups of talkers (e.g., accents and dialects). The ideal adapter provides a guiding framework for future investigations into speech perception and adaptation, and more broadly language comprehension.
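The distributional (belief-updating) learning model at the heart of the ideal adapter can be illustrated with conjugate Normal-Normal updating of a listener's belief about a talker's category mean. The sketch below is a simplified illustration under invented parameter values, not the paper's full model.

```python
# Incremental belief updating about a category's mean cue value (known noise
# variance): each token shifts the belief by a gain-weighted prediction error.
import numpy as np

def update_belief(mu, var, x, var_noise):
    gain = var / (var + var_noise)        # how much to trust the new token
    return mu + gain * (x - mu), (1 - gain) * var

rng = np.random.default_rng(6)
mu, var = 0.0, 100.0                      # vague prior about a /b/ VOT mean
for token in rng.normal(15.0, 5.0, 50):   # exposure to a shifted talker
    mu, var = update_belief(mu, var, token, var_noise=25.0)

print(round(mu, 2), round(var, 3))        # belief converges on ~15 ms
```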
Affiliation(s)
- T Florian Jaeger
- Departments of Brain and Cognitive Sciences, Computer Science, and Linguistics, University of Rochester
44
Wingfield A, Peelle JE. The effects of hearing loss on neural processing and plasticity. Front Syst Neurosci 2015; 9:35. PMID: 25798095; PMCID: PMC4351590; DOI: 10.3389/fnsys.2015.00035.
Affiliation(s)
- Arthur Wingfield
- Volen National Center for Complex Systems, Brandeis University, Waltham, MA, USA
- Jonathan E Peelle
- Department of Otolaryngology, Washington University in St. Louis, St. Louis, MO, USA
45
Lim SJ, Fiez JA, Holt LL. How may the basal ganglia contribute to auditory categorization and speech perception? Front Neurosci 2014; 8:230. PMID: 25136291; PMCID: PMC4117994; DOI: 10.3389/fnins.2014.00230.
Abstract
Listeners must accomplish two complementary perceptual feats in extracting a message from speech. They must discriminate linguistically relevant acoustic variability and generalize across irrelevant variability. Said another way, they must categorize speech. Since the mapping from acoustic variability to categories is language-specific, these categories must be learned from experience. Thus, understanding how the auditory system acquires and represents categories in general can inform us about the toolbox of mechanisms available to speech perception. This perspective invites consideration of findings from cognitive neuroscience literatures outside of the speech domain as a means of constraining models of speech perception. Although neurobiological models of speech perception have mainly focused on cerebral cortex, research outside the speech domain is consistent with the possibility of significant subcortical contributions to category learning. Here, we review the functional role of one such structure, the basal ganglia. We examine research from animal electrophysiology, human neuroimaging, and behavior to consider characteristics of basal ganglia processing that may be advantageous for speech category learning. We also present emerging evidence for a direct role of the basal ganglia in learning auditory categories in a complex, naturalistic task intended to model the incidental manner in which speech categories are acquired. To conclude, we highlight new research questions that arise when incorporating the broader neuroscience literature into models of speech perception, and we suggest how understanding the contributions of the basal ganglia can inform attempts to optimize training protocols for learning non-native speech categories in adulthood.
Affiliation(s)
- Sung-Joo Lim
- Department of Psychology, Carnegie Mellon University, Pittsburgh, PA, USA; Department of Neuroscience, Center for the Neural Basis of Cognition, University of Pittsburgh, Pittsburgh, PA, USA
- Julie A Fiez
- Department of Neuroscience, Center for the Neural Basis of Cognition, University of Pittsburgh, Pittsburgh, PA, USA; Department of Neuroscience, Center for Neuroscience, University of Pittsburgh, Pittsburgh, PA, USA; Department of Psychology, University of Pittsburgh, Pittsburgh, PA, USA
- Lori L Holt
- Department of Psychology, Carnegie Mellon University, Pittsburgh, PA, USA; Department of Neuroscience, Center for the Neural Basis of Cognition, University of Pittsburgh, Pittsburgh, PA, USA; Department of Neuroscience, Center for Neuroscience, University of Pittsburgh, Pittsburgh, PA, USA