1
Keur-Huizinga L, Kramer SE, de Geus EJC, Zekveld AA. A Multimodal Approach to Measuring Listening Effort: A Systematic Review on the Effects of Auditory Task Demand on Physiological Measures and Their Relationship. Ear Hear 2024; 45:1089-1106. [PMID: 38880960 PMCID: PMC11325958 DOI: 10.1097/aud.0000000000001508] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2023] [Accepted: 03/18/2024] [Indexed: 06/18/2024]
Abstract
OBJECTIVES: Listening effort involves the mental effort required to perceive an auditory stimulus, for example in noisy environments. Prolonged increased listening effort, for example due to impaired hearing ability, may increase the risk of health complications. It is therefore important to identify valid and sensitive measures of listening effort. Physiological measures have been shown to be sensitive to auditory task demand manipulations and are considered to reflect changes in listening effort. Such measures include pupil dilation, alpha power, skin conductance level, and heart rate variability. The aim of the current systematic review was to provide an overview of studies of listening effort that used multiple physiological measures. The two main questions were: (1) what is the effect of changes in auditory task demand on simultaneously acquired physiological measures from various modalities? and (2) what is the relationship between the responses in these physiological measures?
DESIGN: Following Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, relevant articles were sought in PubMed, PsycInfo, and Web of Science and by examining the references of included articles. Search iterations with different combinations of psychophysiological measures were performed in conjunction with listening effort-related search terms. Quality was assessed using the Appraisal Tool for Cross-Sectional Studies.
RESULTS: A total of 297 articles were identified from three databases, of which 27 were included. One additional article was identified from reference lists. Of the total 28 included articles, 16 included an analysis regarding the relationship between the physiological measures. The overall quality of the included studies was reasonable.
CONCLUSIONS: In the included studies, most of the physiological measures showed either no effect of auditory task demand manipulations or a consistent effect in the expected direction. For example, pupil dilation increased, pre-ejection period decreased, and skin conductance level increased with increasing auditory task demand. Most of the relationships between the responses of these physiological measures were nonsignificant or weak. The physiological measures varied in their sensitivity to auditory task demand manipulations. One of the identified knowledge gaps was that the included studies mostly used tasks with high performance levels, resulting in an underrepresentation of the physiological changes at lower performance levels. This makes it difficult to capture how the physiological responses behave across the full psychometric curve. Our results support the Framework for Understanding Effortful Listening and the need for a multimodal approach to listening effort. We furthermore discuss focus points for future studies.
Affiliation(s)
- Laura Keur-Huizinga
  - Amsterdam UMC location Vrije Universiteit Amsterdam, Otolaryngology-Head and Neck Surgery, Ear & Hearing, Amsterdam Public Health Research Institute, Amsterdam, The Netherlands
  - Department of Biological Psychology, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Sophia E. Kramer
  - Amsterdam UMC location Vrije Universiteit Amsterdam, Otolaryngology-Head and Neck Surgery, Ear & Hearing, Amsterdam Public Health Research Institute, Amsterdam, The Netherlands
- Eco J. C. de Geus
  - Department of Biological Psychology, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Adriana A. Zekveld
  - Amsterdam UMC location Vrije Universiteit Amsterdam, Otolaryngology-Head and Neck Surgery, Ear & Hearing, Amsterdam Public Health Research Institute, Amsterdam, The Netherlands
2
Drouin JR, Davis CP. Individual differences in visual pattern completion predict adaptation to degraded speech. Brain Lang 2024; 255:105449. [PMID: 39083999 DOI: 10.1016/j.bandl.2024.105449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2023] [Revised: 03/18/2024] [Accepted: 07/23/2024] [Indexed: 08/02/2024]
Abstract
Recognizing acoustically degraded speech relies on predictive processing whereby incomplete auditory cues are mapped to stored linguistic representations via pattern recognition processes. While listeners vary in their ability to recognize degraded speech, performance improves when a written transcription is presented, allowing completion of the partial sensory pattern to preexisting representations. Building on work characterizing predictive processing as pattern completion, we examined the relationship between domain-general pattern recognition and individual variation in degraded speech learning. Participants completed a visual pattern recognition task to measure individual-level tendency towards pattern completion. Participants were also trained to recognize noise-vocoded speech with written transcriptions and tested on speech recognition pre- and post-training using a retrieval-based transcription task. Listeners significantly improved in recognizing speech after training, and pattern completion on the visual task predicted improvement for novel items. The results implicate pattern completion as a domain-general learning mechanism that can facilitate speech adaptation in challenging contexts.
Affiliation(s)
- Julia R Drouin
  - Division of Speech and Hearing Sciences, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA; Department of Communication Sciences and Disorders, California State University Fullerton, Fullerton, CA 92831, USA.
- Charles P Davis
  - Department of Psychology & Neuroscience, Duke University, Durham, NC 27708, USA
3
MacIntyre AD, Carlyon RP, Goehring T. Neural Decoding of the Speech Envelope: Effects of Intelligibility and Spectral Degradation. Trends Hear 2024; 28:23312165241266316. [PMID: 39183533 PMCID: PMC11345737 DOI: 10.1177/23312165241266316] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2024] [Revised: 05/23/2024] [Accepted: 06/16/2024] [Indexed: 08/27/2024] Open
Abstract
During continuous speech perception, endogenous neural activity becomes time-locked to acoustic stimulus features, such as the speech amplitude envelope. This speech-brain coupling can be decoded using non-invasive brain imaging techniques, including electroencephalography (EEG). Neural decoding may have clinical use as an objective measure of stimulus encoding by the brain, for example during cochlear implant listening, wherein the speech signal is severely spectrally degraded. Yet, interplay between acoustic and linguistic factors may lead to top-down modulation of perception, thereby complicating audiological applications. To address this ambiguity, we assess neural decoding of the speech envelope under spectral degradation with EEG in acoustically hearing listeners (n = 38; 18-35 years old) using vocoded speech. We dissociate sensory encoding from higher-order processing by employing intelligible (English) and non-intelligible (Dutch) stimuli, with auditory attention sustained using a repeated-phrase detection task. Subject-specific and group decoders were trained to reconstruct the speech envelope from held-out EEG data, with decoder significance determined via random permutation testing. Whereas speech envelope reconstruction did not vary by spectral resolution, intelligible speech was associated with better decoding accuracy in general. Results were similar across subject-specific and group analyses, with less consistent effects of spectral degradation in group decoding. Permutation tests revealed possible differences in decoder statistical significance by experimental condition. In general, while robust neural decoding was observed at the individual and group level, variability within participants would most likely prevent the clinical use of such a measure to differentiate levels of spectral degradation and intelligibility on an individual basis.
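As a rough illustration of what "decoder significance determined via random permutation testing" can look like, the sketch below scores a reconstructed envelope by its Pearson correlation with the actual envelope and compares that score against a null distribution. The abstract does not specify the permutation scheme, so the circular-shift null used here is an assumption, as are the function names `pearson_r` and `permutation_p_value` and the synthetic data.

```python
import random
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def permutation_p_value(reconstructed, actual, n_permutations=1000, seed=0):
    """Return (observed r, one-sided p) against a circular-shift null.

    Assumption: the null is built by circularly shifting the actual
    envelope, which breaks its alignment with the reconstruction.
    The cited study may have used a different scheme.
    """
    rng = random.Random(seed)
    observed = pearson_r(reconstructed, actual)
    n = len(actual)
    exceed = 0
    for _ in range(n_permutations):
        shift = rng.randrange(1, n)  # never 0, so alignment is always broken
        shifted = actual[shift:] + actual[:shift]
        if pearson_r(reconstructed, shifted) >= observed:
            exceed += 1
    # Add-one correction keeps the estimated p-value above zero.
    return observed, (exceed + 1) / (n_permutations + 1)


# Synthetic stand-ins for an envelope and a noisy reconstruction of it.
data_rng = random.Random(1)
envelope = [data_rng.gauss(0, 1) for _ in range(200)]
decoded = [v + data_rng.gauss(0, 1) for v in envelope]
r, p = permutation_p_value(decoded, envelope, n_permutations=200)
print(f"r = {r:.3f}, p = {p:.4f}")
```

With real EEG data the samples are autocorrelated, which is one reason a shift-based null is often preferred over shuffling individual samples; that choice is a design assumption here, not a description of the study.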
Affiliation(s)
- Robert P. Carlyon
  - MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, UK
- Tobias Goehring
  - MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, UK
4
Sweet SJ, Van Hedger SC, Batterink LJ. Of words and whistles: Statistical learning operates similarly for identical sounds perceived as speech and non-speech. Cognition 2024; 242:105649. [PMID: 37871411 DOI: 10.1016/j.cognition.2023.105649] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Revised: 10/11/2023] [Accepted: 10/13/2023] [Indexed: 10/25/2023]
Abstract
Statistical learning is an ability that allows individuals to effortlessly extract patterns from the environment, such as sound patterns in speech. Some prior evidence suggests that statistical learning operates more robustly for speech compared to non-speech stimuli, supporting the idea that humans are predisposed to learn language. However, any apparent statistical learning advantage for speech could be driven by signal acoustics, rather than the subjective perception per se of sounds as speech. To resolve this issue, the current study assessed whether there is a statistical learning advantage for ambiguous sounds that are subjectively perceived as speech-like compared to the same sounds perceived as non-speech, thereby controlling for acoustic features. We first induced participants to perceive sine-wave speech (SWS), a degraded form of speech not immediately perceptible as speech, as either speech or non-speech. After this induction phase, participants were exposed to a continuous stream of repeating trisyllabic nonsense words, composed of SWS syllables, and then completed an explicit familiarity rating task and an implicit target detection task to assess learning. Critically, participants showed robust and equivalent performance on both measures, regardless of their subjective speech perception. In contrast, participants who perceived the SWS syllables as more speech-like showed better detection of individual syllables embedded in speech streams. These results suggest that speech perception facilitates processing of individual sounds, but not the ability to extract patterns across sounds. Our findings suggest that statistical learning is not influenced by the perceived linguistic relevance of sounds, and that it may be conceptualized largely as an automatic, stimulus-driven mechanism.
Affiliation(s)
- Sierra J Sweet
  - Department of Psychology, Western University, London, ON, Canada.
- Stephen C Van Hedger
  - Department of Psychology, Western University, London, ON, Canada; Western Institute for Neuroscience, Western University, London, ON, Canada; Department of Psychology, Huron University College, London, ON, Canada.
- Laura J Batterink
  - Department of Psychology, Western University, London, ON, Canada; Western Institute for Neuroscience, Western University, London, ON, Canada.
5
Karunathilake IMD, Kulasingham JP, Simon JZ. Neural tracking measures of speech intelligibility: Manipulating intelligibility while keeping acoustics unchanged. Proc Natl Acad Sci U S A 2023; 120:e2309166120. [PMID: 38032934 PMCID: PMC10710032 DOI: 10.1073/pnas.2309166120] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 06/01/2023] [Accepted: 10/21/2023] [Indexed: 12/02/2023] Open
Abstract
Neural speech tracking has advanced our understanding of how our brains rapidly map an acoustic speech signal onto linguistic representations and ultimately meaning. It remains unclear, however, how speech intelligibility is related to the corresponding neural responses. Many studies addressing this question vary the level of intelligibility by manipulating the acoustic waveform, but this makes it difficult to cleanly disentangle the effects of intelligibility from underlying acoustical confounds. Here, using magnetoencephalography recordings, we study neural measures of speech intelligibility by manipulating intelligibility while keeping the acoustics strictly unchanged. Acoustically identical degraded speech stimuli (three-band noise-vocoded, ~20 s duration) are presented twice, but the second presentation is preceded by the original (nondegraded) version of the speech. This intermediate priming, which generates a "pop-out" percept, substantially improves the intelligibility of the second degraded speech passage. We investigate how intelligibility and acoustical structure affect acoustic and linguistic neural representations using multivariate temporal response functions (mTRFs). As expected, behavioral results confirm that perceived speech clarity is improved by priming. mTRF analysis reveals that auditory (speech envelope and envelope onset) neural representations are not affected by priming but only by the acoustics of the stimuli (bottom-up driven). Critically, our findings suggest that segmentation of sounds into words emerges with better speech intelligibility, and most strongly at the later (~400 ms latency) word processing stage, in prefrontal cortex, in line with engagement of top-down mechanisms associated with priming. Taken together, our results show that word representations may provide some objective measures of speech comprehension.
Affiliation(s)
- Jonathan Z. Simon
  - Department of Electrical and Computer Engineering, University of Maryland, College Park, MD 20742
  - Department of Biology, University of Maryland, College Park, MD 20742
  - Institute for Systems Research, University of Maryland, College Park, MD 20742
6
Karunathilake ID, Kulasingham JP, Simon JZ. Neural Tracking Measures of Speech Intelligibility: Manipulating Intelligibility while Keeping Acoustics Unchanged. bioRxiv 2023:2023.05.18.541269. [PMID: 37292644 PMCID: PMC10245672 DOI: 10.1101/2023.05.18.541269] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
Neural speech tracking has advanced our understanding of how our brains rapidly map an acoustic speech signal onto linguistic representations and ultimately meaning. It remains unclear, however, how speech intelligibility is related to the corresponding neural responses. Many studies addressing this question vary the level of intelligibility by manipulating the acoustic waveform, but this makes it difficult to cleanly disentangle effects of intelligibility from underlying acoustical confounds. Here, using magnetoencephalography (MEG) recordings, we study neural measures of speech intelligibility by manipulating intelligibility while keeping the acoustics strictly unchanged. Acoustically identical degraded speech stimuli (three-band noise vocoded, ~20 s duration) are presented twice, but the second presentation is preceded by the original (non-degraded) version of the speech. This intermediate priming, which generates a 'pop-out' percept, substantially improves the intelligibility of the second degraded speech passage. We investigate how intelligibility and acoustical structure affect acoustic and linguistic neural representations using multivariate Temporal Response Functions (mTRFs). As expected, behavioral results confirm that perceived speech clarity is improved by priming. TRF analysis reveals that auditory (speech envelope and envelope onset) neural representations are not affected by priming, but only by the acoustics of the stimuli (bottom-up driven). Critically, our findings suggest that segmentation of sounds into words emerges with better speech intelligibility, and most strongly at the later (~400 ms latency) word processing stage, in prefrontal cortex (PFC), in line with engagement of top-down mechanisms associated with priming. Taken together, our results show that word representations may provide some objective measures of speech comprehension.
Affiliation(s)
- Jonathan Z. Simon
  - Department of Electrical and Computer Engineering, University of Maryland, College Park, MD 20742, USA
  - Department of Biology, University of Maryland, College Park, MD 20742, USA
  - Institute for Systems Research, University of Maryland, College Park, MD 20742, USA
7
Cross ZR, Corcoran AW, Schlesewsky M, Kohler MJ, Bornkessel-Schlesewsky I. Oscillatory and Aperiodic Neural Activity Jointly Predict Language Learning. J Cogn Neurosci 2022; 34:1630-1649. [PMID: 35640095 DOI: 10.1162/jocn_a_01878] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Indexed: 11/04/2022]
Abstract
Memory formation involves the synchronous firing of neurons in task-relevant networks, with recent models postulating that a decrease in low-frequency oscillatory activity underlies successful memory encoding and retrieval. However, to date, this relationship has been investigated primarily with face and image stimuli; considerably less is known about the oscillatory correlates of complex rule learning, as in language. Furthermore, recent work has shown that nonoscillatory (1/ƒ) activity is functionally relevant to cognition, yet its interaction with oscillatory activity during complex rule learning remains unknown. Using spectral decomposition and power-law exponent estimation of human EEG data (17 females, 18 males), we show for the first time that 1/ƒ and oscillatory activity jointly influence the learning of word order rules of a miniature artificial language system. Flexible word-order rules were associated with a steeper 1/ƒ slope, whereas fixed word-order rules were associated with a shallower slope. We also show that increased theta and alpha power predicts fixed relative to flexible word-order rule learning and behavioral performance. Together, these results suggest that 1/ƒ activity plays an important role in higher-order cognition, including language processing, and that grammar learning is modulated by different word-order permutations, which manifest in distinct oscillatory profiles.
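To make "power-law exponent estimation" concrete: if aperiodic power follows P(f) ≈ c / f^χ, then log10 P is linear in log10 f with slope -χ, so a steeper 1/ƒ slope corresponds to a larger exponent χ. The sketch below fits that line by ordinary least squares on a synthetic spectrum. It is only a minimal illustration; the abstract does not state the estimation procedure actually used, and the function name `fit_aperiodic_exponent` and the example values are hypothetical.

```python
import math


def fit_aperiodic_exponent(freqs, power):
    """Fit log10(power) = offset - exponent * log10(freq) by least squares.

    Returns (exponent, offset). Assumption: the spectrum has no strong
    oscillatory peaks; real pipelines usually model or exclude peaks
    before fitting, which this sketch does not do.
    """
    xs = [math.log10(f) for f in freqs]
    ys = [math.log10(p) for p in power]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    offset = mean_y - slope * mean_x
    return -slope, offset


# Synthetic spectrum: pure power law with exponent 1.5 over 1-40 Hz.
freqs = list(range(1, 41))
power = [10.0 / f ** 1.5 for f in freqs]
exponent, offset = fit_aperiodic_exponent(freqs, power)
print(f"exponent = {exponent:.3f}, offset = {offset:.3f}")
```

On measured EEG spectra, theta or alpha peaks would bias a plain log-log fit, so the result should be read as a rough estimate unless the oscillatory components are handled separately.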