1. Polonenko MJ, Maddox RK. Fundamental frequency predominantly drives talker differences in auditory brainstem responses to continuous speech. bioRxiv [preprint] 2024:2024.07.12.603125. PMID: 39026858; PMCID: PMC11257598; DOI: 10.1101/2024.07.12.603125
Abstract
Deriving human neural responses to natural speech is now possible, but the responses to male- and female-uttered speech have been shown to differ. These talker differences may complicate interpretations or restrict experimental designs geared toward more realistic communication scenarios. This study found that when a male and female talker had the same fundamental frequency, auditory brainstem responses (ABRs) were very similar. Those responses became smaller and later with increasing fundamental frequency, as did click ABRs with increasing stimulus rates. Modeled responses suggested that the speech and click ABR differences were reasonably predicted by peripheral and brainstem processing of stimulus acoustics.
2. Kulasingham JP, Bachmann FL, Eskelund K, Enqvist M, Innes-Brown H, Alickovic E. Predictors for estimating subcortical EEG responses to continuous speech. PLoS One 2024; 19:e0297826. PMID: 38330068; PMCID: PMC10852227; DOI: 10.1371/journal.pone.0297826
Abstract
Perception of sounds and speech involves structures in the auditory brainstem that rapidly process ongoing auditory stimuli. The role of these structures in speech processing can be investigated by measuring their electrical activity using scalp-mounted electrodes. However, typical analysis methods involve averaging neural responses to many short repetitive stimuli that bear little relevance to daily listening environments. Recently, subcortical responses to more ecologically relevant continuous speech were detected using linear encoding models. These methods estimate the temporal response function (TRF), which is a regression model that minimises the error between the measured neural signal and a predictor derived from the stimulus. Using predictors that model the highly non-linear peripheral auditory system may improve linear TRF estimation accuracy and peak detection. Here, we compare predictors from both simple and complex peripheral auditory models for estimating brainstem TRFs on electroencephalography (EEG) data from 24 participants listening to continuous speech. We also investigate the data length required for estimating subcortical TRFs, and find that around 12 minutes of data is sufficient for clear wave V peaks (>3 dB SNR) to be seen in nearly all participants. Interestingly, predictors derived from simple filterbank-based models of the peripheral auditory system yield TRF wave V peak SNRs that are not significantly different from those estimated using a complex model of the auditory nerve, provided that the nonlinear effects of adaptation in the auditory system are appropriately modelled. Crucially, computing predictors from these simpler models is more than 50 times faster compared to the complex model. This work paves the way for efficient modelling and detection of subcortical processing of continuous speech, which may lead to improved diagnosis metrics for hearing impairment and assistive hearing technology.
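The TRF described here is a linear deconvolution: a lagged copy of the predictor for each latency, fit to the EEG by regularized least squares. As a minimal sketch of that idea (not the authors' pipeline; the lag window, regularization value, and synthetic data below are illustrative assumptions only):

```python
import numpy as np

def estimate_trf(predictor, eeg, fs, t_min=-0.005, t_max=0.015, lam=1e-2):
    """Estimate a temporal response function by ridge regression.

    predictor, eeg : 1-D arrays at the same sampling rate fs.
    t_min, t_max   : lag window in seconds (early, for subcortical TRFs).
    lam            : ridge regularization strength (illustrative value).
    """
    lags = np.arange(int(t_min * fs), int(t_max * fs) + 1)
    # Lagged design matrix: one column per latency (circular shift;
    # edge effects are ignored in this sketch).
    X = np.column_stack([np.roll(predictor, lag) for lag in lags])
    # Ridge solution: w = (X'X + lam*I)^(-1) X'y
    w = np.linalg.solve(X.T @ X + lam * np.eye(len(lags)), X.T @ eeg)
    return lags / fs, w

# Toy demonstration on synthetic data.
fs = 2_000
rng = np.random.default_rng(0)
stim = rng.standard_normal(fs * 30)                      # 30 s "predictor"
kernel = np.exp(-np.arange(20) / 5.0) * np.sin(np.arange(20) / 2.0)
eeg = np.convolve(stim, kernel)[: stim.size] + rng.standard_normal(stim.size)
times, trf = estimate_trf(stim, eeg, fs)
```

The wave V SNR criterion used in the paper would then be computed from `trf` by comparing power in a peak window against power at pre-stimulus (negative) lags.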
Affiliation(s)
- Joshua P. Kulasingham
- Automatic Control, Department of Electrical Engineering, Linköping University, Linköping, Sweden
- Martin Enqvist
- Automatic Control, Department of Electrical Engineering, Linköping University, Linköping, Sweden
- Hamish Innes-Brown
- Eriksholm Research Centre, Snekkersten, Denmark
- Department of Health Technology, Technical University of Denmark, Lyngby, Denmark
- Emina Alickovic
- Automatic Control, Department of Electrical Engineering, Linköping University, Linköping, Sweden
- Eriksholm Research Centre, Snekkersten, Denmark
3. Shan T, Cappelloni MS, Maddox RK. Subcortical responses to music and speech are alike while cortical responses diverge. Sci Rep 2024; 14:789. PMID: 38191488; PMCID: PMC10774448; DOI: 10.1038/s41598-023-50438-0
Abstract
Music and speech are encountered daily and are unique to human beings. Both are transformed by the auditory pathway from an initial acoustical encoding to higher-level cognition. Studies of cortex have revealed distinct brain responses to music and speech, but differences may emerge in the cortex or may be inherited from different subcortical encoding. In the first part of this study, we derived the human auditory brainstem response (ABR), a measure of subcortical encoding, to recorded music and speech using two analysis methods. The first method, described previously and based on stimulus acoustics, yielded very different ABRs for the two sound classes. The second method, developed here and based on a physiological model of the auditory periphery, instead gave highly correlated responses to music and speech. We established the superiority of the second method through several metrics, suggesting there is no appreciable impact of stimulus class (i.e., music vs. speech) on the way stimulus acoustics are encoded subcortically. In the second part of this study, we considered the cortex. Our new analysis method resulted in cortical music and speech responses becoming more similar, though differences remained. Taken together, the subcortical and cortical results suggest that there is evidence for stimulus-class-dependent processing of music and speech at the cortical but not the subcortical level.
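Both derivation methods amount to deconvolving the EEG against a stimulus regressor: rectified audio for the acoustic method, or simulated auditory-periphery output for the model-based method. A minimal frequency-domain sketch of that shared step (the regressor choice and the small stabilizing constant are illustrative assumptions, not the authors' exact implementation):

```python
import numpy as np

def derive_abr(regressor, eeg, fs, t_max=0.03):
    """Deconvolve EEG against a stimulus regressor via FFT division.

    regressor : rectified audio (acoustic method) or simulated
                auditory-periphery firing (model-based method).
    eeg       : simultaneously recorded EEG, same length and rate.
    """
    n = len(eeg)
    R = np.fft.rfft(regressor, n)
    Y = np.fft.rfft(eeg, n)
    # Dividing by the regressor power spectrum turns cross-correlation
    # into deconvolution; the constant guards against division by zero.
    h = np.fft.irfft(Y * np.conj(R) / (np.abs(R) ** 2 + 1e-12), n)
    k = int(t_max * fs)
    return np.arange(k) / fs, h[:k]
```

With this framing, swapping analysis methods means swapping only the regressor, which is what allows a like-for-like comparison of music and speech responses.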
Affiliation(s)
- Tong Shan
- Department of Biomedical Engineering, University of Rochester, Rochester, NY, USA
- Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY, USA
- Center for Visual Science, University of Rochester, Rochester, NY, USA
- Madeline S Cappelloni
- Department of Biomedical Engineering, University of Rochester, Rochester, NY, USA
- Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY, USA
- Center for Visual Science, University of Rochester, Rochester, NY, USA
- Ross K Maddox
- Department of Biomedical Engineering, University of Rochester, Rochester, NY, USA
- Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY, USA
- Center for Visual Science, University of Rochester, Rochester, NY, USA
- Department of Neuroscience, University of Rochester, Rochester, NY, USA
4. Bachmann FL, Kulasingham JP, Eskelund K, Enqvist M, Alickovic E, Innes-Brown H. Extending subcortical EEG responses to continuous speech to the sound-field. Trends Hear 2024; 28:23312165241246596. PMID: 38738341; DOI: 10.1177/23312165241246596
Abstract
The auditory brainstem response (ABR) is a valuable clinical tool for objective hearing assessment, which is conventionally detected by averaging neural responses to thousands of short stimuli. Progressing beyond these unnatural stimuli, brainstem responses to continuous speech presented via earphones have recently been detected using linear temporal response functions (TRFs). Here, we extend earlier studies by measuring subcortical responses to continuous speech presented in the sound-field, and assess the amount of data needed to estimate brainstem TRFs. Electroencephalography (EEG) was recorded from 24 normal-hearing participants while they listened to clicks and stories presented via earphones and loudspeakers. Subcortical TRFs were computed after accounting for non-linear processing in the auditory periphery by either stimulus rectification or an auditory nerve model. Our results demonstrated that subcortical responses to continuous speech could be reliably measured in the sound-field. TRFs estimated using auditory nerve models outperformed simple rectification, and 16 minutes of data was sufficient for the TRFs of all participants to show clear wave V peaks for both earphone and sound-field stimuli. Subcortical TRFs to continuous speech were highly consistent between earphone and sound-field conditions, and with click ABRs. However, sound-field TRFs required slightly more data (16 minutes) to achieve clear wave V peaks compared to earphone TRFs (12 minutes), possibly due to effects of room acoustics. By investigating subcortical responses to sound-field speech stimuli, this study lays the groundwork for bringing objective hearing assessment closer to real-life conditions, which may lead to improved hearing evaluations and smart hearing technologies.
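Of the two predictor types compared here, the cheap one is simple to state: half-wave rectify the speech waveform, then resample it to the EEG rate. A sketch under those assumptions (the sampling rates are illustrative; the auditory-nerve-model predictor would replace the rectified signal with simulated firing rates):

```python
import numpy as np
from math import gcd
from scipy.signal import resample_poly

def rectified_predictor(audio, fs_audio=44_100, fs_eeg=4_096):
    """Half-wave rectified speech, resampled to the EEG rate."""
    rectified = np.maximum(audio, 0.0)
    # resample_poly needs integer up/down factors.
    g = gcd(fs_eeg, fs_audio)
    return resample_poly(rectified, fs_eeg // g, fs_audio // g)
```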
Affiliation(s)
- Joshua P Kulasingham
- Automatic Control, Department of Electrical Engineering, Linköping University, Linköping, Sweden
- Martin Enqvist
- Automatic Control, Department of Electrical Engineering, Linköping University, Linköping, Sweden
- Emina Alickovic
- Eriksholm Research Centre, Snekkersten, Denmark
- Automatic Control, Department of Electrical Engineering, Linköping University, Linköping, Sweden
- Hamish Innes-Brown
- Eriksholm Research Centre, Snekkersten, Denmark
- Department of Health Technology, Technical University of Denmark, Lyngby, Denmark
5. Schmidt F, Chen Y, Keitel A, Rösch S, Hannemann R, Serman M, Hauswald A, Weisz N. Neural speech tracking shifts from the syllabic to the modulation rate of speech as intelligibility decreases. Psychophysiology 2023; 60:e14362. PMID: 37350379; PMCID: PMC10909526; DOI: 10.1111/psyp.14362
Abstract
The most prominent acoustic features in speech are intensity modulations, represented by the amplitude envelope of speech. Synchronization of neural activity with these modulations supports speech comprehension. As the acoustic modulation of speech is related to the production of syllables, investigations of neural speech tracking commonly do not distinguish between lower-level acoustic (envelope modulation) and higher-level linguistic (syllable rate) information. Here we manipulated speech intelligibility using noise-vocoded speech and investigated the spectral dynamics of neural speech processing, across two studies at cortical and subcortical levels of the auditory hierarchy, using magnetoencephalography. Overall, cortical regions mostly track the syllable rate, whereas subcortical regions track the acoustic envelope. Furthermore, with less intelligible speech, tracking of the modulation rate becomes more dominant. Our study highlights the importance of distinguishing between envelope modulation and syllable rate and provides novel possibilities to better understand differences between auditory processing and speech/language processing disorders.
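The envelope-versus-syllable-rate distinction rests on how the amplitude envelope is computed in the first place; a common recipe (offered here only as an illustrative sketch, with filter order and cutoff as assumptions) is the magnitude of the analytic signal, low-pass filtered to keep the slow modulations:

```python
import numpy as np
from scipy.signal import hilbert, butter, sosfiltfilt

def amplitude_envelope(audio, fs, lp_cutoff=30.0):
    """Broadband amplitude envelope of a speech waveform.

    Magnitude of the analytic (Hilbert) signal, low-pass filtered to
    retain the slow modulations that neural tracking analyses target.
    """
    env = np.abs(hilbert(audio))
    sos = butter(4, lp_cutoff, btype="low", fs=fs, output="sos")
    return sosfiltfilt(sos, env)
```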
Affiliation(s)
- Fabian Schmidt
- Center for Cognitive Neuroscience, University of Salzburg, Salzburg, Austria
- Department of Psychology, University of Salzburg, Salzburg, Austria
- Ya-Ping Chen
- Center for Cognitive Neuroscience, University of Salzburg, Salzburg, Austria
- Department of Psychology, University of Salzburg, Salzburg, Austria
- Anne Keitel
- Psychology, School of Social Sciences, University of Dundee, Dundee, UK
- Sebastian Rösch
- Department of Otorhinolaryngology, Paracelsus Medical University, Salzburg, Austria
- Maja Serman
- Audiological Research Unit, Sivantos GmbH, Erlangen, Germany
- Anne Hauswald
- Center for Cognitive Neuroscience, University of Salzburg, Salzburg, Austria
- Department of Psychology, University of Salzburg, Salzburg, Austria
- Nathan Weisz
- Center for Cognitive Neuroscience, University of Salzburg, Salzburg, Austria
- Department of Psychology, University of Salzburg, Salzburg, Austria
- Neuroscience Institute, Christian Doppler University Hospital, Paracelsus Medical University, Salzburg, Austria
6. Xie X, Jaeger TF, Kurumada C. What we do (not) know about the mechanisms underlying adaptive speech perception: A computational framework and review. Cortex 2023; 166:377-424. PMID: 37506665; DOI: 10.1016/j.cortex.2023.05.003
Abstract
Speech from unfamiliar talkers can be difficult to comprehend initially. These difficulties tend to dissipate with exposure, sometimes within minutes or less. Adaptivity in response to unfamiliar input is now considered a fundamental property of speech perception, and research over the past two decades has made substantial progress in identifying its characteristics. The mechanisms underlying adaptive speech perception, however, remain unknown. Past work has attributed facilitatory effects of exposure to any one of three qualitatively different hypothesized mechanisms: (1) low-level, pre-linguistic, signal normalization, (2) changes in/selection of linguistic representations, or (3) changes in post-perceptual decision-making. Direct comparisons of these hypotheses, or combinations thereof, have been lacking. We describe a general computational framework for adaptive speech perception (ASP) that-for the first time-implements all three mechanisms. We demonstrate how the framework can be used to derive predictions for experiments on perception from the acoustic properties of the stimuli. Using this approach, we find that-at the level of data analysis presently employed by most studies in the field-the signature results of influential experimental paradigms do not distinguish between the three mechanisms. This highlights the need for a change in research practices, so that future experiments provide more informative results. We recommend specific changes to experimental paradigms and data analysis. All data and code for this study are shared via OSF, including the R markdown document that this article is generated from, and an R library that implements the models we present.
Affiliation(s)
- Xin Xie
- Language Science, University of California, Irvine, USA
- T Florian Jaeger
- Brain and Cognitive Sciences, University of Rochester, Rochester, NY, USA
- Computer Science, University of Rochester, Rochester, NY, USA
- Chigusa Kurumada
- Brain and Cognitive Sciences, University of Rochester, Rochester, NY, USA
7. Ignatious E, Azam S, Jonkman M, De Boer F. Binaural masking level difference for pure tone signals. J Otol 2023; 18:160-167. PMID: 37497326; PMCID: PMC10366637; DOI: 10.1016/j.joto.2023.06.001
Abstract
The binaural masking level difference (BMLD) is a psychoacoustic measure of binaural interaction and central auditory processing, defined as the difference in hearing thresholds between homophasic and antiphasic conditions. The duration, phase, and frequency of the stimuli can all affect the BMLD. The main aim of this study was to evaluate the BMLD for stimuli of different durations and frequencies that could also be used in future electrophysiological studies. To this end, we developed a GUI to present signals of different frequencies and variable duration and to determine the BMLD. Three durations and five frequencies were explored. The results confirm that the hearing threshold for the antiphasic condition is lower than that for the homophasic condition, and that the differences are significant for signals of 18 ms and 48 ms duration. Future objective binaural processing studies will be based on 18 ms and 48 ms stimuli at the same frequencies as used in the current study.
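The two conditions differ only in the interaural phase of the tone: identical at both ears (S0N0, homophasic) or inverted at one ear (SπN0, antiphasic), with the masking noise diotic in both cases. A sketch of trial generation under illustrative assumptions (frequency, duration, and noise level are placeholders, not the study's calibrated values):

```python
import numpy as np

def bmld_trial(freq=500.0, dur=0.048, fs=48_000, antiphasic=False, rng=None):
    """Stereo tone-in-noise stimulus for one BMLD trial, shape (n, 2).

    Homophasic (S0N0): same tone and noise at both ears.
    Antiphasic (SpiN0): tone inverted at one ear, noise unchanged.
    """
    rng = rng or np.random.default_rng()
    t = np.arange(int(dur * fs)) / fs
    tone = np.sin(2 * np.pi * freq * t)
    noise = rng.standard_normal(t.size) * 0.3          # diotic masker
    left = noise + tone
    right = noise + (-tone if antiphasic else tone)
    return np.column_stack([left, right])
```

The BMLD is then simply the homophasic threshold minus the antiphasic threshold, in dB.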
8. Lindboom E, Nidiffer A, Carney LH, Lalor EC. Incorporating models of subcortical processing improves the ability to predict EEG responses to natural speech. Hear Res 2023; 433:108767. PMID: 37060895; PMCID: PMC10559335; DOI: 10.1016/j.heares.2023.108767
Abstract
The goal of describing how the human brain responds to complex acoustic stimuli has driven auditory neuroscience research for decades. Often, a systems-based approach has been taken, in which neurophysiological responses are modeled based on features of the presented stimulus. This includes a wealth of work modeling electroencephalogram (EEG) responses to complex acoustic stimuli such as speech. Examples of the acoustic features used in such modeling include the amplitude envelope and spectrogram of speech. These models implicitly assume a direct mapping from stimulus representation to cortical activity. However, in reality, the representation of sound is transformed as it passes through early stages of the auditory pathway, such that inputs to the cortex are fundamentally different from the raw audio signal that was presented. Thus, it could be valuable to account for the transformations taking place in lower-order auditory areas, such as the auditory nerve, cochlear nucleus, and inferior colliculus (IC), when predicting cortical responses to complex sounds. Specifically, because IC responses are more similar to cortical inputs than acoustic features derived directly from the audio signal, we hypothesized that linear mappings (temporal response functions; TRFs) fit to the outputs of an IC model would better predict EEG responses to speech stimuli. To this end, we modeled responses to the acoustic stimuli as they passed through the auditory nerve, cochlear nucleus, and inferior colliculus before fitting a TRF to the output of the modeled IC responses. Results showed that using model-IC responses in traditional systems analyses resulted in better predictions of EEG activity than using the envelope or spectrogram of a speech stimulus. Further, it was revealed that model-IC derived TRFs predict different aspects of the EEG than acoustic-feature TRFs, and combining both types of TRF models provides a more accurate prediction of the EEG response.
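The combination result can be reproduced in spirit by weighting the two models' predictions and scoring the mixture against the measured EEG; a sketch with hypothetical prediction arrays (a joint fit over the stacked regressors would be the more principled version):

```python
import numpy as np

def combine_predictions(pred_acoustic, pred_model_ic, eeg):
    """Least-squares weighting of two TRF predictions of the same EEG.

    Returns the combined prediction and its Pearson correlation with
    the measured EEG, the usual TRF accuracy metric.
    """
    X = np.column_stack([pred_acoustic, pred_model_ic])
    w, *_ = np.linalg.lstsq(X, eeg, rcond=None)
    combined = X @ w
    r = np.corrcoef(combined, eeg)[0, 1]
    return combined, r
```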
Affiliation(s)
- Elsa Lindboom
- Department of Biomedical Engineering, University of Rochester, Rochester, NY, USA
- Aaron Nidiffer
- Department of Biomedical Engineering, University of Rochester, Rochester, NY, USA
- Department of Neuroscience and Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY, USA
- Laurel H Carney
- Department of Biomedical Engineering, University of Rochester, Rochester, NY, USA
- Department of Neuroscience and Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY, USA
- Department of Electrical and Computer Engineering, University of Rochester, Rochester, NY, USA
- Edmund C Lalor
- Department of Biomedical Engineering, University of Rochester, Rochester, NY, USA
- Department of Neuroscience and Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY, USA
9. Lindboom E, Nidiffer A, Carney LH, Lalor EC. Incorporating models of subcortical processing improves the ability to predict EEG responses to natural speech. bioRxiv [preprint] 2023:2023.01.02.522438. PMID: 36711934; PMCID: PMC9881851; DOI: 10.1101/2023.01.02.522438 (Preprint of reference 8.)
10. Simon JZ, Commuri V, Kulasingham JP. Time-locked auditory cortical responses in the high-gamma band: A window into primary auditory cortex. Front Neurosci 2022; 16:1075369. PMID: 36570848; PMCID: PMC9773383; DOI: 10.3389/fnins.2022.1075369
Abstract
Primary auditory cortex is a critical stage in the human auditory pathway, a gateway between subcortical and higher-level cortical areas. Receiving the output of all subcortical processing, it sends its output on to higher-level cortex. Non-invasive physiological recordings of primary auditory cortex using electroencephalography (EEG) and magnetoencephalography (MEG), however, may not have sufficient specificity to separate responses generated in primary auditory cortex from those generated in underlying subcortical areas or neighboring cortical areas. This limitation is important for investigations of effects of top-down processing (e.g., selective-attention-based) on primary auditory cortex: higher-level areas are known to be strongly influenced by top-down processes, but subcortical areas are often assumed to perform strictly bottom-up processing. Fortunately, recent advances have made it easier to isolate the neural activity of primary auditory cortex from other areas. In this perspective, we focus on time-locked responses to stimulus features in the high gamma band (70-150 Hz) and with early cortical latency (∼40 ms), intermediate between subcortical and higher-level areas. We review recent findings from physiological studies employing either repeated simple sounds or continuous speech, obtaining either a frequency following response (FFR) or temporal response function (TRF). The potential roles of top-down processing are underscored, and comparisons with invasive intracranial EEG (iEEG) and animal model recordings are made. We argue that MEG studies employing continuous speech stimuli may offer particular benefits, in that only a few minutes of speech generates robust high gamma responses from bilateral primary auditory cortex, and without measurable interference from subcortical or higher-level areas.
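Isolating the component discussed here starts with a zero-phase band-pass of the continuous recording to 70-150 Hz before any TRF or FFR analysis; a minimal sketch (the filter order is an illustrative assumption):

```python
from scipy.signal import butter, sosfiltfilt

def high_gamma(recording, fs, band=(70.0, 150.0)):
    """Zero-phase band-pass to the high-gamma range, along time."""
    sos = butter(4, band, btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, recording, axis=-1)
```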
Affiliation(s)
- Jonathan Z. Simon
- Department of Electrical and Computer Engineering, University of Maryland, College Park, College Park, MD, United States
- Department of Biology, University of Maryland, College Park, College Park, MD, United States
- Institute for Systems Research, University of Maryland, College Park, College Park, MD, United States
- Vrishab Commuri
- Department of Electrical and Computer Engineering, University of Maryland, College Park, College Park, MD, United States
11. Shan T, Wenner CE, Xu C, Duan Z, Maddox RK. Speech-in-noise comprehension is improved when viewing a deep-neural-network-generated talking face. Trends Hear 2022; 26:23312165221136934. PMID: 36384325; PMCID: PMC9677167; DOI: 10.1177/23312165221136934
Abstract
Listening in a noisy environment is challenging, but many previous studies have demonstrated that comprehension of speech can be substantially improved by looking at the talker's face. We recently developed a deep neural network (DNN)-based system that generates movies of a talking face from speech audio and a single face image. In this study, we aimed to quantify the benefits that such a system can bring to speech comprehension, especially in noise. The target speech audio was masked at signal-to-noise ratios of -9, -6, -3, and 0 dB and was presented to subjects in three audio-visual (AV) stimulus conditions: (1) synthesized AV: audio with the synthesized talking face movie; (2) natural AV: audio with the original movie from the corpus; and (3) audio-only: audio with a static image of the talker. Subjects were asked to type the sentences they heard in each trial, and keyword recognition was quantified for each condition. Overall, performance in the synthesized AV condition fell approximately halfway between the other two conditions, showing a marked improvement over the audio-only control but still falling short of the natural AV condition. Every subject showed some benefit from the synthetic AV stimulus. The results of this study support the idea that a DNN-based model that generates a talking face from speech audio can meaningfully enhance comprehension in noisy environments, and has the potential to be used as a visual hearing aid.
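Masking at a fixed SNR, as in the -9 to 0 dB conditions here, means scaling the masker relative to the RMS of the target; a sketch of that mixing step (variable names and the RMS definition are generic assumptions, not the study's exact code):

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so the speech-to-noise RMS ratio is snr_db."""
    def rms(x):
        return np.sqrt(np.mean(x ** 2))
    noise = noise[: len(speech)]
    gain = rms(speech) / (rms(noise) * 10 ** (snr_db / 20))
    return speech + gain * noise
```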
Affiliation(s)
- Tong Shan
- Department of Biomedical Engineering, University of Rochester, Rochester, NY, USA
- Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY, USA
- Center for Visual Science, University of Rochester, Rochester, NY, USA
- Casper E. Wenner
- Department of Electrical and Computer Engineering, University of Rochester, Rochester, NY, USA
- Chenliang Xu
- Department of Computer Science, University of Rochester, Rochester, NY, USA
- Zhiyao Duan
- Department of Electrical and Computer Engineering, University of Rochester, Rochester, NY, USA
- Ross K. Maddox
- Department of Biomedical Engineering, University of Rochester, Rochester, NY, USA
- Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY, USA
- Center for Visual Science, University of Rochester, Rochester, NY, USA
- Department of Neuroscience, University of Rochester, Rochester, NY, USA
12. Winn MB, Wright RA. Reconsidering commonly used stimuli in speech perception experiments. J Acoust Soc Am 2022; 152:1394. PMID: 36182291; DOI: 10.1121/10.0013415
Abstract
This paper examines some commonly used stimuli in speech perception experiments and raises questions about their use, or about the interpretations of previous results. The takeaway messages are: 1) the Hillenbrand vowels represent a particular dialect rather than a gold standard, and English vowels contain spectral dynamics that have been largely underappreciated, 2) the /ɑ/ context is very common but not clearly superior as a context for testing consonant perception, 3) /ɑ/ is particularly problematic when testing voice-onset-time perception because it introduces strong confounds in the formant transitions, 4) /dɑ/ is grossly overrepresented in neurophysiological studies and yet is insufficient as a generalized proxy for "speech perception," and 5) digit tests and matrix sentences including the coordinate response measure are systematically insensitive to important patterns in speech perception. Each of these stimulus sets and concepts is described with careful attention to their unique value and also cases where they might be misunderstood or over-interpreted.
Affiliation(s)
- Matthew B Winn
- Department of Speech-Language-Hearing Sciences, University of Minnesota, Minneapolis, Minnesota 55455, USA
- Richard A Wright
- Department of Linguistics, University of Washington, Seattle, Washington 98195, USA
13. Teichert T, Gnanateja GN, Sadagopan S, Chandrasekaran B. A linear superposition model of envelope and frequency following responses may help identify generators based on latency. Neurobiol Lang 2022; 3:441-468. PMID: 36909931; PMCID: PMC10003646; DOI: 10.1162/nol_a_00072
Abstract
Envelope and frequency-following responses (FFR_ENV and FFR_TFS) are scalp-recorded electrophysiological potentials that closely follow the periodicity of complex sounds such as speech. These signals have been established as important biomarkers in speech and learning disorders. However, despite important advances, it has remained challenging to map altered FFR_ENV and FFR_TFS to altered processing in specific brain regions. Here we explore the utility of a deconvolution approach based on the assumption that FFR_ENV and FFR_TFS reflect the linear superposition of responses that are triggered by the glottal pulse in each cycle of the fundamental frequency (F0 responses). We tested the deconvolution method by applying it to FFR_ENV and FFR_TFS of rhesus monkeys to human speech and click trains with time-varying pitch patterns. Our analyses show that F0_ENV responses could be measured with high signal-to-noise ratio and featured several spectro-temporally and topographically distinct components that likely reflect the activation of brainstem (<5 ms; 200-1000 Hz), midbrain (5-15 ms; 100-250 Hz), and cortex (15-35 ms; ~90 Hz). In contrast, F0_TFS responses contained only one spectro-temporal component that likely reflected activity in the midbrain. In summary, our results support the notion that the latency of F0 components maps meaningfully onto successive processing stages. This opens the possibility that pathologically altered FFR_ENV or FFR_TFS may be linked to altered F0_ENV or F0_TFS, and from there to specific processing stages and ultimately spatially targeted interventions.
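Under the superposition assumption, the FFR is the convolution of a glottal pulse train with a single F0 response, so that response is recovered by deconvolving the recording against the pulse train. A sketch of both steps, with the pulse-train construction from an F0 track and the regularization constant as illustrative assumptions:

```python
import numpy as np

def pulse_train_from_f0(f0_track, fs):
    """Unit impulse at each glottal pulse implied by an F0 track.

    f0_track : instantaneous F0 in Hz per sample (0 where unvoiced).
    """
    cycles = np.cumsum(np.maximum(f0_track, 0.0)) / fs
    pulses = np.zeros_like(f0_track)
    pulses[np.flatnonzero(np.diff(np.floor(cycles)) > 0)] = 1.0
    return pulses

def f0_response(pulses, recording, fs, t_max=0.04):
    """Deconvolve the recording against the pulse train (FFT division)."""
    n = len(recording)
    P = np.fft.rfft(pulses, n)
    Y = np.fft.rfft(recording, n)
    h = np.fft.irfft(Y * np.conj(P) / (np.abs(P) ** 2 + 1e-6), n)
    k = int(t_max * fs)
    return np.arange(k) / fs, h[:k]
```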
Affiliation(s)
- Tobias Teichert
- Department of Psychiatry, University of Pittsburgh, Pittsburgh, PA, USA
- Department of Bioengineering, University of Pittsburgh, Pittsburgh, PA, USA
- Center for Neuroscience, University of Pittsburgh, Pittsburgh, PA, USA
- G. Nike Gnanateja
- Department of Communication Sciences and Disorders, University of Pittsburgh, Pittsburgh, PA, USA
- Srivatsun Sadagopan
- Department of Bioengineering, University of Pittsburgh, Pittsburgh, PA, USA
- Center for Neuroscience, University of Pittsburgh, Pittsburgh, PA, USA
- Department of Neurobiology, University of Pittsburgh, Pittsburgh, PA, USA
- Bharath Chandrasekaran
- Department of Communication Sciences and Disorders, University of Pittsburgh, Pittsburgh, PA, USA
- Department of Neurobiology, University of Pittsburgh, Pittsburgh, PA, USA
14. Polonenko MJ, Maddox RK. Optimizing parameters for using the parallel auditory brainstem response to quickly estimate hearing thresholds. Ear Hear 2022; 43:646-658. PMID: 34593686; PMCID: PMC8881303; DOI: 10.1097/aud.0000000000001128
Abstract
OBJECTIVES: Timely assessments are critical to providing early intervention and better hearing and spoken language outcomes for children with hearing loss. To facilitate faster diagnostic hearing assessments in infants, the authors developed the parallel auditory brainstem response (pABR), which presents randomly timed trains of tone pips at five frequencies to each ear simultaneously. The pABR yields high-quality waveforms that are similar to the standard, single-frequency serial ABR but in a fraction of the recording time. While well-documented for standard ABRs, it is yet unknown how presentation rate and level interact to affect responses collected in parallel. Furthermore, the stimuli are yet to be calibrated to perceptual thresholds. Therefore, this study aimed to determine the optimal range of parameters for the pABR and to establish the normative stimulus level correction values for the ABR stimuli.
DESIGN: Two experiments were completed, each with a group of 20 adults (18-35 years old) with normal-hearing thresholds (≤20 dB HL) from 250 to 8000 Hz. First, pABR electroencephalographic (EEG) responses were recorded for six stimulation rates and two intensities. The changes in component wave V amplitude and latency were analyzed, as well as the time required for all responses to reach a criterion signal-to-noise ratio of 0 dB. Second, behavioral thresholds were measured for pure tones and for the pABR stimuli at each rate to determine the correction factors that relate the stimulus level in dB peSPL to perceptual thresholds in dB nHL.
RESULTS: The pABR showed some adaptation with increased stimulation rate. A wide range of rates yielded robust responses in under 15 minutes, but 40 Hz was the optimal singular presentation rate. Extending the analysis window to include later components of the response offered further time-saving advantages for the temporally broader responses to low-frequency tone pips. The perceptual thresholds to pABR stimuli changed subtly with rate, giving a relatively similar set of correction factors to convert the level of the pABR stimuli from dB peSPL to dB nHL.
CONCLUSIONS: The optimal stimulation rate for the pABR is 40 Hz, but using multiple rates may prove useful. Perceptual thresholds that subtly change across rate allow for a testing paradigm that easily transitions between rates, which may be useful for quickly estimating thresholds for different configurations of hearing loss. These optimized parameters facilitate expediency and effectiveness of the pABR to estimate hearing thresholds in a clinical setting.
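The pABR's central trick is that each frequency band's pip train has independently randomized timing, so the overlapping responses decorrelate and can be recovered separately. A sketch of stimulus generation for one ear (pip shape, rates, and frequencies are illustrative placeholders, not the calibrated stimuli):

```python
import numpy as np

def pabr_stimulus(fs=48_000, dur=1.0, rate=40.0,
                  freqs=(500, 1000, 2000, 4000, 8000), rng=None):
    """One ear's worth of parallel, randomly timed tone-pip trains.

    Each band gets a Poisson-timed train at the given mean rate, which
    decorrelates the overlapping responses across bands.
    """
    rng = rng or np.random.default_rng()
    n = int(dur * fs)
    t_pip = np.arange(int(0.005 * fs)) / fs             # 5 ms pips
    mix = np.zeros(n)
    for f in freqs:
        pip = np.sin(2 * np.pi * f * t_pip) * np.hanning(t_pip.size)
        onsets = rng.integers(0, n - pip.size, size=rng.poisson(rate * dur))
        for s in onsets:
            mix[s:s + pip.size] += pip
    return mix
```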
Affiliation(s)
- Melissa J Polonenko
- Department of Neuroscience, Del Monte Institute for Neuroscience, University of Rochester, NY, USA
- Center for Visual Sciences, University of Rochester, NY, USA
- Ross K Maddox
- Department of Neuroscience, Del Monte Institute for Neuroscience, University of Rochester, NY, USA
- Department of Biomedical Engineering, University of Rochester, NY, USA
- Center for Visual Sciences, University of Rochester, NY, USA
15. Irsik VC, Johnsrude IS, Herrmann B. Age-related deficits in dip-listening evident for isolated sentences but not for spoken stories. Sci Rep 2022; 12:5898. PMID: 35393472; PMCID: PMC8991280; DOI: 10.1038/s41598-022-09805-6
Abstract
Fluctuating background sounds facilitate speech intelligibility by providing speech ‘glimpses’ (masking release). Older adults benefit less from glimpses, but masking release is typically investigated using isolated sentences. Recent work indicates that using engaging, continuous speech materials (e.g., spoken stories) may qualitatively alter speech-in-noise listening. Moreover, neural sensitivity to different amplitude envelope profiles (ramped, damped) changes with age, but whether this affects speech listening is unknown. In three online experiments, we investigate how masking release in younger and older adults differs for masked sentences and stories, and how speech intelligibility varies with masker amplitude profile. Intelligibility was generally greater for damped than ramped maskers. Masking release was reduced in older relative to younger adults for disconnected sentences, and stories with a randomized sentence order. Critically, when listening to stories with an engaging and coherent narrative, older adults demonstrated equal or greater masking release compared to younger adults. Older adults thus appear to benefit from ‘glimpses’ as much as, or more than, younger adults when the speech they are listening to follows a coherent topical thread. Our results highlight the importance of cognitive and motivational factors for speech understanding, and suggest that previous work may have underestimated speech-listening abilities in older adults.
Affiliation(s)
- Vanessa C Irsik
- Department of Psychology & The Brain and Mind Institute, The University of Western Ontario, London, ON, N6A 3K7, Canada
- Ingrid S Johnsrude
- Department of Psychology & The Brain and Mind Institute, The University of Western Ontario, London, ON, N6A 3K7, Canada
- School of Communication and Speech Disorders, The University of Western Ontario, London, ON, N6A 5B7, Canada
- Björn Herrmann
- Department of Psychology & The Brain and Mind Institute, The University of Western Ontario, London, ON, N6A 3K7, Canada
- Rotman Research Institute, Baycrest, Toronto, ON, M6A 2E1, Canada
- Department of Psychology, University of Toronto, Toronto, ON, M5S 1A1, Canada
16. Irsik VC, Johnsrude IS, Herrmann B. Neural activity during story listening is synchronized across individuals despite acoustic masking. J Cogn Neurosci 2022; 34:933-950. PMID: 35258555; DOI: 10.1162/jocn_a_01842
Abstract
Older people with hearing problems often experience difficulties understanding speech in the presence of background sound. As a result, they may disengage in social situations, which has been associated with negative psychosocial health outcomes. Measuring listening (dis)engagement during challenging listening situations has received little attention thus far. We recruit young, normal-hearing human adults (both sexes) and investigate how speech intelligibility and engagement during naturalistic story listening is affected by the level of acoustic masking (12-talker babble) at different signal-to-noise ratios (SNRs). In Experiment 1, we observed that word-report scores were above 80% for all but the lowest SNR (-3 dB SNR) we tested, at which performance dropped to 54%. In Experiment 2, we calculated intersubject correlation (ISC) using EEG data to identify dynamic spatial patterns of shared neural activity evoked by the stories. ISC has been used as a neural measure of participants' engagement with naturalistic materials. Our results show that ISC was stable across all but the lowest SNRs, despite reduced speech intelligibility. Comparing ISC and intelligibility demonstrated that word-report performance declined more strongly with decreasing SNR compared to ISC. Our measure of neural engagement suggests that individuals remain engaged in story listening despite missing words because of background noise. Our work provides a potentially fruitful approach to investigate listener engagement with naturalistic, spoken stories that may be used to investigate (dis)engagement in older adults with hearing impairment.
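ISC in its simplest leave-one-out form correlates each listener's response with the average of everyone else's; the correlated-components variant used here adds a spatial filter on top of the same idea. A minimal sketch (the array shape and single-electrode simplification are assumptions for illustration):

```python
import numpy as np

def isc_leave_one_out(eeg):
    """Leave-one-out intersubject correlation for one electrode.

    eeg : array of shape (n_subjects, n_samples), time-aligned to the
          same story. Returns one correlation value per subject.
    """
    n_subjects = eeg.shape[0]
    iscs = []
    for i in range(n_subjects):
        others = np.delete(eeg, i, axis=0).mean(axis=0)
        iscs.append(np.corrcoef(eeg[i], others)[0, 1])
    return np.asarray(iscs)
```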
Affiliation(s)
- Björn Herrmann
- The University of Western Ontario
- Rotman Research Institute, Toronto, ON, Canada
- University of Toronto
17. Bachmann FL, MacDonald EN, Hjortkjær J. Neural measures of pitch processing in EEG responses to running speech. Front Neurosci 2022; 15:738408. PMID: 35002597; PMCID: PMC8729880; DOI: 10.3389/fnins.2021.738408
Abstract
Linearized encoding models are increasingly employed to model cortical responses to running speech. Recent extensions to subcortical responses suggest clinical perspectives, potentially complementing auditory brainstem responses (ABRs) or frequency-following responses (FFRs) that are current clinical standards. However, while it is well-known that the auditory brainstem responds both to transient amplitude variations and the stimulus periodicity that gives rise to pitch, these features co-vary in running speech. Here, we discuss challenges in disentangling the features that drive the subcortical response to running speech. Cortical and subcortical electroencephalographic (EEG) responses to running speech from 19 normal-hearing listeners (12 female) were analyzed. Using forward regression models, we confirm that responses to the rectified broadband speech signal yield temporal response functions consistent with wave V of the ABR, as shown in previous work. Peak latency and amplitude of the speech-evoked brainstem response were correlated with standard click-evoked ABRs recorded at the vertex electrode (Cz). Similar responses could be obtained using the fundamental frequency (F0) of the speech signal as model predictor. However, simulations indicated that dissociating responses to temporal fine structure at the F0 from broadband amplitude variations is not possible given the high co-variance of the features and the poor signal-to-noise ratio (SNR) of subcortical EEG responses. In cortex, both simulations and data replicated previous findings indicating that envelope tracking on frontal electrodes can be dissociated from responses to slow variations in F0 (relative pitch). Yet, no association between subcortical F0-tracking and cortical responses to relative pitch could be detected. These results indicate that while subcortical speech responses are comparable to click-evoked ABRs, dissociating pitch-related processing in the auditory brainstem may be challenging with natural speech stimuli.
Affiliation(s)
- Florine L Bachmann
- Hearing Systems Section, Department of Health Technology, Technical University of Denmark, Lyngby, Denmark
- Ewen N MacDonald
- Department of Systems Design Engineering, University of Waterloo, Waterloo, ON, Canada
- Jens Hjortkjær
- Hearing Systems Section, Department of Health Technology, Technical University of Denmark, Lyngby, Denmark
- Danish Research Centre for Magnetic Resonance, Centre for Functional and Diagnostic Imaging and Research, Copenhagen University Hospital - Amager and Hvidovre, Copenhagen, Denmark