1
Saddler MR, McDermott JH. Models optimized for real-world tasks reveal the task-dependent necessity of precise temporal coding in hearing. Nat Commun 2024; 15:10590. [PMID: 39632854 PMCID: PMC11618365 DOI: 10.1038/s41467-024-54700-5]
Abstract
Neurons encode information in the timing of their spikes in addition to their firing rates. Spike timing is particularly precise in the auditory nerve, where action potentials phase lock to sound with sub-millisecond precision, but its behavioral relevance remains uncertain. We optimized machine learning models to perform real-world hearing tasks with simulated cochlear input, assessing the precision of auditory nerve spike timing needed to reproduce human behavior. Models with high-fidelity phase locking exhibited more human-like sound localization and speech perception than models without, consistent with an essential role in human hearing. However, the temporal precision needed to reproduce human-like behavior varied across tasks, as did the precision that benefited real-world task performance. These effects suggest that perceptual domains incorporate phase locking to different extents depending on the demands of real-world hearing. The results illustrate how optimizing models for realistic tasks can clarify the role of candidate neural codes in perception.
Affiliation(s)
- Mark R Saddler
- Department of Brain and Cognitive Sciences, MIT, Cambridge, MA, USA.
- McGovern Institute for Brain Research, MIT, Cambridge, MA, USA.
- Center for Brains, Minds, and Machines, MIT, Cambridge, MA, USA.
- Josh H McDermott
- Department of Brain and Cognitive Sciences, MIT, Cambridge, MA, USA.
- McGovern Institute for Brain Research, MIT, Cambridge, MA, USA.
- Center for Brains, Minds, and Machines, MIT, Cambridge, MA, USA.
- Program in Speech and Hearing Biosciences and Technology, Harvard, Cambridge, MA, USA.
2
Wei L, Verschooten E, Joris PX. Enhancement of phase-locking in rodents. II. An axonal recording study in chinchilla. J Neurophysiol 2023; 130:751-767. [PMID: 37609701 DOI: 10.1152/jn.00474.2022]
Abstract
The trapezoid body (TB) contains axons of neurons residing in the anteroventral cochlear nucleus (AVCN) that provide excitatory and inhibitory inputs to the main monaural and binaural nuclei in the superior olivary complex (SOC). To understand the monaural and binaural response properties of neurons in the medial and lateral superior olive (MSO and LSO), it is important to characterize the temporal firing properties of these inputs. Because of its exceptional low-frequency hearing, the chinchilla (Chinchilla lanigera) is a widely used small-animal model for studies of hearing. However, the characterization of the output of its ventral cochlear nucleus to the nuclei of the SOC is fragmentary. We obtained responses of TB axons to stimuli typically used in binaural studies and compared these responses to those of auditory nerve (AN) fibers, with a focus on temporal coding. We found enhancement of phase-locking and entrainment, i.e., the ability of a neuron to fire action potentials at a certain stimulus phase for nearly every stimulus period, in TB axons relative to AN fibers. The enhancement of phase-locking and entrainment is quantitatively more modest than in the cat but greater than in the gerbil. As in these species, these phenomena occur not only in low-frequency neurons stimulated at their characteristic frequency but also in neurons tuned to higher frequencies when stimulated with low-frequency tones, in which case complex phase-locking behavior with multiple modes of firing per stimulus cycle is frequently observed.
NEW & NOTEWORTHY The sensitivity of neurons to small time differences in sustained sounds to both ears is important for binaural hearing, and this sensitivity is critically dependent on phase-locking in the monaural pathways. Although studies in cat showed a marked improvement in phase-locking from the peripheral to the central auditory nervous system, the evidence in rodents is mixed. Here, we recorded from the AN and TB of chinchilla and found temporal enhancement, though more limited than in cat.
Affiliation(s)
- Liting Wei
- Laboratory of Auditory Neurophysiology, KU Leuven, Leuven, Belgium
- Eric Verschooten
- Laboratory of Auditory Neurophysiology, KU Leuven, Leuven, Belgium
- Philip X Joris
- Laboratory of Auditory Neurophysiology, KU Leuven, Leuven, Belgium
3
Joris PX. Use of reverse noise to measure ongoing delay. J Acoust Soc Am 2023; 154:926-937. [PMID: 37578194 DOI: 10.1121/10.0020657]
Abstract
Counts of spike coincidences provide a powerful means to compare responses to different stimuli or of different neurons, particularly regarding temporal factors. A drawback is that these methods do not provide an absolute measure of latency, i.e., the temporal interval between stimulus features and response. It is desirable to have such a measure within the analysis framework of coincidence counting. Single neuron responses were obtained, from 130 fibers in several tracts (auditory nerve, trapezoid body, lateral lemniscus), to a broadband noise and its polarity-inverted version. The spike trains in response to these stimuli are the "forward noise" responses. The same stimuli were also played time-reversed. The resulting spike trains were then again time-reversed: These are the "reverse-noise" responses. The forward and reverse responses were then analyzed with the coincidence count methods we have introduced earlier. Correlograms between forward- and reverse-noise responses show maxima at values consistent with latencies measured with other methods; the pattern of latencies with characteristic frequency, sound pressure level, and recording location was also consistent. At low characteristic frequencies, correlograms were well-predicted by reverse-correlation functions. We conclude that reverse noise provides an easy and reliable means to estimate latency of auditory nerve and brainstem neurons.
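The coincidence-counting idea behind this correlogram analysis can be illustrated with a toy sketch (this is not the paper's actual pipeline; the spike trains, delay, and jitter values below are invented for illustration): when one spike train is a jittered, delayed copy of another, a histogram of all pairwise spike-time differences peaks at the imposed delay, analogous to how the forward/reverse correlogram peak estimates latency.

```python
import numpy as np

def cross_correlogram(spikes_a, spikes_b, max_lag, bin_width):
    """Histogram of all pairwise spike-time differences (b - a) within +/- max_lag."""
    diffs = spikes_b[None, :] - spikes_a[:, None]
    diffs = diffs[np.abs(diffs) <= max_lag]
    edges = np.arange(-max_lag, max_lag + bin_width, bin_width)
    counts, _ = np.histogram(diffs, bins=edges)
    centers = edges[:-1] + bin_width / 2
    return centers, counts

rng = np.random.default_rng(1)
# Toy "forward" response: Poisson-like spikes over 5 s; toy "reverse" response:
# the same spikes delayed by a 3 ms ongoing delay plus 0.2 ms jitter
a = np.sort(rng.uniform(0.0, 5.0, 400))          # spike times in seconds
b = a + 0.003 + rng.normal(0.0, 0.0002, a.size)  # delayed, jittered copy

lags, counts = cross_correlogram(a, b, max_lag=0.01, bin_width=0.0005)
peak_lag = lags[np.argmax(counts)]  # the correlogram maximum recovers the delay
```

The recovered `peak_lag` lands within a bin of the 3 ms delay; in the paper, the analogous correlogram maxima between forward- and reverse-noise responses index the latency.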
Affiliation(s)
- Philip X Joris
- Laboratory of Auditory Neurophysiology, KU Leuven, Leuven B-3000, Belgium
4
Parida S, Heinz MG. Underlying neural mechanisms of degraded speech intelligibility following noise-induced hearing loss: The importance of distorted tonotopy. Hear Res 2022; 426:108586. [PMID: 35953357 PMCID: PMC11149709 DOI: 10.1016/j.heares.2022.108586]
Abstract
Listeners with sensorineural hearing loss (SNHL) have substantial perceptual deficits, especially in noisy environments. Unfortunately, speech-intelligibility models have limited success in predicting the performance of listeners with hearing loss. A better understanding of the various suprathreshold factors that contribute to neural-coding degradations of speech in noisy conditions will facilitate better modeling and clinical outcomes. Here, we highlight the importance of one physiological factor that has received minimal attention to date, termed distorted tonotopy: a disruption of the precise mapping between acoustic frequency and cochlear place that is a hallmark of normal hearing. More so than commonly assumed factors (e.g., threshold elevation, reduced frequency selectivity, diminished temporal coding), distorted tonotopy severely degrades the neural representations of speech (particularly in noise) in single- and across-fiber responses in the auditory nerve following noise-induced hearing loss. Key results include: 1) effects of distorted tonotopy depend on stimulus spectral bandwidth and timbre, 2) distorted tonotopy increases across-fiber correlation and thus reduces information capacity to the brain, and 3) its effects vary across etiologies, which may contribute to individual differences. These results motivate the development and testing of noninvasive measures that can assess the severity of distorted tonotopy in human listeners. Such noninvasive measures would advance precision-audiological approaches to improving diagnostics and rehabilitation for listeners with SNHL.
Affiliation(s)
- Satyabrata Parida
- Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, IN, 47907 USA; Department of Neurobiology, University of Pittsburgh, Pittsburgh, PA, 15261 USA.
- Michael G Heinz
- Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, IN, 47907 USA; Weldon School of Biomedical Engineering, Purdue University, West Lafayette, IN, 47907 USA
5
Differential weighting of temporal envelope cues from the low-frequency region for Mandarin sentence recognition in noise. BMC Neurosci 2022; 23:35. [PMID: 35698039 PMCID: PMC9190152 DOI: 10.1186/s12868-022-00721-z]
Abstract
BACKGROUND: Temporal envelope cues are conveyed by cochlear implants (CIs) to listeners with hearing loss to restore hearing. Although CIs enable users to communicate in clear listening environments, noisy environments still pose a problem. To improve the speech-processing strategies used in Chinese CIs, we explored the relative contributions of the temporal envelope in various frequency regions to Mandarin sentence recognition in noise.
METHODS: Original speech material from the Mandarin version of the Hearing in Noise Test (MHINT) was mixed with speech-shaped noise (SSN), sinusoidally amplitude-modulated speech-shaped noise (SAM SSN), and sinusoidally amplitude-modulated (SAM) white noise (4 Hz), each at a +5 dB signal-to-noise ratio. Envelope information of the noise-corrupted speech material was extracted from 30 contiguous bands that were allocated to five frequency regions. The intelligibility of the noise-corrupted speech material (with temporal cues from one or two regions removed) was measured to estimate the relative weights of the temporal envelope cues from the five frequency regions.
RESULTS: In SSN, the mean weights of Regions 1-5 were 0.34, 0.19, 0.20, 0.16, and 0.11; in SAM SSN, they were 0.34, 0.17, 0.24, 0.14, and 0.11; and in SAM white noise, they were 0.46, 0.24, 0.22, 0.06, and 0.02.
CONCLUSIONS: The results suggest that, for all three noise types, the temporal envelope in the low-frequency region transmits the greatest amount of information for Mandarin sentence recognition, which differs from the perception strategy employed in clear listening environments.
6
Kessler D, Carr CE, Kretzberg J, Ashida G. Theoretical Relationship Between Two Measures of Spike Synchrony: Correlation Index and Vector Strength. Front Neurosci 2022; 15:761826. [PMID: 34987357 PMCID: PMC8721039 DOI: 10.3389/fnins.2021.761826]
Abstract
Information processing in the nervous system critically relies on temporally precise spiking activity. In the auditory system, various degrees of phase-locking can be observed from the auditory nerve to cortical neurons. The classical metric for quantifying phase-locking is the vector strength (VS), which captures the periodicity in neuronal spiking. More recently, another metric, called the correlation index (CI), was proposed to quantify the temporally reproducible response characteristics of a neuron. The CI is defined as the peak value of a normalized shuffled autocorrelogram (SAC). Both VS and CI have been used to investigate how temporal information is processed and propagated along the auditory pathways. While previous analyses of physiological data in cats suggested covariation of these two metrics, a general characterization of their connection has never been performed. In the present study, we derive a rigorous relationship between VS and CI. To model phase-locking, we assume Poissonian spike trains with a temporally changing intensity function following a von Mises distribution. We demonstrate that VS and CI are mutually related via the so-called concentration parameter that determines the degree of phase-locking. We confirm that these theoretical results are largely consistent with physiological data recorded in the auditory brainstem of various animals. In addition, we generate artificial phase-locked spike sequences, for which recording and analysis parameters can be systematically manipulated. Our analysis results suggest that mismatches between empirical data and the theoretical prediction can often be explained by deviations from the von Mises distribution, including skewed or multimodal period histograms. Furthermore, temporal relations of spike trains across trials can contribute to higher CI values than predicted mathematically based on the VS. We find that, for most applications, a SAC bin width of 50 μs seems to be a favorable choice, leading to an estimated error below 2.5% for physiologically plausible conditions. Overall, our results provide general relations between the two measures of phase-locking and will aid future analyses of different physiological datasets that are characterized with these metrics.
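The VS side of this relationship can be sketched numerically (an illustration of the von Mises assumption, not the authors' code; the stimulus frequency and concentration value are arbitrary choices): for spike phases drawn from a von Mises distribution with concentration κ, the expected vector strength is the Bessel-function ratio I1(κ)/I0(κ).

```python
import numpy as np
from scipy.special import i0, i1  # modified Bessel functions of order 0 and 1

def vector_strength(spike_times, freq):
    """Mean resultant length of spike phases at the stimulus frequency."""
    phases = 2 * np.pi * freq * np.asarray(spike_times)
    return np.abs(np.mean(np.exp(1j * phases)))

rng = np.random.default_rng(0)
freq = 500.0   # Hz, illustrative stimulus frequency
kappa = 5.0    # von Mises concentration parameter (degree of phase-locking)

# Simulate phase-locked spikes: each spike's phase within its cycle is von Mises
phases = rng.vonmises(0.0, kappa, size=50000)
cycles = rng.integers(0, 10000, size=phases.size)
spike_times = (cycles + (phases + np.pi) / (2 * np.pi)) / freq

vs_empirical = vector_strength(spike_times, freq)
vs_theory = i1(kappa) / i0(kappa)  # expected VS under the von Mises model
```

The empirical VS converges to the Bessel ratio as the spike count grows; the paper's contribution is relating this same concentration parameter to the CI.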
Affiliation(s)
- Dominik Kessler
- Computational Neuroscience, Department of Neuroscience, Faculty VI, University of Oldenburg, Oldenburg, Germany
- Catherine E Carr
- Department of Biology, University of Maryland, College Park, MD, United States
- Jutta Kretzberg
- Computational Neuroscience, Department of Neuroscience, Faculty VI, University of Oldenburg, Oldenburg, Germany; Cluster of Excellence Hearing4all, Department of Neuroscience, Faculty VI, University of Oldenburg, Oldenburg, Germany
- Go Ashida
- Computational Neuroscience, Department of Neuroscience, Faculty VI, University of Oldenburg, Oldenburg, Germany; Cluster of Excellence Hearing4all, Department of Neuroscience, Faculty VI, University of Oldenburg, Oldenburg, Germany
7
Hernández-Pérez H, Mikiel-Hunter J, McAlpine D, Dhar S, Boothalingam S, Monaghan JJM, McMahon CM. Understanding degraded speech leads to perceptual gating of a brainstem reflex in human listeners. PLoS Biol 2021; 19:e3001439. [PMID: 34669696 PMCID: PMC8559948 DOI: 10.1371/journal.pbio.3001439]
Abstract
The ability to navigate "cocktail party" situations by focusing on sounds of interest over irrelevant background sounds is often considered in terms of cortical mechanisms. However, subcortical circuits, such as the pathway underlying the medial olivocochlear (MOC) reflex, modulate the activity of the inner ear itself, supporting the extraction of salient features from the auditory scene prior to any cortical processing. To understand the contribution of auditory subcortical nuclei and the cochlea in complex listening tasks, we made physiological recordings along the auditory pathway while listeners engaged in detecting non(sense) words in lists of words. Both naturally spoken speech and intrinsically noisy, vocoded speech (filtering that mimics processing by a cochlear implant, CI) significantly activated the MOC reflex, but this was not the case for speech in background noise, which more engaged midbrain and cortical resources. A model of the initial stages of auditory processing reproduced the specific effects of each form of speech degradation, providing a rationale for goal-directed gating of the MOC reflex based on enhancing the representation of the energy envelope of the acoustic waveform. Our data reveal the coexistence of two strategies in the auditory system that may facilitate speech understanding in situations where the signal is either intrinsically degraded or masked by extrinsic acoustic energy. Whereas intrinsically degraded streams recruit the MOC reflex to improve the peripheral representation of speech cues, extrinsically masked streams rely more on higher auditory centres to denoise signals.
Affiliation(s)
- Heivet Hernández-Pérez
- Department of Linguistics, The Australian Hearing Hub, Macquarie University, Sydney, Australia
- Jason Mikiel-Hunter
- Department of Linguistics, The Australian Hearing Hub, Macquarie University, Sydney, Australia
- David McAlpine
- Department of Linguistics, The Australian Hearing Hub, Macquarie University, Sydney, Australia
- Sumitrajit Dhar
- Department of Communication Sciences and Disorders, Northwestern University, Evanston, Illinois, United States of America
- Sriram Boothalingam
- University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Jessica J. M. Monaghan
- Department of Linguistics, The Australian Hearing Hub, Macquarie University, Sydney, Australia
- National Acoustic Laboratories, Sydney, Australia
- Catherine M. McMahon
- Department of Linguistics, The Australian Hearing Hub, Macquarie University, Sydney, Australia
8
Viswanathan V, Shinn-Cunningham BG, Heinz MG. Temporal fine structure influences voicing confusions for consonant identification in multi-talker babble. J Acoust Soc Am 2021; 150:2664. [PMID: 34717498 PMCID: PMC8514254 DOI: 10.1121/10.0006527]
Abstract
To understand the mechanisms of speech perception in everyday listening environments, it is important to elucidate the relative contributions of different acoustic cues in transmitting phonetic content. Previous studies suggest that the envelope of speech in different frequency bands conveys most speech content, while the temporal fine structure (TFS) can aid in segregating target speech from background noise. However, the role of TFS in conveying phonetic content beyond what envelopes convey for intact speech in complex acoustic scenes is poorly understood. The present study addressed this question using online psychophysical experiments to measure the identification of consonants in multi-talker babble for intelligibility-matched intact and 64-channel envelope-vocoded stimuli. Consonant confusion patterns revealed that listeners in the vocoded (versus intact) condition were more biased toward reporting that they heard an unvoiced consonant, despite envelope and place cues being largely preserved. This result was replicated when babble instances were varied across independent experiments, suggesting that TFS conveys voicing information beyond what is conveyed by envelopes for intact speech in babble. Given that multi-talker babble is a masker that is ubiquitous in everyday environments, this finding has implications for the design of assistive listening devices such as cochlear implants.
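The envelope-vocoding manipulation can be sketched for a single channel (a simplified stand-in for the study's 64-channel vocoder; the sampling rate, tone frequency, and modulation depth below are invented for illustration): the Hilbert envelope of a band is kept, while its temporal fine structure is replaced by a fixed carrier.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via FFT (equivalent to scipy.signal.hilbert)."""
    n = len(x)
    spectrum = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1
    h[1:(n + 1) // 2] = 2       # double positive frequencies
    if n % 2 == 0:
        h[n // 2] = 1           # keep Nyquist bin for even lengths
    return np.fft.ifft(spectrum * h)

fs = 16000
t = np.arange(fs) / fs
# One "channel": a 1 kHz tone amplitude-modulated at 4 Hz
x = (1 + 0.9 * np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 1000 * t)

env = np.abs(analytic_signal(x))        # slow envelope: kept by the vocoder
carrier = np.sin(2 * np.pi * 1000 * t)  # fixed carrier replaces the band's TFS
vocoded = env * carrier
```

The vocoded signal preserves the 4 Hz envelope (and hence most phonetic content) while discarding the original fine structure, which is the cue whose removal biased voicing judgments in the study.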
Affiliation(s)
- Vibha Viswanathan
- Weldon School of Biomedical Engineering, Purdue University, West Lafayette, Indiana 47907, USA
- Michael G. Heinz
- Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, Indiana 47907, USA
9
Wang C, Wang Z, Xie B, Shi X, Yang P, Liu L, Qu T, Qin Q, Xing Y, Zhu W, Teipel SJ, Jia J, Zhao G, Li L, Tang Y. Binaural processing deficit and cognitive impairment in Alzheimer's disease. Alzheimers Dement 2021; 18:1085-1099. [PMID: 34569690 DOI: 10.1002/alz.12464]
Abstract
Speech comprehension in noisy environments depends on central auditory functions, which are vulnerable in Alzheimer's disease (AD). Binaural processing exploits the sounds at the two ears to optimally process degraded sound information, but its characteristics are poorly understood in AD. We studied behavioral and electrophysiological alterations in binaural processing among 121 participants (AD = 27; amnestic mild cognitive impairment [aMCI] = 33; subjective cognitive decline [SCD] = 30; cognitively normal [CN] = 31). We observed impairment of binaural processing in AD and aMCI, and detected a U-shaped change in phase synchrony (declining from CN to SCD and aMCI, but increasing from aMCI to AD). This increase in phase synchrony at the more severe cognitive stages could reflect neural adaptation for binaural processing. Moreover, increased phase synchrony is associated with worse memory during the stages when neural adaptation apparently occurs. These findings support the hypothesis that neural adaptation to a binaural processing deficit may exacerbate cognitive impairment, which could help identify biomarkers and therapeutic targets in AD.
Affiliation(s)
- Changming Wang
- Department of Neurosurgery, Xuanwu Hospital, Capital Medical University, National Center for Neurological Disorders, Beijing, China
- Zhibin Wang
- Innovation Center for Neurological Disorders, Department of Neurology, Xuanwu Hospital, Capital Medical University, National Center for Neurological Disorders, Beijing, China
- Beijia Xie
- Innovation Center for Neurological Disorders, Department of Neurology, Xuanwu Hospital, Capital Medical University, National Center for Neurological Disorders, Beijing, China
- Xinrui Shi
- Innovation Center for Neurological Disorders, Department of Neurology, Xuanwu Hospital, Capital Medical University, National Center for Neurological Disorders, Beijing, China
- Pengcheng Yang
- School of Psychological and Cognitive Sciences, Peking University, Beijing, China; Speech and Hearing Research Center, Peking University, Beijing, China
- Lei Liu
- School of Psychological and Cognitive Sciences, Peking University, Beijing, China; Speech and Hearing Research Center, Peking University, Beijing, China
- Tianshu Qu
- Speech and Hearing Research Center, Peking University, Beijing, China; Key Laboratory on Machine Perception (Ministry of Education), Peking University, Beijing, China
- Qi Qin
- Innovation Center for Neurological Disorders, Department of Neurology, Xuanwu Hospital, Capital Medical University, National Center for Neurological Disorders, Beijing, China
- Yi Xing
- Innovation Center for Neurological Disorders, Department of Neurology, Xuanwu Hospital, Capital Medical University, National Center for Neurological Disorders, Beijing, China; Key Laboratory of Neurodegenerative Diseases, Ministry of Education of the People's Republic of China, Beijing, China
- Wei Zhu
- Innovation Center for Neurological Disorders, Department of Neurology, Xuanwu Hospital, Capital Medical University, National Center for Neurological Disorders, Beijing, China
- Stefan J Teipel
- Department of Psychosomatic Medicine, University Medicine Rostock, Rostock, Germany; DZNE, German Center for Neurodegenerative Diseases, Rostock, Germany
- Jianping Jia
- Innovation Center for Neurological Disorders, Department of Neurology, Xuanwu Hospital, Capital Medical University, National Center for Neurological Disorders, Beijing, China; Key Laboratory of Neurodegenerative Diseases, Ministry of Education of the People's Republic of China, Beijing, China; Center of Alzheimer's Disease, Beijing Institute for Brain Disorders, Beijing, China; Beijing Key Laboratory of Geriatric Cognitive Disorders, Beijing, China; National Clinical Research Center for Geriatric Disorders, Beijing, China
- Guoguang Zhao
- Department of Neurosurgery, Xuanwu Hospital, Capital Medical University, National Center for Neurological Disorders, Beijing, China
- Liang Li
- School of Psychological and Cognitive Sciences, Peking University, Beijing, China; Speech and Hearing Research Center, Peking University, Beijing, China; Key Laboratory on Machine Perception (Ministry of Education), Peking University, Beijing, China; Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing, China; Beijing Institute for Brain Disorders, Beijing, China
- Yi Tang
- Innovation Center for Neurological Disorders, Department of Neurology, Xuanwu Hospital, Capital Medical University, National Center for Neurological Disorders, Beijing, China; Key Laboratory of Neurodegenerative Diseases, Ministry of Education of the People's Republic of China, Beijing, China
10
Varnet L, Léger AC, Boucher S, Bonnet C, Petit C, Lorenzi C. Contributions of Age-Related and Audibility-Related Deficits to Aided Consonant Identification in Presbycusis: A Causal-Inference Analysis. Front Aging Neurosci 2021; 13:640522. [PMID: 33732140 PMCID: PMC7956988 DOI: 10.3389/fnagi.2021.640522]
Abstract
The decline of speech intelligibility in presbycusis can be regarded as resulting from the combined contribution of two main groups of factors: (1) audibility-related factors and (2) age-related factors. In particular, there is now an abundant scientific literature on the crucial role of suprathreshold auditory abilities and cognitive functions, which have been found to decline with age even in the absence of audiometric hearing loss. However, researchers investigating the direct effect of aging in presbycusis have to deal with the methodological issue that age and peripheral hearing loss covary to a large extent. In the present study, we analyzed a dataset of consonant-identification scores measured in quiet and in noise for a large cohort (n = 459, age = 42-92) of hearing-impaired (HI) and normal-hearing (NH) listeners. HI listeners were provided with a frequency-dependent amplification adjusted to their audiometric profile. Their scores in the two conditions were predicted from their pure-tone average (PTA) and age, as well as from their Extended Speech Intelligibility Index (ESII), a measure of the impact of audibility loss on speech intelligibility. We relied on a causal-inference approach combined with Bayesian modeling to disentangle the direct causal effects of age and audibility on intelligibility from the indirect effect of age on hearing loss. The analysis revealed that the direct effect of PTA on HI intelligibility scores was five times higher than the effect of age. This overwhelming effect of PTA was not due to residual audibility loss despite amplification, as confirmed by an ESII-based model. More plausibly, the marginal role of age could be a consequence of the relatively undemanding cognitive task used in this study. Furthermore, the amount of variance in intelligibility scores was smaller for NH than for HI listeners, even after accounting for age and audibility, reflecting the presence of additional suprathreshold deficits in the latter group. Although the nonsense-syllable materials and the particular amplification settings used in this study potentially restrict the generalization of the findings, these promising results call for a wider use of causal-inference analysis in audiology, e.g., as a way to disentangle the influence of the various cognitive factors and suprathreshold deficits associated with presbycusis.
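The logic of separating the direct effect of age from the effect mediated by hearing loss can be sketched with simulated data (the coefficients, noise levels, and simple least-squares regression below are invented for illustration; the study itself used Bayesian causal-inference modeling): regressing intelligibility on age alone conflates the two paths, whereas jointly regressing on age and PTA recovers the direct effects.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
age = rng.uniform(40, 90, n)
# Mediated path: hearing loss (PTA, dB HL) increases with age
pta = 0.8 * (age - 40) + rng.normal(0, 8, n)
# Intelligibility: strong direct effect of PTA, weak direct effect of age
score = 100 - 0.5 * pta - 0.1 * (age - 40) + rng.normal(0, 3, n)

# Regression on age alone mixes the direct and PTA-mediated effects
b_total = np.polyfit(age, score, 1)[0]

# Joint regression separates the direct effects of age and PTA
X = np.column_stack([np.ones(n), age, pta])
coef, *_ = np.linalg.lstsq(X, score, rcond=None)
b_age_direct, b_pta = coef[1], coef[2]
```

With these invented coefficients, the joint regression recovers a direct PTA effect about five times the direct age effect, mirroring the qualitative structure of the study's conclusion.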
Affiliation(s)
- Léo Varnet
- Laboratoire des Systèmes Perceptifs, UMR CNRS 8248, Département d'Études Cognitives, École normale supérieure, Université Paris Sciences & Lettres, Paris, France
- Agnès C. Léger
- Manchester Centre for Audiology and Deafness, Division of Human Communication, Development & Hearing, School of Health Sciences, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre, University of Manchester, Manchester, United Kingdom
- Sophie Boucher
- Complexité du Vivant, Sorbonne Universités, Université Pierre et Marie Curie, Université Paris VI, Paris, France
- Institut de l'Audition, Institut Pasteur, INSERM, Paris, France
- Centre Hospitalier Universitaire d'Angers, Angers, France
- Crystel Bonnet
- Complexité du Vivant, Sorbonne Universités, Université Pierre et Marie Curie, Université Paris VI, Paris, France
- Institut de l'Audition, Institut Pasteur, INSERM, Paris, France
- Christine Petit
- Institut de l'Audition, Institut Pasteur, INSERM, Paris, France
- Collège de France, Paris, France
- Christian Lorenzi
- Laboratoire des Systèmes Perceptifs, UMR CNRS 8248, Département d'Études Cognitives, École normale supérieure, Université Paris Sciences & Lettres, Paris, France
11
Parida S, Bharadwaj H, Heinz MG. Spectrally specific temporal analyses of spike-train responses to complex sounds: A unifying framework. PLoS Comput Biol 2021; 17:e1008155. [PMID: 33617548 PMCID: PMC7932515 DOI: 10.1371/journal.pcbi.1008155]
Abstract
Significant scientific and translational questions remain in auditory neuroscience surrounding the neural correlates of perception. Relating perceptual and neural data collected from humans can be useful; however, human-based neural data are typically limited to evoked far-field responses, which lack anatomical and physiological specificity. Laboratory-controlled preclinical animal models offer the advantage of comparing single-unit and evoked responses from the same animals. This ability provides opportunities to develop invaluable insight into proper interpretations of evoked responses, which benefits both basic-science studies of neural mechanisms and translational applications, e.g., diagnostic development. However, these comparisons have been limited by a disconnect between the types of spectrotemporal analyses used with single-unit spike trains and evoked responses, which results because these response types are fundamentally different (point-process versus continuous-valued signals) even though the responses themselves are related. Here, we describe a unifying framework to study temporal coding of complex sounds that allows spike-train and evoked-response data to be analyzed and compared using the same advanced signal-processing techniques. The framework uses a set of peristimulus-time histograms computed from single-unit spike trains in response to polarity-alternating stimuli to allow advanced spectral analyses of both slow (envelope) and rapid (temporal fine structure) response components. 
Demonstrated benefits include: (1) novel spectrally specific temporal-coding measures that are less confounded by distortions due to hair-cell transduction, synaptic rectification, and neural stochasticity compared to previous metrics, e.g., the correlogram peak-height, (2) spectrally specific analyses of spike-train modulation coding (magnitude and phase), which can be directly compared to modern perceptually based models of speech intelligibility (e.g., that depend on modulation filter banks), and (3) superior spectral resolution in analyzing the neural representation of nonstationary sounds, such as speech and music. This unifying framework significantly expands the potential of preclinical animal models to advance our understanding of the physiological correlates of perceptual deficits in real-world listening following sensorineural hearing loss.
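The polarity-alternating trick at the heart of this framework can be illustrated compactly. The sketch below (Python/NumPy; the function name is ours, not the authors') splits a pair of peristimulus-time histograms, recorded to a stimulus and its polarity-inverted copy, into an envelope-following sum component and a fine-structure-following difference component:

```python
import numpy as np

def env_tfs_from_polarity_pair(psth_pos, psth_neg):
    """Split PSTHs to opposite-polarity stimuli into envelope- and
    fine-structure-following components. Components that do not invert
    with stimulus polarity (envelope) survive the sum; components that
    invert with polarity (fine structure) survive the difference."""
    psth_pos = np.asarray(psth_pos, dtype=float)
    psth_neg = np.asarray(psth_neg, dtype=float)
    env = 0.5 * (psth_pos + psth_neg)
    tfs = 0.5 * (psth_pos - psth_neg)
    return env, tfs
```

Because TFS-driven response components cancel in the sum and survive the difference (and vice versa for envelope-driven components), each part can then be analyzed with the same spectral tools used for evoked responses.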
Collapse
Affiliation(s)
- Satyabrata Parida
- Weldon School of Biomedical Engineering, Purdue University, West Lafayette, Indiana, United States of America
| | - Hari Bharadwaj
- Weldon School of Biomedical Engineering, Purdue University, West Lafayette, Indiana, United States of America
- Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, Indiana, United States of America
| | - Michael G. Heinz
- Weldon School of Biomedical Engineering, Purdue University, West Lafayette, Indiana, United States of America
- Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, Indiana, United States of America
| |
Collapse
|
12
|
Simulations with FADE of the effect of impaired hearing on speech recognition performance cast doubt on the role of spectral resolution. Hear Res 2020; 395:107995. [DOI: 10.1016/j.heares.2020.107995] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 01/28/2020] [Revised: 04/06/2020] [Accepted: 05/12/2020] [Indexed: 11/18/2022]
|
13
|
Moore BCJ. Effects of hearing loss and age on the binaural processing of temporal envelope and temporal fine structure information. Hear Res 2020; 402:107991. [PMID: 32418682 DOI: 10.1016/j.heares.2020.107991] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 02/28/2020] [Revised: 04/24/2020] [Accepted: 05/05/2020] [Indexed: 11/28/2022]
Abstract
Within the cochlea, broadband sounds like speech and music are filtered into a series of narrowband signals, each with a relatively slowly varying envelope (ENV) imposed on a rapidly oscillating carrier (the temporal fine structure, TFS). Information about ENV is conveyed by the timing and short-term rate of action potentials in the auditory nerve while information about TFS is conveyed by synchronization of action potentials to a specific phase of the waveform in the cochlea (phase locking). This paper describes the effects of age and hearing loss on the binaural processing of ENV and TFS information, i.e. on the processing of differences in ENV and TFS at the two ears. The binaural processing of TFS information is adversely affected by both hearing loss and increasing age. The binaural processing of ENV information deteriorates somewhat with increasing age but is only slightly affected by hearing loss. The reduced TFS processing abilities found for older/hearing-impaired subjects may partially account for the difficulties that such subjects experience in complex listening situations when the target speech and interfering sounds come from different directions in space.
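The ENV/TFS decomposition described above is commonly approximated with the Hilbert transform. A minimal sketch, assuming NumPy/SciPy and a signal that is already narrowband (the function name is illustrative):

```python
import numpy as np
from scipy.signal import hilbert

def envelope_and_tfs(band):
    """Decompose a narrowband signal into its Hilbert envelope (ENV)
    and a unit-amplitude carrier (TFS)."""
    analytic = hilbert(band)          # x + i * HilbertTransform(x)
    env = np.abs(analytic)            # slowly varying envelope
    tfs = np.cos(np.angle(analytic))  # rapidly oscillating carrier
    return env, tfs
```

Multiplying `env` and `tfs` back together reconstructs the original band, which is why the two components are treated as complementary carriers of information.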
Collapse
Affiliation(s)
- Brian C J Moore
- Department of Experimental Psychology, University of Cambridge, Downing Street, Cambridge, CB2 3EB, UK.
| |
Collapse
|
14
|
Baltzell LS, Swaminathan J, Cho AY, Lavandier M, Best V. Binaural sensitivity and release from speech-on-speech masking in listeners with and without hearing loss. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2020; 147:1546. [PMID: 32237845 PMCID: PMC7060089 DOI: 10.1121/10.0000812] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/30/2019] [Revised: 02/07/2020] [Accepted: 02/11/2020] [Indexed: 05/29/2023]
Abstract
Listeners with sensorineural hearing loss routinely experience less spatial release from masking (SRM) in speech mixtures than listeners with normal hearing. Hearing-impaired listeners have also been shown to have degraded temporal fine structure (TFS) sensitivity, a consequence of which is degraded access to interaural time differences (ITDs) contained in the TFS. Since these "binaural TFS" cues are critical for spatial hearing, it has been hypothesized that degraded binaural TFS sensitivity accounts for the limited SRM experienced by hearing-impaired listeners. In this study, speech stimuli were noise-vocoded using carriers that were systematically decorrelated across the left and right ears, thus simulating degraded binaural TFS sensitivity. Both (1) ITD sensitivity in quiet and (2) SRM in speech mixtures spatialized using ITDs (or binaural release from masking; BRM) were measured as a function of TFS interaural decorrelation in young normal-hearing and hearing-impaired listeners. This allowed for the examination of the relationship between ITD sensitivity and BRM over a wide range of ITD thresholds. The results showed that, for a given ITD sensitivity, hearing-impaired listeners experienced less BRM than normal-hearing listeners, suggesting that binaural TFS sensitivity can account for only a modest portion of the BRM deficit in hearing-impaired listeners. However, substantial individual variability was observed.
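The interaural decorrelation manipulation can be sketched as mixing a shared noise with an independent one, with the mixing weight setting the target correlation. A minimal illustration (NumPy; names are ours, not the study's actual stimulus code):

```python
import numpy as np

def correlated_noise_pair(n, rho, rng):
    """Generate left/right noise carriers with interaural correlation
    `rho` by mixing a shared Gaussian noise with an independent one."""
    shared = rng.standard_normal(n)
    indep = rng.standard_normal(n)
    left = shared
    right = rho * shared + np.sqrt(1.0 - rho ** 2) * indep
    return left, right
```

Setting `rho = 1` yields identical carriers (intact binaural TFS); lowering `rho` progressively degrades the interaural TFS cue while leaving each ear's monaural statistics unchanged.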
Collapse
Affiliation(s)
- Lucas S Baltzell
- Department of Speech, Language, and Hearing Sciences, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA
| | - Jayaganesh Swaminathan
- Department of Speech, Language, and Hearing Sciences, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA
| | - Adrian Y Cho
- Department of Speech, Language, and Hearing Sciences, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA
| | - Mathieu Lavandier
- University of Lyon, ENTPE, Laboratoire Génie Civil et Bâtiment, Rue Maurice Audin, F-69518 Vaulx-en-Velin Cedex, France
| | - Virginia Best
- Department of Speech, Language, and Hearing Sciences, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA
| |
Collapse
|
15
|
Riggs WJ, Hiss MM, Skidmore J, Varadarajan VV, Mattingly JK, Moberly AC, Adunka OF. Utilizing Electrocochleography as a Microphone for Fully Implantable Cochlear Implants. Sci Rep 2020; 10:3714. [PMID: 32111954 PMCID: PMC7048783 DOI: 10.1038/s41598-020-60694-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/11/2019] [Accepted: 02/13/2020] [Indexed: 11/09/2022] Open
Abstract
Current cochlear implants (CIs) are semi-implantable devices with an externally worn unit that hosts the microphone and sound processor. A fully implantable device, however, would ultimately be desirable as it would be of great benefit to recipients. While some prototypes have been designed and used in a few select cases, one main stumbling block is the sound input. Specifically, subdermal implantable microphone technology has been hampered by physiologic issues such as sound distortion and signal attenuation under the skin. Here we propose an alternative method that utilizes a physiologic response composed of an electrical field generated by the sensory cells of the inner ear to serve as a sound source microphone for fully implantable hearing technology such as CIs. Electrophysiological results obtained from 14 participants (adult and pediatric) document the feasibility of capturing speech properties within the electrocochleography (ECochG) response. Degradation of formant properties of the stimuli /da/ and /ba/ are evaluated across various degrees of hearing loss. Preliminary results provide proof-of-concept that the ECochG response can serve as a microphone capturing vital properties of speech. However, further signal-processing refinement is needed, along with use of an intracochlear recording location, to improve signal fidelity.
Collapse
Affiliation(s)
- William Jason Riggs
- Department of Otolaryngology, Head & Neck Surgery, The Ohio State University College of Medicine, Columbus, OH, USA
| | - Meghan M Hiss
- Department of Otolaryngology, Head & Neck Surgery, The Ohio State University College of Medicine, Columbus, OH, USA
| | - Jeffrey Skidmore
- Department of Otolaryngology, Head & Neck Surgery, The Ohio State University College of Medicine, Columbus, OH, USA
| | - Varun V Varadarajan
- Department of Otolaryngology, Head & Neck Surgery, The Ohio State University College of Medicine, Columbus, OH, USA
| | - Jameson K Mattingly
- Department of Otolaryngology, Head & Neck Surgery, The Ohio State University College of Medicine, Columbus, OH, USA
| | - Aaron C Moberly
- Department of Otolaryngology, Head & Neck Surgery, The Ohio State University College of Medicine, Columbus, OH, USA
| | - Oliver F Adunka
- Department of Otolaryngology, Head & Neck Surgery, The Ohio State University College of Medicine, Columbus, OH, USA.
| |
Collapse
|
16
|
Heeringa AN, Zhang L, Ashida G, Beutelmann R, Steenken F, Köppl C. Temporal Coding of Single Auditory Nerve Fibers Is Not Degraded in Aging Gerbils. J Neurosci 2020; 40:343-354. [PMID: 31719164 DOI: 10.1523/JNEUROSCI.2784-18.2019] [Citation(s) in RCA: 98] [Impact Index Per Article: 24.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/15/2023] Open
Abstract
People suffering from age-related hearing loss typically present with deficits in temporal processing tasks. Temporal processing deficits have also been shown in single-unit studies at the level of the auditory brainstem, midbrain, and cortex of aged animals. In this study, we explored whether temporal coding is already affected at the level of the input to the central auditory system. Single-unit auditory nerve fiber recordings were obtained from 41 Mongolian gerbils of either sex, divided between young, middle-aged, and old gerbils. Temporal coding quality was evaluated as vector strength in response to tones at best frequency, and by constructing shuffled and cross-stimulus autocorrelograms, and reverse correlations, from responses to 1 s noise bursts at 10-30 dB sensation level (dB above threshold). At comparable sensation levels, all measures showed that temporal coding was not altered in auditory nerve fibers of aging gerbils. Furthermore, both temporal fine structure and envelope coding remained unaffected. However, spontaneous rates were decreased in aging gerbils. Importantly, despite elevated pure tone thresholds, the frequency tuning of auditory nerve fibers was not affected. These results suggest that age-related temporal coding deficits arise more centrally, possibly due to a loss of auditory nerve fibers (or their peripheral synapses) but not due to qualitative changes in the responses of remaining auditory nerve fibers. The reduced spontaneous rate and elevated thresholds, but normal frequency tuning, of aged auditory nerve fibers can be explained by the well known reduction of endocochlear potential due to strial dysfunction in aged gerbils.SIGNIFICANCE STATEMENT As our society ages, age-related hearing deficits become ever more prevalent. 
Apart from decreased hearing sensitivity, elderly people often suffer from a reduced ability to communicate in daily settings, which is thought to be caused by known age-related deficits in auditory temporal processing. The current study demonstrated, using several different stimuli and analysis techniques, that these putative temporal processing deficits are not apparent in responses of single-unit auditory nerve fibers of quiet-aged gerbils. This suggests that age-related temporal processing deficits may develop more central to the auditory nerve, possibly due to a reduced population of active auditory nerve fibers, which will be of importance for the development of treatments for age-related hearing disorders.
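Vector strength, the phase-locking metric used in this study, is the length of the mean resultant vector of spike phases relative to the stimulus cycle. A minimal NumPy sketch (function name ours):

```python
import numpy as np

def vector_strength(spike_times, freq):
    """Resultant length of spike phases relative to a `freq`-Hz cycle:
    1.0 = all spikes at a single phase (perfect locking), ~0 = no locking."""
    phases = 2.0 * np.pi * freq * np.asarray(spike_times, dtype=float)
    return float(np.abs(np.mean(np.exp(1j * phases))))
```

Because the measure collapses all spikes onto one stimulus cycle, it is level- and rate-robust, which is what makes it suitable for comparing fibers across age groups at matched sensation levels.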
Collapse
Affiliation(s)
- Amarins N Heeringa
- Cluster of Excellence "Hearing4all" and Research Centre Neurosensory Science, Department of Neuroscience, School of Medicine and Health Science, Carl von Ossietzky University Oldenburg, 26129 Oldenburg, Germany
| | - Lichun Zhang
- Cluster of Excellence "Hearing4all" and Research Centre Neurosensory Science, Department of Neuroscience, School of Medicine and Health Science, Carl von Ossietzky University Oldenburg, 26129 Oldenburg, Germany
| | - Go Ashida
- Cluster of Excellence "Hearing4all" and Research Centre Neurosensory Science, Department of Neuroscience, School of Medicine and Health Science, Carl von Ossietzky University Oldenburg, 26129 Oldenburg, Germany
| | - Rainer Beutelmann
- Cluster of Excellence "Hearing4all" and Research Centre Neurosensory Science, Department of Neuroscience, School of Medicine and Health Science, Carl von Ossietzky University Oldenburg, 26129 Oldenburg, Germany
| | - Friederike Steenken
- Cluster of Excellence "Hearing4all" and Research Centre Neurosensory Science, Department of Neuroscience, School of Medicine and Health Science, Carl von Ossietzky University Oldenburg, 26129 Oldenburg, Germany
| | - Christine Köppl
- Cluster of Excellence "Hearing4all" and Research Centre Neurosensory Science, Department of Neuroscience, School of Medicine and Health Science, Carl von Ossietzky University Oldenburg, 26129 Oldenburg, Germany
| |
Collapse
|
17
|
Hu G, Determan SC, Dong Y, Beeve AT, Collins JE, Gai Y. Spectral and Temporal Envelope Cues for Human and Automatic Speech Recognition in Noise. J Assoc Res Otolaryngol 2019; 21:73-87. [PMID: 31758279 DOI: 10.1007/s10162-019-00737-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/15/2019] [Accepted: 09/16/2019] [Indexed: 11/30/2022] Open
Abstract
Acoustic features of speech include various spectral and temporal cues. It is known that temporal envelope plays a critical role for speech recognition by human listeners, while automated speech recognition (ASR) heavily relies on spectral analysis. This study compared sentence-recognition scores of humans and an ASR software, Dragon, when spectral and temporal-envelope cues were manipulated in background noise. Temporal fine structure of meaningful sentences was reduced by noise or tone vocoders. Three types of background noise were introduced: a white noise, a time-reversed multi-talker noise, and a fake-formant noise. Spectral information was manipulated by changing the number of frequency channels. With a 20-dB signal-to-noise ratio (SNR) and four vocoding channels, white noise had a stronger disruptive effect than the fake-formant noise. The same observation with 22 channels was made when SNR was lowered to 0 dB. In contrast, ASR was unable to function with four vocoding channels even with a 20-dB SNR. Its performance was least affected by white noise and most affected by the fake-formant noise. Increasing the number of channels, which improved the spectral resolution, generated non-monotonic behaviors for the ASR with white noise but not with colored noise. The ASR also showed highly improved performance with tone vocoders. It is possible that fake-formant noise affected the software's performance by disrupting spectral cues, whereas white noise affected performance by compromising speech segmentation. Overall, these results suggest that human listeners and ASR utilize different listening strategies in noise.
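Noise vocoding, the TFS-reduction manipulation used in this study, can be sketched as follows. This is a crude illustration (Python/SciPy, Butterworth analysis bands), not the study's actual processing chain:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(x, fs, edges, rng):
    """Crude noise vocoder: within each analysis band, keep the temporal
    envelope but replace the fine structure with band-limited noise."""
    out = np.zeros(len(x))
    noise = rng.standard_normal(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, x)
        env = np.abs(hilbert(band))            # band envelope
        carrier = sosfiltfilt(sos, noise)      # noise fine structure
        carrier /= np.sqrt(np.mean(carrier ** 2)) + 1e-12
        out += env * carrier
    return out
```

Increasing the number of band edges improves spectral resolution while the per-band fine structure stays noise-like, which is exactly the trade-off the study manipulates; a tone vocoder replaces the noise carrier with a sinusoid at each band's center frequency.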
Collapse
Affiliation(s)
- Guangxin Hu
- Biomedical Engineering Department, Saint Louis University, 3007 Lindell Blvd Suite 2007, St Louis, MO, 63103, USA
| | - Sarah C Determan
- Biomedical Engineering Department, Saint Louis University, 3007 Lindell Blvd Suite 2007, St Louis, MO, 63103, USA
| | - Yue Dong
- Biomedical Engineering Department, Saint Louis University, 3007 Lindell Blvd Suite 2007, St Louis, MO, 63103, USA
| | - Alec T Beeve
- Biomedical Engineering Department, Saint Louis University, 3007 Lindell Blvd Suite 2007, St Louis, MO, 63103, USA
| | - Joshua E Collins
- Biomedical Engineering Department, Saint Louis University, 3007 Lindell Blvd Suite 2007, St Louis, MO, 63103, USA
| | - Yan Gai
- Biomedical Engineering Department, Saint Louis University, 3007 Lindell Blvd Suite 2007, St Louis, MO, 63103, USA.
| |
Collapse
|
19
|
Trevino M, Lobarinas E, Maulden AC, Heinz MG. The chinchilla animal model for hearing science and noise-induced hearing loss. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2019; 146:3710. [PMID: 31795699 PMCID: PMC6881193 DOI: 10.1121/1.5132950] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/20/2019] [Revised: 09/19/2019] [Accepted: 09/24/2019] [Indexed: 05/07/2023]
Abstract
The chinchilla animal model for noise-induced hearing loss has an extensive history spanning more than 50 years. Many behavioral, anatomical, and physiological characteristics of the chinchilla make it a valuable animal model for hearing science. These include similarities with human hearing frequency and intensity sensitivity, the ability to be trained behaviorally with acoustic stimuli relevant to human hearing, a docile nature that allows many physiological measures to be made in an awake state, physiological robustness that allows for data to be collected from all levels of the auditory system, and the ability to model various types of conductive and sensorineural hearing losses that mimic pathologies observed in humans. Given these attributes, chinchillas have been used repeatedly to study anatomical, physiological, and behavioral effects of continuous and impulse noise exposures that produce either temporary or permanent threshold shifts. Based on the mechanistic insights from noise-exposure studies, chinchillas have also been used in pre-clinical drug studies for the prevention and rescue of noise-induced hearing loss. This review paper highlights the role of the chinchilla model in hearing science, its important contributions, and its advantages and limitations.
Collapse
Affiliation(s)
- Monica Trevino
- School of Behavioral and Brain Sciences, Callier Center, The University of Texas at Dallas, 1966 Inwood Road, Dallas, Texas 75235, USA
| | - Edward Lobarinas
- School of Behavioral and Brain Sciences, Callier Center, The University of Texas at Dallas, 1966 Inwood Road, Dallas, Texas 75235, USA
| | - Amanda C Maulden
- Department of Speech, Language, and Hearing Sciences, Purdue University, 715 Clinic Drive, West Lafayette, Indiana 47907, USA
| | - Michael G Heinz
- Weldon School of Biomedical Engineering, Purdue University, 715 Clinic Drive, West Lafayette, Indiana 47907, USA
| |
Collapse
|
20
|
Moncada-Torres A, Joshi SN, Prokopiou A, Wouters J, Epp B, Francart T. A framework for computational modelling of interaural time difference discrimination of normal and hearing-impaired listeners. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2018; 144:940. [PMID: 30180705 DOI: 10.1121/1.5051322] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/19/2017] [Accepted: 08/03/2018] [Indexed: 06/08/2023]
Abstract
Different computational models have been developed to study interaural time difference (ITD) perception. However, only a few have used a physiologically inspired architecture to study ITD discrimination, and they do not include aspects of hearing impairment. In this work, a framework was developed to predict ITD thresholds in listeners with normal and impaired hearing. It combines the physiologically inspired model of the auditory periphery proposed by Zilany, Bruce, Nelson, and Carney [(2009). J. Acoust. Soc. Am. 126(5), 2390-2412] as a front end with a coincidence detection stage and a neurometric decision device as a back end. It was validated by comparing its predictions against behavioral data for narrowband stimuli from literature. The framework is able to model ITD discrimination of normal-hearing and hearing-impaired listeners at a group level. Additionally, it was used to explore the effect of different proportions of outer- and inner-hair cell impairment on ITD discrimination.
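The back end described here scores internal delays by coincidence counting. A toy stand-in, assuming NumPy, treats each candidate internal delay as a cross-correlation lag and reports the best-scoring lag as the estimated ITD (names and windowing are ours, not the framework's):

```python
import numpy as np

def best_itd(left, right, fs, max_lag):
    """Lag (in seconds) of the delay-tuned 'coincidence detector' with
    the highest score, approximated by a lagged inner product."""
    lags = range(-max_lag, max_lag + 1)
    scores = []
    for lag in lags:
        if lag >= 0:
            s = np.dot(left[:len(left) - lag], right[lag:])
        else:
            s = np.dot(left[-lag:], right[:len(right) + lag])
        scores.append(s)
    return lags[int(np.argmax(scores))] / fs
```

In the published framework the inputs are simulated auditory-nerve discharge patterns rather than raw waveforms, and a neurometric decision stage converts the detector population response into a discrimination threshold.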
Collapse
Affiliation(s)
- Arturo Moncada-Torres
- KU Leuven - University of Leuven, Department of Neurosciences, ExpORL, Herestraat 49, Bus 721, 3000 Leuven, Belgium
| | - Suyash N Joshi
- Department of Electrical Engineering, Hearing Systems, Technical University of Denmark, Ørsteds Plads, Building 352, DK-2800 Kongens Lyngby, Denmark
| | - Andreas Prokopiou
- KU Leuven - University of Leuven, Department of Neurosciences, ExpORL, Herestraat 49, Bus 721, 3000 Leuven, Belgium
| | - Jan Wouters
- KU Leuven - University of Leuven, Department of Neurosciences, ExpORL, Herestraat 49, Bus 721, 3000 Leuven, Belgium
| | - Bastian Epp
- Department of Electrical Engineering, Hearing Systems, Technical University of Denmark, Ørsteds Plads, Building 352, DK-2800 Kongens Lyngby, Denmark
| | - Tom Francart
- KU Leuven - University of Leuven, Department of Neurosciences, ExpORL, Herestraat 49, Bus 721, 3000 Leuven, Belgium
| |
Collapse
|
21
|
Paraouty N, Stasiak A, Lorenzi C, Varnet L, Winter IM. Dual Coding of Frequency Modulation in the Ventral Cochlear Nucleus. J Neurosci 2018; 38:4123-4137. [PMID: 29599389 PMCID: PMC6596033 DOI: 10.1523/jneurosci.2107-17.2018] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/26/2017] [Revised: 03/18/2018] [Accepted: 03/22/2018] [Indexed: 11/21/2022] Open
Abstract
Frequency modulation (FM) is a common acoustic feature of natural sounds and is known to play a role in robust sound source recognition. Auditory neurons show precise stimulus-synchronized discharge patterns that may be used for the representation of low-rate FM. However, it remains unclear whether this representation is based on synchronization to slow temporal envelope (ENV) cues resulting from cochlear filtering or phase locking to faster temporal fine structure (TFS) cues. To investigate the plausibility of those encoding schemes, single units of the ventral cochlear nucleus of guinea pigs of either sex were recorded in response to sine FM tones centered at the unit's best frequency (BF). The results show that, in contrast to high-BF units, for modulation depths within the receptive field, low-BF units (<4 kHz) demonstrate good phase locking to TFS. For modulation depths extending beyond the receptive field, the discharge patterns follow the ENV and fluctuate at the modulation rate. The receptive field proved to be a good predictor of the ENV responses for most primary-like and chopper units. The current in vivo data also reveal a high level of diversity in responses across unit types. TFS cues are mainly conveyed by low-frequency and primary-like units and ENV cues by chopper and onset units. The diversity of responses exhibited by cochlear nucleus neurons provides a neural basis for a dual-coding scheme of FM in the brainstem based on both ENV and TFS cues.SIGNIFICANCE STATEMENT Natural sounds, including speech, convey informative temporal modulations in frequency. Understanding how the auditory system represents those frequency modulations (FM) has important implications as robust sound source recognition depends crucially on the reception of low-rate FM cues. 
Here, we recorded 115 single-unit responses from the ventral cochlear nucleus in response to FM and provide the first physiological evidence of a dual-coding mechanism of FM via synchronization to temporal envelope cues and phase locking to temporal fine structure cues. We also demonstrate a diversity of neural responses with different coding specializations. These results support the dual-coding scheme proposed by psychophysicists to account for FM sensitivity in humans and provide new insights on how this might be implemented in the early stages of the auditory pathway.
Collapse
Affiliation(s)
- Nihaad Paraouty
- Centre for the Neural Basis of Hearing, The Physiological Laboratory, Department of Physiology, Development and Neuroscience, University of Cambridge, United Kingdom and
- Laboratoire des Systèmes Perceptifs CNRS UMR 8248, École Normale Supérieure, Paris Sciences et Lettres Research University, Paris, France
| | - Arkadiusz Stasiak
- Centre for the Neural Basis of Hearing, The Physiological Laboratory, Department of Physiology, Development and Neuroscience, University of Cambridge, United Kingdom and
| | - Christian Lorenzi
- Laboratoire des Systèmes Perceptifs CNRS UMR 8248, École Normale Supérieure, Paris Sciences et Lettres Research University, Paris, France
| | - Léo Varnet
- Laboratoire des Systèmes Perceptifs CNRS UMR 8248, École Normale Supérieure, Paris Sciences et Lettres Research University, Paris, France
| | - Ian M Winter
- Centre for the Neural Basis of Hearing, The Physiological Laboratory, Department of Physiology, Development and Neuroscience, University of Cambridge, United Kingdom and
| |
Collapse
|
22
|
Moore BCJ. Effects of age on sensitivity to interaural time differences in envelope and fine structure, individually and in combination. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2018; 143:1287. [PMID: 29604696 PMCID: PMC5834318 DOI: 10.1121/1.5025845] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/09/2017] [Revised: 02/08/2018] [Accepted: 02/10/2018] [Indexed: 06/01/2023]
Abstract
Sensitivity to interaural time differences (ITDs) in envelope and temporal fine structure (TFS) of amplitude-modulated (AM) tones was assessed for young and older subjects, all with clinically normal hearing at the carrier frequencies of 250 and 500 Hz. Some subjects had hearing loss at higher frequencies. In experiment 1, thresholds for detecting changes in ITD were measured when the ITD was present in the TFS alone (ITDTFS), the envelope alone (ITDENV), or both (ITDTFS/ENV). Thresholds tended to be higher for the older than for the young subjects. ITDENV thresholds were much higher than ITDTFS thresholds, while ITDTFS/ENV thresholds were similar to ITDTFS thresholds. ITDTFS thresholds were lower than ITD thresholds obtained with an unmodulated pure tone, indicating that uninformative AM can improve ITDTFS discrimination. In experiment 2, equally detectable values of ITDTFS and ITDENV were combined so as to give consistent or inconsistent lateralization. There were large individual differences, but several subjects gave scores that were much higher than would be expected from the optimal combination of independent sources of information, even for the inconsistent condition. It is suggested that ITDTFS and ITDENV cues are processed partly independently, but that both cues influence lateralization judgments, even when one cue is uninformative.
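Stimuli carrying independent ITDs in envelope and fine structure can be constructed by delaying the modulator and the carrier separately. A minimal sketch (NumPy; parameter names are illustrative, not the study's stimulus code):

```python
import numpy as np

def am_tone(fs, dur, fc, fm, m, env_delay=0.0, tfs_delay=0.0):
    """AM tone whose envelope and fine structure carry separate delays,
    mimicking stimuli with an ITD in ENV only, TFS only, or both."""
    t = np.arange(int(fs * dur)) / fs
    env = 1.0 + m * np.cos(2.0 * np.pi * fm * (t - env_delay))
    return env * np.cos(2.0 * np.pi * fc * (t - tfs_delay))
```

Generating the left ear with zero delays and the right ear with `env_delay` and/or `tfs_delay` set yields the three conditions of experiment 1. Note that delaying the carrier by exactly one period (`tfs_delay = 1/fc`) reproduces the original waveform, the phase ambiguity that limits usable TFS ITDs.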
Collapse
|
23
|
Bones O, Wong PCM. Congenital amusics use a secondary pitch mechanism to identify lexical tones. Neuropsychologia 2017; 104:48-53. [PMID: 28782544 DOI: 10.1016/j.neuropsychologia.2017.08.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/03/2017] [Revised: 06/22/2017] [Accepted: 08/03/2017] [Indexed: 11/16/2022]
Abstract
Amusia is a pitch perception disorder associated with deficits in processing and production of both musical and lexical tones, which previous reports have suggested may be constrained to fine-grained pitch judgements. In the present study speakers of tone-languages, in which lexical tones are used to convey meaning, identified words present in chimera stimuli containing conflicting pitch-cues in the temporal fine-structure and temporal envelope, and which therefore conveyed two distinct utterances. Amusics were found to be more likely than controls to judge the word according to the envelope pitch-cues. This demonstrates that amusia is not associated with fine-grained pitch judgements alone, and is consistent with there being two distinct pitch mechanisms and with amusics having an atypical reliance on a secondary mechanism based upon envelope cues.
Affiliation(s)
- Oliver Bones
- Acoustics Research Centre, University of Salford, Salford M5 4WT, UK.
- Patrick C M Wong
- Department of Linguistics & Modern Languages and Brain and Mind Institute, The Chinese University of Hong Kong, Hong Kong SAR.

24
Predictions of Speech Chimaera Intelligibility Using Auditory Nerve Mean-Rate and Spike-Timing Neural Cues. J Assoc Res Otolaryngol 2017; 18:687-710. [PMID: 28748487] [DOI: 10.1007/s10162-017-0627-7]
Abstract
Perceptual studies of speech intelligibility have shown that slow variations of acoustic envelope (ENV) in a small set of frequency bands provide adequate information for good perceptual performance in quiet, whereas acoustic temporal fine-structure (TFS) cues play a supporting role in background noise. However, the implications for neural coding are prone to misinterpretation because the mean-rate neural representation can contain recovered ENV cues from cochlear filtering of TFS. We investigated ENV recovery and spike-time TFS coding using objective measures of simulated mean-rate and spike-timing neural representations of chimaeric speech, in which either the ENV or the TFS is replaced by another signal. We (a) evaluated the levels of mean-rate and spike-timing neural information for two categories of chimaeric speech, one retaining ENV cues and the other TFS; (b) examined the level of recovered ENV from cochlear filtering of TFS speech; (c) examined and quantified the contribution to recovered ENV from spike-timing cues using a lateral inhibition network (LIN); and (d) constructed linear regression models with objective measures of mean-rate and spike-timing neural cues and subjective phoneme perception scores from normal-hearing listeners. The mean-rate neural cues from the original ENV and recovered ENV partially accounted for perceptual score variability, with additional variability explained by the recovered ENV from the LIN-processed TFS speech. The best model predictions of chimaeric speech intelligibility were found when both the mean-rate and spike-timing neural cues were included, providing further evidence that spike-time coding of TFS cues is important for intelligibility when the speech envelope is degraded.
25
Role of Binaural Temporal Fine Structure and Envelope Cues in Cocktail-Party Listening. J Neurosci 2016; 36:8250-7. [PMID: 27488643] [DOI: 10.1523/jneurosci.4421-15.2016]
Abstract
While conversing in a crowded social setting, a listener is often required to follow a target speech signal amid multiple competing speech signals (the so-called "cocktail party" problem). In such situations, separation of the target speech signal in azimuth from the interfering masker signals can lead to an improvement in target intelligibility, an effect known as spatial release from masking (SRM). This study assessed the contributions of two stimulus properties that vary with separation of sound sources, binaural envelope (ENV) and temporal fine structure (TFS), to SRM in normal-hearing (NH) human listeners. Target speech was presented from the front and speech maskers were either colocated with or symmetrically separated from the target in azimuth. The target and maskers were presented either as natural speech or as "noise-vocoded" speech in which the intelligibility was conveyed only by the speech ENVs from several frequency bands; the speech TFS within each band was replaced with noise carriers. The experiments were designed to preserve the spatial cues in the speech ENVs while retaining/eliminating them from the TFS. This was achieved by using the same/different noise carriers in the two ears. A phenomenological auditory-nerve model was used to verify that the interaural correlations in TFS differed across conditions, whereas the ENVs retained a high degree of correlation, as intended. Overall, the results from this study revealed that binaural TFS cues, especially for frequency regions below 1500 Hz, are critical for achieving SRM in NH listeners. Potential implications for studying SRM in hearing-impaired listeners are discussed. SIGNIFICANCE STATEMENT Acoustic signals received by the auditory system pass first through an array of physiologically based band-pass filters.
Conceptually, at the output of each filter, there are two principal forms of temporal information: slowly varying fluctuations in the envelope (ENV) and rapidly varying fluctuations in the temporal fine structure (TFS). The importance of these two types of information in everyday listening (e.g., conversing in a noisy social situation; the "cocktail-party" problem) has not been established. This study assessed the contributions of binaural ENV and TFS cues for understanding speech in multiple-talker situations. Results suggest that, whereas the ENV cues are important for speech intelligibility, binaural TFS cues are critical for perceptually segregating the different talkers and thus for solving the cocktail party problem.
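The carrier manipulation described above (same vs. different noise carriers across the ears, retaining or removing binaural TFS while preserving the ENV) can be illustrated with a single-band noise vocoder. A toy sketch, with a modulated tone standing in for one speech band; all parameter values are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
fs, n = 16000, 16000
t = np.arange(n) / fs

def bandpass(x, lo, hi):
    """Brick-wall FFT band-pass filter."""
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), 1.0 / fs)
    X[(f < lo) | (f > hi)] = 0.0
    return np.fft.irfft(X, len(x))

def envelope(x):
    """Magnitude of the analytic signal (FFT-based Hilbert transform; n even here)."""
    X = np.fft.fft(x)
    h = np.zeros(len(x))
    h[0] = 1.0
    h[1:len(x) // 2] = 2.0
    h[len(x) // 2] = 1.0
    return np.abs(np.fft.ifft(X * h))

speech_band = np.sin(2 * np.pi * 3 * t) * np.sin(2 * np.pi * 1000 * t)
env = envelope(speech_band)

carrier_l = bandpass(rng.standard_normal(n), 800, 1200)
carrier_r = bandpass(rng.standard_normal(n), 800, 1200)  # independent noise

left = env * carrier_l
right_same = env * carrier_l   # same carrier: interaural TFS correlation kept
right_diff = env * carrier_r   # new carrier: TFS decorrelated, ENV identical

def iacc(a, b):
    """Interaural waveform correlation."""
    return np.corrcoef(a, b)[0, 1]
```

`iacc(left, right_same)` is 1 by construction, while `iacc(left, right_diff)` is near zero even though both ears receive exactly the same envelope.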
26
Moncada-Torres A, van Wieringen A, Bruce IC, Wouters J, Francart T. Predicting phoneme and word recognition in noise using a computational model of the auditory periphery. J Acoust Soc Am 2017; 141:300. [PMID: 28147586] [DOI: 10.1121/1.4973569]
Abstract
Several filterbank-based metrics have been proposed to predict speech intelligibility (SI). However, these metrics incorporate little knowledge of the auditory periphery. Neurogram-based metrics provide an alternative, incorporating knowledge of the physiology of hearing by using a mathematical model of the auditory nerve response. In this work, SI was assessed utilizing different filterbank-based metrics (the speech intelligibility index and the speech-based envelope power spectrum model) and neurogram-based metrics, using the biologically inspired model of the auditory nerve proposed by Zilany, Bruce, Nelson, and Carney [(2009), J. Acoust. Soc. Am. 126(5), 2390-2412] as a front-end and the neurogram similarity metric and spectro-temporal modulation index as a back-end. Then, the correlations with behavioural scores were computed. Results showed that neurogram-based metrics representing the speech envelope yielded higher correlations with the behavioural scores at a word level. At a per-phoneme level, it was found that phoneme transitions contribute to higher correlations between objective measures that use speech envelope information at the auditory periphery level and behavioural data. The presented framework could function as a useful tool for the validation and tuning of speech materials, as well as a benchmark for the development of speech processing algorithms.
Affiliation(s)
- Arturo Moncada-Torres
- Department of Neurosciences, ExpORL, KU Leuven, Herestraat 49, Bus 721, 3000 Leuven, Belgium
- Astrid van Wieringen
- Department of Neurosciences, ExpORL, KU Leuven, Herestraat 49, Bus 721, 3000 Leuven, Belgium
- Ian C Bruce
- Department of Neurosciences, ExpORL, KU Leuven, Herestraat 49, Bus 721, 3000 Leuven, Belgium
- Jan Wouters
- Department of Neurosciences, ExpORL, KU Leuven, Herestraat 49, Bus 721, 3000 Leuven, Belgium
- Tom Francart
- Department of Neurosciences, ExpORL, KU Leuven, Herestraat 49, Bus 721, 3000 Leuven, Belgium

27
Ananthakrishnan S, Krishnan A, Bartlett E. Human Frequency Following Response: Neural Representation of Envelope and Temporal Fine Structure in Listeners with Normal Hearing and Sensorineural Hearing Loss. Ear Hear 2016; 37:e91-e103. [PMID: 26583482] [DOI: 10.1097/aud.0000000000000247]
Abstract
OBJECTIVE Listeners with sensorineural hearing loss (SNHL) typically experience reduced speech perception, which is not completely restored with amplification. This likely occurs because cochlear damage, in addition to elevating audiometric thresholds, alters the neural representation of speech transmitted to higher centers along the auditory neuroaxis. While the deleterious effects of SNHL on speech perception in humans have been well-documented using behavioral paradigms, our understanding of the neural correlates underlying these perceptual deficits remains limited. Using the scalp-recorded frequency following response (FFR), the authors examine the effects of SNHL and aging on subcortical neural representation of acoustic features important for pitch and speech perception, namely the periodicity envelope (F0) and temporal fine structure (TFS; formant structure), as reflected in the phase-locked neural activity generating the FFR. DESIGN FFRs were obtained from 10 listeners with normal hearing (NH) and 9 listeners with mild-moderate SNHL in response to a steady-state English back vowel /u/ presented at multiple intensity levels. Use of multiple presentation levels facilitated comparisons at equal sound pressure level (SPL) and equal sensation level. In a second follow-up experiment to address the effect of age on envelope and TFS representation, FFRs were obtained from 25 NH and 19 listeners with mild to moderately severe SNHL to the same vowel stimulus presented at 80 dB SPL. Temporal waveforms, Fast Fourier Transform and spectrograms were used to evaluate the magnitude of the phase-locked activity at F0 (periodicity envelope) and F1 (TFS). RESULTS Neural representation of both envelope (F0) and TFS (F1) at equal SPLs was stronger in NH listeners compared with listeners with SNHL. 
Also, comparison of neural representation of F0 and F1 across stimulus levels expressed in SPL and sensation level (accounting for audibility) revealed that level-related changes in F0 and F1 magnitude were different for listeners with SNHL compared with listeners with NH. Furthermore, the degradation in subcortical neural representation was observed to persist in listeners with SNHL even when the effects of age were controlled for. CONCLUSIONS Overall, our results suggest a relatively greater degradation in the neural representation of TFS compared with periodicity envelope in individuals with SNHL. This degraded neural representation of TFS in SNHL, as reflected in the brainstem FFR, may reflect a disruption in the temporal pattern of phase-locked neural activity arising from altered tonotopic maps and/or wider filters causing poor frequency selectivity in these listeners. Finally, while preliminary results indicate that the deleterious effects of SNHL may be greater than age-related degradation in subcortical neural representation, the lack of a balanced age-matched control group in this study does not permit us to completely rule out the effects of age on subcortical neural representation.
Affiliation(s)
- Saradha Ananthakrishnan
- Department of Speech Language Hearing Sciences, Purdue University, West Lafayette, Indiana, USA; Department of Audiology, Speech-Language Pathology and Deaf Studies, Towson University, Towson, Maryland, USA; Department of Biomedical Engineering, Purdue University, West Lafayette, Indiana, USA; Department of Biological Sciences, Purdue University, West Lafayette, Indiana, USA

28
Abstract
Diagnosing and treating hearing impairment is challenging because people with similar degrees of sensorineural hearing loss (SNHL) often have different speech-recognition abilities. The speech-based envelope power spectrum model (sEPSM) has demonstrated that the signal-to-noise ratio (SNRENV) from a modulation filter bank provides a robust speech-intelligibility measure across a wider range of degraded conditions than many long-standing models. In the sEPSM, noise (N) is assumed to: (a) reduce S + N envelope power by filling in dips within clean speech (S) and (b) introduce an envelope noise floor from intrinsic fluctuations in the noise itself. While the promise of SNRENV has been demonstrated for normal-hearing listeners, it has not been thoroughly extended to hearing-impaired listeners because of limited physiological knowledge of how SNHL affects speech-in-noise envelope coding relative to noise alone. Here, envelope coding to speech-in-noise stimuli was quantified from auditory-nerve model spike trains using shuffled correlograms, which were analyzed in the modulation-frequency domain to compute modulation-band estimates of neural SNRENV. Preliminary spike-train analyses show strong similarities to the sEPSM, demonstrating feasibility of neural SNRENV computations. Results suggest that individual differences can occur based on differential degrees of outer- and inner-hair-cell dysfunction in listeners currently diagnosed into the single audiological SNHL category. The predicted acoustic-SNR dependence in individual differences suggests that the SNR-dependent rate of susceptibility could be an important metric in diagnosing individual differences. Future measurements of the neural SNRENV in animal studies with various forms of SNHL will provide valuable insight for understanding individual differences in speech-in-noise intelligibility.
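The SNR_ENV idea above (envelope power of speech-plus-noise relative to that of the noise alone, per modulation band) can be sketched for a single audio band and a single modulation band. This is a toy acoustic-domain illustration, not the sEPSM or the neural shuffled-correlogram analysis itself; the signals and band edges are arbitrary stand-ins:

```python
import numpy as np

rng = np.random.default_rng(1)
fs = 16000
t = np.arange(fs) / fs

def env_power(x, fm_lo, fm_hi):
    """AC envelope power within one modulation band, normalized by the
    squared mean envelope (the normalization used in the sEPSM family)."""
    env = np.abs(x)  # crude envelope
    E = np.fft.rfft(env - env.mean())
    f = np.fft.rfftfreq(len(env), 1.0 / fs)
    band = (f >= fm_lo) & (f <= fm_hi)
    p = np.sum(np.abs(E[band]) ** 2) / len(env) ** 2
    return p / env.mean() ** 2

# "speech": a 4-Hz-modulated tone band; "noise": an unmodulated noise floor
speech = (1 + np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 1000 * t)
noise = 0.5 * rng.standard_normal(len(t))

snr_env = env_power(speech + noise, 2, 8) / env_power(noise, 2, 8)
```

Because the 4-Hz speech modulation falls inside the 2-8 Hz modulation band while the noise contributes only its intrinsic envelope fluctuations there, `snr_env` comes out well above 1; the study's contribution is computing the analogous quantity from model spike trains rather than from the acoustics.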
Affiliation(s)
- Varsha H. Rallapalli
- Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, IN, USA
- Michael G. Heinz
- Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, IN, USA
- Weldon School of Biomedical Engineering, Purdue University, IN, USA

29
Reed CM, Desloge JG, Braida LD, Perez ZD, Léger AC. Level variations in speech: Effect on masking release in hearing-impaired listeners. J Acoust Soc Am 2016; 140:102. [PMID: 27475136] [PMCID: PMC6910012] [DOI: 10.1121/1.4954746]
Abstract
Acoustic speech is marked by time-varying changes in the amplitude envelope that may pose difficulties for hearing-impaired listeners. Removal of these variations (e.g., by the Hilbert transform) could improve speech reception for such listeners, particularly in fluctuating interference. Léger, Reed, Desloge, Swaminathan, and Braida [(2015b). J. Acoust. Soc. Am. 138, 389-403] observed that a normalized measure of masking release obtained for hearing-impaired listeners using speech processed to preserve temporal fine-structure (TFS) cues was larger than that for unprocessed or envelope-based speech. This study measured masking release for two other speech signals in which level variations were minimal: peak clipping and TFS processing of an envelope signal. Consonant identification was measured for hearing-impaired listeners in backgrounds of continuous and fluctuating speech-shaped noise. The normalized masking release obtained using speech with normal variations in overall level was substantially less than that observed using speech processed to achieve highly restricted level variations. These results suggest that the performance of hearing-impaired listeners in fluctuating noise may be improved by signal processing that leads to a decrease in stimulus level variations.
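The Hilbert decomposition underlying the TFS processing above can be sketched as follows; the fine-structure signal cos(phase) has unit amplitude by construction, which is exactly the "highly restricted level variation" property at issue. A toy single-band example with an AM tone in place of speech:

```python
import numpy as np

def analytic(x):
    """Analytic signal via an FFT-based Hilbert transform."""
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(X * h)

fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 4 * t) * np.sin(2 * np.pi * 440 * t)  # stand-in for one speech band

env = np.abs(analytic(x))             # ENV component
tfs = np.cos(np.angle(analytic(x)))   # TFS speech: unit-amplitude carrier

def frame_rms(y, w=800):
    """Short-term (50-ms frame) level."""
    return np.sqrt((y[: len(y) // w * w].reshape(-1, w) ** 2).mean(axis=1))
```

`frame_rms(x)` swings with the 4-Hz modulation, whereas `frame_rms(tfs)` stays essentially constant near 0.707: the level variations have been stripped into `env`, which is the property these masking-release experiments exploit.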
Affiliation(s)
- Charlotte M Reed
- Research Laboratory of Electronics, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA
- Joseph G Desloge
- Research Laboratory of Electronics, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA
- Louis D Braida
- Research Laboratory of Electronics, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA
- Zachary D Perez
- Research Laboratory of Electronics, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA
- Agnès C Léger
- School of Psychological Sciences, University of Manchester, Manchester, M13 9PL, United Kingdom

30
The Role of Age-Related Declines in Subcortical Auditory Processing in Speech Perception in Noise. J Assoc Res Otolaryngol 2016; 17:441-60. [PMID: 27216166] [PMCID: PMC5023535] [DOI: 10.1007/s10162-016-0564-x]
Abstract
Older adults, even those without hearing impairment, often experience increased difficulties understanding speech in the presence of background noise. This study examined the role of age-related declines in subcortical auditory processing in the perception of speech in different types of background noise. Participants included normal-hearing young (19 - 29 years) and older (60 - 72 years) adults. Normal hearing was defined as pure-tone thresholds of 25 dB HL or better at octave frequencies from 0.25 to 4 kHz in both ears and at 6 kHz in at least one ear. Speech reception thresholds (SRTs) to sentences were measured in steady-state (SS) and 10-Hz amplitude-modulated (AM) speech-shaped noise, as well as two-talker babble. In addition, click-evoked auditory brainstem responses (ABRs) and envelope following responses (EFRs) in response to the vowel /ɑ/ in quiet, SS, and AM noise were measured. Of primary interest was the relationship between the SRTs and EFRs. SRTs were significantly higher (i.e., worse) by about 1.5 dB for older adults in two-talker babble but not in AM and SS noise. In addition, the EFRs of the older adults were less robust compared to the younger participants in quiet, AM, and SS noise. Both young and older adults showed a "neural masking release," indicated by a more robust EFR at the trough compared to the peak of the AM masker. The amount of neural masking release did not differ between the two age groups. Variability in SRTs was best accounted for by audiometric thresholds (pure-tone average across 0.5-4 kHz) and not by the EFR in quiet or noise. Aging is thus associated with a degradation of the EFR, both in quiet and noise. However, these declines in subcortical neural speech encoding are not necessarily associated with impaired perception of speech in noise, as measured by the SRT, in normal-hearing older adults.
31
Hossain ME, Jassim WA, Zilany MSA. Reference-Free Assessment of Speech Intelligibility Using Bispectrum of an Auditory Neurogram. PLoS One 2016; 11:e0150415. [PMID: 26967160] [PMCID: PMC4788356] [DOI: 10.1371/journal.pone.0150415]
Abstract
Sensorineural hearing loss occurs due to damage to the inner and outer hair cells of the peripheral auditory system. Hearing loss can cause decreases in audibility, dynamic range, frequency and temporal resolution of the auditory system, and all of these effects are known to affect speech intelligibility. In this study, a new reference-free speech intelligibility metric is proposed using 2-D neurograms constructed from the output of a computational model of the auditory periphery. The responses of the auditory-nerve fibers with a wide range of characteristic frequencies were simulated to construct neurograms. The features of the neurograms were extracted using third-order statistics referred to as bispectrum. The phase coupling of neurogram bispectrum provides a unique insight for the presence (or deficit) of supra-threshold nonlinearities beyond audibility for listeners with normal hearing (or hearing loss). The speech intelligibility scores predicted by the proposed method were compared to the behavioral scores for listeners with normal hearing and hearing loss both in quiet and under noisy background conditions. The results were also compared to the performance of some existing methods. The predicted results showed a good fit with a small error suggesting that the subjective scores can be estimated reliably using the proposed neural-response-based metric. The proposed metric also had a wide dynamic range, and the predicted scores were well-separated as a function of hearing loss. The proposed metric successfully captures the effects of hearing loss and supra-threshold nonlinearities on speech intelligibility. This metric could be applied to evaluate the performance of various speech-processing algorithms designed for hearing aids and cochlear implants.
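The bispectrum feature used above can be sketched directly from its definition, B(f1, f2) = E[X(f1) X(f2) X*(f1 + f2)]: it is large only where spectral components are quadratically phase-coupled, which is what makes it sensitive to the phase coupling the authors analyze. A toy segment-averaged estimate on synthetic phase-coupled triads (applied here to a raw signal rather than to an auditory-model neurogram):

```python
import numpy as np

def bispectrum(x, nfft, nseg):
    """Direct (segment-averaged) bispectrum estimate."""
    segs = x[: nfft * nseg].reshape(nseg, nfft)
    X = np.fft.fft(segs, axis=1)
    k = nfft // 4
    B = np.zeros((k, k), dtype=complex)
    for f1 in range(k):
        for f2 in range(k):
            B[f1, f2] = np.mean(X[:, f1] * X[:, f2] * np.conj(X[:, f1 + f2]))
    return B

rng = np.random.default_rng(2)
nfft, nseg = 64, 256
n = np.arange(nfft)

def triad(coupled):
    """Segments with components at bins 5, 7 and 12; the phase of bin 12
    either equals p1 + p2 (quadratic phase coupling) or is random."""
    segs = []
    for _ in range(nseg):
        p1, p2, p3 = rng.uniform(0, 2 * np.pi, 3)
        if coupled:
            p3 = p1 + p2
        segs.append(np.cos(2 * np.pi * 5 * n / nfft + p1)
                    + np.cos(2 * np.pi * 7 * n / nfft + p2)
                    + np.cos(2 * np.pi * 12 * n / nfft + p3))
    return np.concatenate(segs)

B_coupled = bispectrum(triad(True), nfft, nseg)
B_random = bispectrum(triad(False), nfft, nseg)
```

`abs(B_coupled[5, 7])` is large because the triad phases cancel in every segment, while `abs(B_random[5, 7])` averages toward zero; loss of such coupling is the kind of supra-threshold deficit the proposed metric is designed to register.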
Affiliation(s)
- Mohammad E. Hossain
- Department of Biomedical Engineering, Faculty of Engineering, University of Malaya, Kuala Lumpur, Malaysia
- Wissam A. Jassim
- Department of Biomedical Engineering, Faculty of Engineering, University of Malaya, Kuala Lumpur, Malaysia
- Muhammad S. A. Zilany
- Department of Biomedical Engineering, Faculty of Engineering, University of Malaya, Kuala Lumpur, Malaysia

32
Léger AC, Reed CM, Desloge JG, Swaminathan J, Braida LD. Consonant identification in noise using Hilbert-transform temporal fine-structure speech and recovered-envelope speech for listeners with normal and impaired hearing. J Acoust Soc Am 2015; 138:389-403. [PMID: 26233038] [PMCID: PMC4514718] [DOI: 10.1121/1.4922949]
Abstract
Consonant-identification ability was examined in normal-hearing (NH) and hearing-impaired (HI) listeners in the presence of steady-state and 10-Hz square-wave interrupted speech-shaped noise. The Hilbert transform was used to process speech stimuli (16 consonants in a-C-a syllables) to present envelope cues, temporal fine-structure (TFS) cues, or envelope cues recovered from TFS speech. The performance of the HI listeners was inferior to that of the NH listeners both in terms of lower levels of performance in the baseline condition and in the need for higher signal-to-noise ratio to yield a given level of performance. For NH listeners, scores were higher in interrupted noise than in steady-state noise for all speech types (indicating substantial masking release). For HI listeners, masking release was typically observed for TFS and recovered-envelope speech but not for unprocessed and envelope speech. For both groups of listeners, TFS and recovered-envelope speech yielded similar levels of performance and consonant confusion patterns. The masking release observed for TFS and recovered-envelope speech may be related to level effects associated with the manner in which the TFS processing interacts with the interrupted noise signal, rather than to the contributions of TFS cues per se.
Affiliation(s)
- Agnès C Léger
- School of Psychological Sciences, University of Manchester, Manchester, M13 9PL, United Kingdom
- Charlotte M Reed
- Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Joseph G Desloge
- Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Jayaganesh Swaminathan
- Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Louis D Braida
- Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA

33
The Role of Temporal Envelope and Fine Structure in Mandarin Lexical Tone Perception in Auditory Neuropathy Spectrum Disorder. PLoS One 2015; 10:e0129710. [PMID: 26052707] [PMCID: PMC4459992] [DOI: 10.1371/journal.pone.0129710]
Abstract
Temporal information in a signal can be partitioned into temporal envelope (E) and fine structure (FS). Fine structure is important for lexical tone perception for normal-hearing (NH) listeners, and listeners with sensorineural hearing loss (SNHL) have an impaired ability to use FS in lexical tone perception due to the reduced frequency resolution. The present study was aimed to assess which of the acoustic aspects (E or FS) played a more important role in lexical tone perception in subjects with auditory neuropathy spectrum disorder (ANSD) and to determine whether it was the deficit in temporal resolution or frequency resolution that might lead to more detrimental effects on FS processing in pitch perception. Fifty-eight native Mandarin Chinese-speaking subjects (27 with ANSD, 16 with SNHL, and 15 with NH) were assessed for (1) their ability to recognize lexical tones using acoustic E or FS cues with the “auditory chimera” technique, (2) temporal resolution as measured with temporal gap detection (TGD) threshold, and (3) frequency resolution as measured with the Q10dB values of the psychophysical tuning curves. Overall, 26.5%, 60.2%, and 92.1% of lexical tone responses were consistent with FS cues for tone perception for listeners with ANSD, SNHL, and NH, respectively. The mean TGD threshold was significantly higher for listeners with ANSD (11.9 ms) than for SNHL (4.0 ms; p < 0.001) and NH (3.9 ms; p < 0.001) listeners, with no significant difference between SNHL and NH listeners. In contrast, the mean Q10dB for listeners with SNHL (1.8±0.4) was significantly lower than that for ANSD (3.5±1.0; p < 0.001) and NH (3.4±0.9; p < 0.001) listeners, with no significant difference between ANSD and NH listeners. These results suggest that reduced temporal resolution, as opposed to reduced frequency selectivity, in ANSD subjects leads to greater degradation of FS processing for pitch perception.
34
Poblete V, Espic F, King S, Stern RM, Huenupán F, Fredes J, Yoma NB. A perceptually-motivated low-complexity instantaneous linear channel normalization technique applied to speaker verification. Comput Speech Lang 2015. [DOI: 10.1016/j.csl.2014.10.006]
35
Implications of within-fiber temporal coding for perceptual studies of F0 discrimination and discrimination of harmonic and inharmonic tone complexes. J Assoc Res Otolaryngol 2014; 15:465-82. [PMID: 24658856] [DOI: 10.1007/s10162-014-0451-2]
Abstract
Recent psychophysical studies suggest that normal-hearing (NH) listeners can use acoustic temporal-fine-structure (TFS) cues for accurately discriminating shifts in the fundamental frequency (F0) of complex tones, or equal shifts in all component frequencies, even when the components are peripherally unresolved. The present study quantified both envelope (ENV) and TFS cues in single auditory-nerve (AN) fiber responses (henceforth referred to as neural ENV and TFS cues) from NH chinchillas in response to harmonic and inharmonic complex tones similar to those used in recent psychophysical studies. The lowest component in the tone complex (i.e., harmonic rank N) was systematically varied from 2 to 20 to produce various resolvability conditions in chinchillas (partially resolved to completely unresolved). Neural responses to different pairs of TEST (F0 or frequency shifted) and standard or reference (REF) stimuli were used to compute shuffled cross-correlograms, from which cross-correlation coefficients representing the degree of similarity between responses were derived separately for TFS and ENV. For a given F0 shift, the dissimilarity (TEST vs. REF) was greater for neural TFS than ENV. However, this difference was stimulus-based; the sensitivities of the neural TFS and ENV metrics were equivalent for equal absolute shifts of their relevant frequencies (center component and F0, respectively). For the F0-discrimination task, both ENV and TFS cues were available and could in principle be used for task performance. However, in contrast to human performance, neural TFS cues quantified with our cross-correlation coefficients were unaffected by phase randomization, suggesting that F0 discrimination for unresolved harmonics does not depend solely on TFS cues. For the frequency-shift (harmonic-versus-inharmonic) discrimination task, neural ENV cues were not available. 
Neural TFS cues were available and could in principle support performance in this task; however, in contrast to human-listeners' performance, these TFS cues showed no dependence on N. We conclude that while AN-fiber responses contain TFS-related cues, which can in principle be used to discriminate changes in F0 or equal shifts in component frequencies of peripherally unresolved harmonics, performance in these two psychophysical tasks appears to be limited by other factors (e.g., central processing noise).
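The shuffled cross-correlograms used above to quantify response similarity can be sketched as an all-order interval histogram across pairs of spike trains. A toy version in which perfectly periodic, Gaussian-jittered "phase-locked" spikes stand in for AN responses (all parameters arbitrary; the study's analysis additionally separates TFS and ENV components):

```python
import numpy as np

rng = np.random.default_rng(3)

def correlogram(trains_a, trains_b, binwidth=5e-5, maxlag=5e-3):
    """Histogram of all pairwise spike-time differences between two sets
    of spike trains (a shuffled cross-correlogram)."""
    edges = np.arange(-maxlag, maxlag + binwidth, binwidth)
    counts = np.zeros(len(edges) - 1)
    for a in trains_a:
        for b in trains_b:
            d = (a[:, None] - b[None, :]).ravel()  # all spike-time differences
            counts += np.histogram(d, edges)[0]
    return counts

def phase_locked_trains(freq, jitter, n_trains=8, dur=0.5):
    """One spike per stimulus cycle, jittered: a caricature of phase locking."""
    cycles = np.arange(0.0, dur, 1.0 / freq)
    return [cycles + rng.normal(0.0, jitter, len(cycles)) for _ in range(n_trains)]

precise = correlogram(phase_locked_trains(500.0, 1e-4),
                      phase_locked_trains(500.0, 1e-4))
degraded = correlogram(phase_locked_trains(500.0, 1e-3),
                       phase_locked_trains(500.0, 1e-3))

def zero_lag(c):
    """Height of the central (zero-lag) bin."""
    return c[len(c) // 2]
```

A sharp central peak (`zero_lag(precise)`) indexes precise, repeatable spike timing; with 1-ms jitter the peak collapses into the background. Cross-correlation coefficients derived from such correlograms for TEST vs. REF responses are what the metrics above measure.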
36
Léger AC, Desloge JG, Braida LD, Swaminathan J. The role of recovered envelope cues in the identification of temporal-fine-structure speech for hearing-impaired listeners. J Acoust Soc Am 2015; 137:505-508. [PMID: 25618081] [PMCID: PMC4304958] [DOI: 10.1121/1.4904540]
Abstract
Narrowband speech can be separated into fast temporal cues [temporal fine structure (TFS)], and slow amplitude modulations (envelope). Speech processed to contain only TFS leads to envelope recovery through cochlear filtering, which has been suggested to account for TFS-speech intelligibility for normal-hearing listeners. Hearing-impaired listeners have deficits with TFS-speech identification, but the contribution of recovered-envelope cues to these deficits is unknown. This was assessed for hearing-impaired listeners by measuring identification of disyllables processed to contain TFS or recovered-envelope cues. Hearing-impaired listeners performed worse than normal-hearing listeners, but TFS-speech intelligibility was accounted for by recovered-envelope cues for both groups.
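The envelope-recovery effect invoked above (cochlear filtering reintroduces envelope cues into nominally flat-envelope TFS speech) can be demonstrated with a beating pair of tones: the TFS signal has unit amplitude everywhere, but narrowband filtering around the carrier restores an envelope that tracks the original beat. An illustrative sketch with a brick-wall FFT filter standing in for a cochlear filter:

```python
import numpy as np

fs = 16000
t = np.arange(fs) / fs

def analytic(x):
    """FFT-based Hilbert transform (analytic signal; len(x) even here)."""
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    h[1:n // 2] = 2.0
    h[n // 2] = 1.0
    return np.fft.ifft(np.fft.fft(x) * h)

def bandpass(x, lo, hi):
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), 1.0 / fs)
    X[(f < lo) | (f > hi)] = 0.0
    return np.fft.irfft(X, len(x))

f1, f2 = 980.0, 1020.0                    # beating pair inside one band
x = np.cos(2 * np.pi * f1 * t) + np.cos(2 * np.pi * f2 * t)

env_orig = np.abs(analytic(x))            # 40-Hz beat envelope
tfs = np.cos(np.angle(analytic(x)))       # flat-envelope TFS signal
recovered = np.abs(analytic(bandpass(tfs, 950, 1050)))  # after "cochlear" filtering

r = np.corrcoef(env_orig, recovered)[0, 1]
```

`tfs` itself carries no level variation, yet `r` comes out close to 1: the narrow filter converts the phase flips back into amplitude dips at the original beat rate, which is the recovered-envelope cue whose perceptual contribution the study measures.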
Affiliation(s)
- Agnès C Léger
- Research Laboratory of Electronics, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Room 36-757, Cambridge, Massachusetts 02139
- Joseph G Desloge
- Research Laboratory of Electronics, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Room 36-757, Cambridge, Massachusetts 02139
- Louis D Braida
- Research Laboratory of Electronics, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Room 36-757, Cambridge, Massachusetts 02139
- Jayaganesh Swaminathan
- Research Laboratory of Electronics, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Room 36-757, Cambridge, Massachusetts 02139

37
Optimal combination of neural temporal envelope and fine structure cues to explain speech identification in background noise. J Neurosci 2014; 34:12145-54. [PMID: 25186758 DOI: 10.1523/jneurosci.1025-14.2014] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2022] Open
Abstract
The dichotomy between acoustic temporal envelope (ENV) and fine structure (TFS) cues has stimulated numerous studies over the past decade to understand the relative role of acoustic ENV and TFS in human speech perception. Such acoustic temporal speech cues produce distinct neural discharge patterns at the level of the auditory nerve, yet little is known about the central neural mechanisms underlying the dichotomy in speech perception between neural ENV and TFS cues. We explored the question of how the peripheral auditory system encodes neural ENV and TFS cues in steady or fluctuating background noise, and how the central auditory system combines these forms of neural information for speech identification. We sought to address this question by (1) measuring sentence identification in background noise for human subjects as a function of the degree of available acoustic TFS information and (2) examining the optimal combination of neural ENV and TFS cues to explain human speech perception performance using computational models of the peripheral auditory system and central neural observers. Speech-identification performance by human subjects decreased as the acoustic TFS information was degraded in the speech signals. The model predictions best matched human performance when a greater emphasis was placed on neural ENV coding rather than neural TFS. However, neural TFS cues were necessary to account for the full effect of background-noise modulations on human speech-identification performance.
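The "optimal combination" of independent cue-based estimates invoked above can be illustrated with the textbook inverse-variance (maximum-likelihood) rule. This is a generic sketch, not the authors' central-observer model; the function name, noise levels, and sample counts are hypothetical:

```python
import numpy as np

def optimal_combine(x_a, x_b, var_a, var_b):
    """Inverse-variance (maximum-likelihood) combination of two
    independent, unbiased estimates of the same quantity."""
    w = (1.0 / var_a) / (1.0 / var_a + 1.0 / var_b)
    combined = w * x_a + (1.0 - w) * x_b
    combined_var = 1.0 / (1.0 / var_a + 1.0 / var_b)
    return combined, combined_var

# Monte Carlo check: ENV-based and TFS-based estimates of the same value,
# with the ENV channel assumed noisier (hypothetical noise levels)
rng = np.random.default_rng(1)
truth = 2.0
env_est = truth + rng.normal(0.0, 1.0, 100_000)  # noisier cue
tfs_est = truth + rng.normal(0.0, 0.5, 100_000)  # more reliable cue
comb, comb_var = optimal_combine(env_est, tfs_est, 1.0, 0.25)
# comb_var = 1 / (1/1 + 1/0.25) = 0.2, below either cue alone
```

Under this rule the more reliable cue dominates; an observer that instead over-weights the noisier cue, as the human data here suggest for neural ENV versus TFS, is suboptimal in exactly this sense.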
Collapse
|
38
|
Swaminathan J, Reed CM, Desloge JG, Braida LD, Delhorne LA. Consonant identification using temporal fine structure and recovered envelope cues. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2014; 135:2078-2090. [PMID: 25235005 PMCID: PMC4167752 DOI: 10.1121/1.4865920] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/30/2013] [Revised: 01/29/2014] [Accepted: 02/03/2014] [Indexed: 05/31/2023]
Abstract
The contribution of recovered envelopes (RENVs) to the utilization of temporal-fine structure (TFS) speech cues was examined in normal-hearing listeners. Consonant identification experiments used speech stimuli processed to present TFS or RENV cues. Experiment 1 examined the effects of exposure and presentation order using 16-band TFS speech and 40-band RENV speech recovered from 16-band TFS speech. Prior exposure to TFS speech aided in the reception of RENV speech. Performance on the two conditions was similar (∼50%-correct) for experienced listeners as was the pattern of consonant confusions. Experiment 2 examined the effect of varying the number of RENV bands recovered from 16-band TFS speech. Mean identification scores decreased as the number of RENV bands decreased from 40 to 8 and were only slightly above chance levels for 16 and 8 bands. Experiment 3 examined the effect of varying the number of bands in the TFS speech from which 40-band RENV speech was constructed. Performance fell from 85%- to 31%-correct as the number of TFS bands increased from 1 to 32. Overall, these results suggest that the interpretation of previous studies that have used TFS speech may have been confounded with the presence of RENVs.
Collapse
Affiliation(s)
- Jayaganesh Swaminathan
- Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139
| | - Charlotte M Reed
- Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139
| | - Joseph G Desloge
- Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139
| | - Louis D Braida
- Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139
| | - Lorraine A Delhorne
- Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139
| |
Collapse
|
39
|
Anderson MC, Arehart KH, Kates JM. The effects of noise vocoding on speech quality perception. Hear Res 2014; 309:75-83. [DOI: 10.1016/j.heares.2013.11.011] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 07/23/2012] [Revised: 11/22/2013] [Accepted: 11/25/2013] [Indexed: 10/25/2022]
|
40
|
Won JH, Shim HJ, Lorenzi C, Rubinstein JT. Use of amplitude modulation cues recovered from frequency modulation for cochlear implant users when original speech cues are severely degraded. J Assoc Res Otolaryngol 2014; 15:423-39. [PMID: 24532186 DOI: 10.1007/s10162-014-0444-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/04/2013] [Accepted: 01/20/2014] [Indexed: 11/30/2022] Open
Abstract
Won et al. (J Acoust Soc Am 132:1113-1119, 2012) reported that cochlear implant (CI) speech processors generate amplitude-modulation (AM) cues recovered from broadband speech frequency modulation (FM) and that CI users can use these cues for speech identification in quiet. The present study was designed to extend this finding for a wide range of listening conditions, where the original speech cues were severely degraded by manipulating either the acoustic signals or the speech processor. The manipulation of the acoustic signals included the presentation of background noise, simulation of reverberation, and amplitude compression. The manipulation of the speech processor included changing the input dynamic range and the number of channels. For each of these conditions, multiple levels of speech degradation were tested. Speech identification was measured for CI users and compared for stimuli having both AM and FM information (intact condition) or FM information only (FM condition). Each manipulation degraded speech identification performance for both intact and FM conditions. Performance for the intact and FM conditions became similar for stimuli having the most severe degradations. Identification performance generally overlapped for the intact and FM conditions. Moreover, identification performance for the FM condition was better than chance performance even at the maximum level of distortion. Finally, significant correlations were found between speech identification scores for the intact and FM conditions. Altogether, these results suggest that despite poor frequency selectivity, CI users can make efficient use of AM cues recovered from speech FM in difficult listening situations.
Collapse
Affiliation(s)
- Jong Ho Won
- Department of Audiology and Speech Pathology, University of Tennessee Health Science Center, Knoxville, TN, 37996, USA
| | | | | | | |
Collapse
|
41
|
Churchill TH, Kan A, Goupell MJ, Ihlefeld A, Litovsky RY. Speech perception in noise with a harmonic complex excited vocoder. J Assoc Res Otolaryngol 2014; 15:265-78. [PMID: 24448721 DOI: 10.1007/s10162-013-0435-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/04/2012] [Accepted: 12/17/2013] [Indexed: 12/01/2022] Open
Abstract
A cochlear implant (CI) presents band-pass-filtered acoustic envelope information by modulating current pulse train levels. Similarly, a vocoder presents envelope information by modulating an acoustic carrier. By studying how normal hearing (NH) listeners are able to understand degraded speech signals with a vocoder, the parameters that best simulate electric hearing and factors that might contribute to the NH-CI performance difference may be better understood. A vocoder with harmonic complex carriers (fundamental frequency, f0 = 100 Hz) was used to study the effect of carrier phase dispersion on speech envelopes and intelligibility. The starting phases of the harmonic components were randomly dispersed to varying degrees prior to carrier filtering and modulation. NH listeners were tested on recognition of a closed set of vocoded words in background noise. Two sets of synthesis filters simulated different amounts of current spread in CIs. Results showed that the speech vocoded with carriers whose starting phases were maximally dispersed was the most intelligible. Superior speech understanding may have been a result of the flattening of the dispersed-phase carrier's intrinsic temporal envelopes produced by the large number of interacting components in the high-frequency channels. Cross-correlogram analyses of auditory nerve model simulations confirmed that randomly dispersing the carrier's component starting phases resulted in better neural envelope representation. However, neural metrics extracted from these analyses were not found to accurately predict speech recognition scores for all vocoded speech conditions. It is possible that central speech understanding mechanisms are insensitive to the envelope-fine structure dichotomy exploited by vocoders.
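The carrier manipulation described above, randomly dispersing the starting phases of a harmonic complex, can be sketched as follows. This is a minimal illustration assuming numpy; the function names, parameter values, and crest-factor check are illustrative, not the study's stimuli:

```python
import numpy as np

def harmonic_complex(f0=100.0, n_harmonics=30, dispersion=1.0,
                     dur=0.2, fs=16000, seed=0):
    """Harmonic complex whose component starting phases are randomly
    dispersed; dispersion=0 gives cosine phase (maximally peaky),
    dispersion=1 gives fully random starting phases."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(dur * fs)) / fs
    phases = dispersion * rng.uniform(0.0, 2.0 * np.pi, n_harmonics)
    x = np.zeros_like(t)
    for k in range(1, n_harmonics + 1):
        x += np.cos(2.0 * np.pi * k * f0 * t + phases[k - 1])
    return x / n_harmonics

def crest_factor(x):
    """Peak-to-RMS ratio; flatter temporal envelopes give lower values."""
    return np.max(np.abs(x)) / np.sqrt(np.mean(x ** 2))

peaky = harmonic_complex(dispersion=0.0)  # cosine phase: pulse-like
flat = harmonic_complex(dispersion=1.0)   # dispersed phase: flatter waveform
```

Comparing `crest_factor(peaky)` and `crest_factor(flat)` shows the envelope flattening that the abstract credits with better neural envelope representation in the high-frequency channels.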
Collapse
Affiliation(s)
- Tyler H Churchill
- Waisman Center, University of Wisconsin-Madison, 1500 Highland Avenue #521, Madison, WI, 53705, USA,
| | | | | | | | | |
Collapse
|
42
|
Kale S, Micheyl C, Heinz MG. Effects of sensorineural hearing loss on temporal coding of harmonic and inharmonic tone complexes in the auditory nerve. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2013; 787:109-18. [PMID: 23716215 DOI: 10.1007/978-1-4614-1590-9_13] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/19/2023]
Abstract
Listeners with sensorineural hearing loss (SNHL) often show poorer thresholds for fundamental-frequency (F0) discrimination and poorer discrimination between harmonic and frequency-shifted (inharmonic) complex tones than normal-hearing (NH) listeners, especially when these tones contain resolved or partially resolved components. It has been suggested that these perceptual deficits reflect reduced access to temporal-fine-structure (TFS) information and could be due to degraded phase locking in the auditory nerve (AN) with SNHL. In the present study, TFS and temporal-envelope (ENV) cues in single AN-fiber responses to band-pass-filtered harmonic and inharmonic complex tones were measured in chinchillas with either normal hearing or noise-induced SNHL. The stimuli were comparable to those used in recent psychophysical studies of F0 and harmonic/inharmonic discrimination. As in those studies, the rank of the center component was manipulated to produce different resolvability conditions, different phase relationships (cosine and random phase) were tested, and background noise was present. Neural TFS and ENV cues were quantified using cross-correlation coefficients computed using shuffled cross correlograms between neural responses to REF (harmonic) and TEST (F0- or frequency-shifted) stimuli. In animals with SNHL, AN-fiber tuning curves showed elevated thresholds, broadened tuning, best-frequency shifts, and downward shifts in the dominant TFS response component; however, no significant degradation in the ability of AN fibers to encode TFS or ENV cues was found. Consistent with optimal-observer analyses, the results indicate that TFS and ENV cues depended only on the relevant frequency shift in Hz and thus were not degraded because phase locking remained intact. These results suggest that perceptual "TFS-processing" deficits do not simply reflect degraded phase locking at the level of the AN. To the extent that performance in F0 and harmonic/inharmonic discrimination tasks depends on TFS cues, it likely does so through a more complicated (suboptimal) decoding mechanism, which may involve "spatiotemporal" (place-time) neural representations.
Collapse
Affiliation(s)
- Sushrut Kale
- Department of Otolaryngology-Head & Neck Surgery, Columbia University, New York, NY 10032, USA.
| | | | | |
Collapse
|
43
|
Apoux F, Yoho SE, Youngdahl CL, Healy EW. Role and relative contribution of temporal envelope and fine structure cues in sentence recognition by normal-hearing listeners. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2013; 134:2205-12. [PMID: 23967950 PMCID: PMC3765279 DOI: 10.1121/1.4816413] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/13/2023]
Abstract
The present study investigated the role and relative contribution of envelope and temporal fine structure (TFS) to sentence recognition in noise. Target and masker stimuli were added at five different signal-to-noise ratios (SNRs) and filtered into 30 contiguous frequency bands. The envelope and TFS were extracted from each band by Hilbert decomposition. The final stimuli consisted of the envelope of the target/masker sound mixture at x dB SNR and the TFS of the same sound mixture at y dB SNR. A first experiment showed a very limited contribution of TFS cues, indicating that sentence recognition in noise relies almost exclusively on temporal envelope cues. A second experiment showed that replacing the carrier of a sound mixture with noise (vocoder processing) cannot be considered equivalent to disrupting the TFS of the target signal by adding a background noise. Accordingly, a re-evaluation of the vocoder approach as a model to further understand the role of TFS cues in noisy situations may be necessary. Overall, these data are consistent with the view that speech information is primarily extracted from the envelope while TFS cues are primarily used to detect glimpses of the target.
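The Hilbert decomposition used above to split each band into envelope and TFS can be sketched for a single analysis band. This is a minimal numpy/scipy illustration; the AM test tone and function name are hypothetical, not the study's stimuli:

```python
import numpy as np
from scipy.signal import hilbert

def env_tfs(band_signal):
    """Hilbert decomposition of one narrowband signal into a slow
    envelope and a unit-amplitude temporal-fine-structure carrier;
    their product reconstructs the band."""
    analytic = hilbert(band_signal)
    envelope = np.abs(analytic)        # slow amplitude modulation (ENV)
    tfs = np.cos(np.angle(analytic))   # fast carrier (TFS)
    return envelope, tfs

# Hypothetical test band: a 1 kHz tone with 10 Hz amplitude modulation
fs = 16000
t = np.arange(0, 0.5, 1.0 / fs)
x = (1.0 + 0.5 * np.sin(2 * np.pi * 10 * t)) * np.sin(2 * np.pi * 1000 * t)
env, tfs = env_tfs(x)
# env tracks the 10 Hz modulator; env * tfs reproduces x
```

Mixing the envelope of one sound mixture with the TFS of another, as in the experiments above, amounts to recombining `env` from one decomposition with `tfs` from a different one, band by band.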
Collapse
Affiliation(s)
- Frédéric Apoux
- Speech Psychoacoustics Laboratory, Department of Speech and Hearing Science, The Ohio State University, Columbus, Ohio 43210, USA.
| | | | | | | |
Collapse
|
44
|
Shamma S, Lorenzi C. On the balance of envelope and temporal fine structure in the encoding of speech in the early auditory system. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2013; 133:2818-33. [PMID: 23654388 PMCID: PMC3663870 DOI: 10.1121/1.4795783] [Citation(s) in RCA: 58] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/16/2023]
Abstract
There is much debate on how the spectrotemporal modulations of speech (or its spectrogram) are encoded in the responses of the auditory nerve, and whether speech intelligibility is best conveyed via the "envelope" (E) or "temporal fine-structure" (TFS) of the neural responses. Wide use of vocoders to resolve this question has commonly assumed that manipulating the amplitude-modulation and frequency-modulation components of the vocoded signal alters the relative importance of E or TFS encoding on the nerve, thus facilitating assessment of their relative importance to intelligibility. Here we argue that this assumption is incorrect, and that the vocoder approach is ineffective in differentially altering the neural E and TFS. In fact, we demonstrate using a simplified model of early auditory processing that both neural E and TFS encode the speech spectrogram with constant and comparable relative effectiveness regardless of the vocoder manipulations. However, we also show that neural TFS cues are less vulnerable than their E counterparts under severe noisy conditions, and hence should play a more prominent role in cochlear stimulation strategies.
Collapse
Affiliation(s)
- Shihab Shamma
- Electrical and Computer Engineering Department and Institute for Systems Research, University of Maryland, College Park, Maryland 20742, USA.
| | | |
Collapse
|
45
|
Anderson S, Parbery-Clark A, White-Schwoch T, Drehobl S, Kraus N. Effects of hearing loss on the subcortical representation of speech cues. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2013; 133:3030-8. [PMID: 23654406 PMCID: PMC3663860 DOI: 10.1121/1.4799804] [Citation(s) in RCA: 90] [Impact Index Per Article: 8.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/15/2023]
Abstract
Individuals with sensorineural hearing loss often report frustration with speech being loud but not clear, especially in background noise. Despite advanced digital technology, hearing aid users may resort to removing their hearing aids in noisy environments due to the perception of excessive loudness. In an animal model, sensorineural hearing loss results in greater auditory nerve coding of the stimulus envelope, leading to a relative deficit of stimulus fine structure. Based on the hypothesis that brainstem encoding of the temporal envelope is greater in humans with sensorineural hearing loss, speech-evoked brainstem responses were recorded in normal hearing and hearing impaired age-matched groups of older adults. In the hearing impaired group, there was a disruption in the balance of envelope-to-fine structure representation compared to that of the normal hearing group. This imbalance may underlie the difficulty experienced by individuals with sensorineural hearing loss when trying to understand speech in background noise. This finding advances the understanding of the effects of sensorineural hearing loss on central auditory processing of speech in humans. Moreover, this finding has clinical potential for developing new amplification or implantation technologies, and in developing new training regimens to address this relative deficit of fine structure representation.
Collapse
Affiliation(s)
- Samira Anderson
- Northwestern University, Auditory Neuroscience Laboratory, Communication Sciences, 2240 North Campus Drive, Evanston, Illinois 60208, USA
| | | | | | | | | |
Collapse
|
46
|
Imennov NS, Won JH, Drennan WR, Jameyson E, Rubinstein JT. Detection of acoustic temporal fine structure by cochlear implant listeners: behavioral results and computational modeling. Hear Res 2013; 298:60-72. [PMID: 23333260 PMCID: PMC3605703 DOI: 10.1016/j.heares.2013.01.004] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 09/20/2012] [Revised: 12/22/2012] [Accepted: 01/08/2013] [Indexed: 10/27/2022]
Abstract
A test of within-channel detection of acoustic temporal fine structure (aTFS) cues is presented. Eight cochlear implant (CI) listeners were asked to discriminate between two Schroeder-phase (SP) complexes using a two-alternative, forced-choice task. Because differences between the acoustic stimuli are primarily constrained to their aTFS, successful discrimination reflects a combination of the subjects' perception of and the strategy's ability to deliver aTFS cues. Subjects were mapped with single-channel Continuous Interleaved Sampling (CIS) and Simultaneous Analog Stimulation (SAS) strategies. To compare within- and across-channel delivery of aTFS cues, a 16-channel clinical HiRes strategy was also fitted. Throughout testing, SAS consistently outperformed the CIS strategy (p ≤ 0.002). For SP stimuli with F0 = 50 Hz, the highest discrimination scores were achieved with the HiRes encoding, followed by scores with the SAS and the CIS strategies, respectively. At 200 Hz, single-channel SAS performed better than HiRes (p = 0.022), demonstrating that under a more challenging testing condition, discrimination performance with a single-channel analog encoding can exceed that of a 16-channel pulsatile strategy. To better understand the intermediate steps of discrimination, a biophysical model was used to examine the neural discharges evoked by the SP stimuli. Discrimination estimates calculated from simulated neural responses successfully tracked the behavioral performance trends of single-channel CI listeners.
Collapse
Affiliation(s)
- Nikita S. Imennov
- Department of Bioengineering, University of Washington, Seattle, WA 98195
- VM Bloedel Hearing Research Center, University of Washington, Seattle, WA 98195
| | - Jong Ho Won
- Department of Audiology and Speech Pathology, University of Tennessee Health Science Center, Knoxville, TN 37996
| | - Ward R. Drennan
- VM Bloedel Hearing Research Center, University of Washington, Seattle, WA 98195
- Department of Otolaryngology, Head & Neck Surgery, University of Washington, Seattle, WA 98195
| | - Elyse Jameyson
- VM Bloedel Hearing Research Center, University of Washington, Seattle, WA 98195
- Department of Otolaryngology, Head & Neck Surgery, University of Washington, Seattle, WA 98195
| | - Jay T. Rubinstein
- Department of Bioengineering, University of Washington, Seattle, WA 98195
- VM Bloedel Hearing Research Center, University of Washington, Seattle, WA 98195
- Department of Otolaryngology, Head & Neck Surgery, University of Washington, Seattle, WA 98195
| |
Collapse
|
47
|
Lorenzi C, Wallaert N, Gnansia D, Leger AC, Ives DT, Chays A, Garnier S, Cazals Y. Temporal-envelope reconstruction for hearing-impaired listeners. J Assoc Res Otolaryngol 2012; 13:853-65. [PMID: 23007719 PMCID: PMC3505588 DOI: 10.1007/s10162-012-0350-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/18/2011] [Accepted: 09/09/2012] [Indexed: 10/27/2022] Open
Abstract
Recent studies suggest that normal-hearing listeners maintain robust speech intelligibility despite severe degradations of amplitude-modulation (AM) cues, by using temporal-envelope information recovered from broadband frequency-modulation (FM) speech cues at the output of cochlear filters. This study aimed to assess whether cochlear damage affects this capacity to reconstruct temporal-envelope information from FM. This was achieved by measuring the ability of 40 normal-hearing listeners and 41 listeners with mild-to-moderate hearing loss to identify syllables processed to degrade AM cues while leaving FM cues intact within three broad frequency bands spanning the range 65-3,645 Hz. Stimuli were presented at 65 dB SPL for both normal-hearing listeners and hearing-impaired listeners. They were presented as such or amplified using a modified half-gain rule for hearing-impaired listeners. Hearing-impaired listeners showed significantly poorer identification scores than normal-hearing listeners at both presentation levels. However, the deficit shown by hearing-impaired listeners for amplified stimuli was relatively modest. Overall, hearing-impaired data and the results of a simulation study were consistent with a poorer-than-normal ability to reconstruct temporal-envelope information resulting from a broadening of cochlear filters by a factor ranging from 2 to 4. These results suggest that mild-to-moderate cochlear hearing loss has only a modest detrimental effect on peripheral, temporal-envelope reconstruction mechanisms.
Collapse
Affiliation(s)
- Christian Lorenzi
- Equipe Audition (CNRS, Université Paris Descartes, Ecole normale supérieure), Institut d'Etude de la Cognition, Ecole normale supérieure, Paris Sciences et Lettres, 29 rue d'Ulm, 75005 Paris, France.
| | | | | | | | | | | | | | | |
Collapse
|
48
|
Ives DT, Calcus A, Kalluri S, Strelcyk O, Sheft S, Lorenzi C. Effects of noise reduction on AM and FM perception. J Assoc Res Otolaryngol 2012. [PMID: 23180229 DOI: 10.1007/s10162-012-0358-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022] Open
Abstract
The goal of noise reduction (NR) algorithms in digital hearing aid devices is to reduce background noise whilst preserving as much of the original signal as possible. These algorithms may increase the signal-to-noise ratio (SNR) in an ideal case, but they generally fail to improve speech intelligibility. However, due to the complex nature of speech, it is difficult to disentangle the numerous low- and high-level effects of NR that may underlie the lack of speech perception benefits. The goal of this study was to better understand why NR algorithms do not improve speech intelligibility by investigating the effects of NR on the ability to discriminate two basic acoustic features, namely amplitude modulation (AM) and frequency modulation (FM) cues, known to be crucial for speech identification in quiet and in noise. Here, discrimination of complex, non-linguistic AM and FM patterns was measured for normal hearing listeners using a same/different task. The stimuli were generated by modulating 1-kHz pure tones by either a two-component AM or FM modulator with patterns changed by manipulating component phases. Modulation rates were centered on 3 Hz. Discrimination of AM and FM patterns was measured in quiet and in the presence of a white noise that had been passed through a gammatone filter centered on 1 kHz. The noise was presented at SNRs ranging from -6 to +12 dB. Stimuli were left as such or processed via an NR algorithm based on the spectral subtraction method. NR was found to yield small but systematic improvements in discrimination for the AM conditions at favorable SNRs but had little effect, if any, on FM discrimination. A computational model of early auditory processing was developed to quantify the fidelity of AM and FM transmission. The model captured the improvement in discrimination performance for AM stimuli at high SNRs with NR. However, the model also predicted a relatively small detrimental effect of NR for FM stimuli in contrast with the average psychophysical data. Overall, these results suggest that the lack of benefits of NR on speech intelligibility is partly caused by the limited effect of NR on the transmission of narrowband speech modulation cues.
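The spectral-subtraction method underlying the NR algorithm above can be sketched in its most basic magnitude-subtraction form. This is a simplified illustration assuming numpy, with rectangular frames and no windowing or overlap-add; it is not the implementation used in the study, and the demo signal is hypothetical:

```python
import numpy as np

def spectral_subtraction(noisy, noise_sample, frame=256, floor=0.01):
    """Basic magnitude spectral subtraction: subtract the average noise
    magnitude spectrum from each frame, floor the result to avoid
    negative magnitudes, and resynthesize with the noisy phase."""
    n_frames = len(noise_sample) // frame
    noise_mag = np.mean(
        [np.abs(np.fft.rfft(noise_sample[i * frame:(i + 1) * frame]))
         for i in range(n_frames)], axis=0)
    out = np.zeros(len(noisy) // frame * frame)
    for i in range(len(noisy) // frame):
        seg = noisy[i * frame:(i + 1) * frame]
        spec = np.fft.rfft(seg)
        mag = np.abs(spec) - noise_mag                 # subtract estimate
        mag = np.maximum(mag, floor * np.abs(spec))    # spectral floor
        out[i * frame:(i + 1) * frame] = np.fft.irfft(
            mag * np.exp(1j * np.angle(spec)))
    return out

# Hypothetical demo: a 1 kHz tone in white noise
rng = np.random.default_rng(0)
fs, n = 8000, 8192
t = np.arange(n) / fs
tone = np.sin(2 * np.pi * 1000 * t)
noisy = tone + rng.normal(0.0, 0.3, n)
denoised = spectral_subtraction(noisy, rng.normal(0.0, 0.3, n))
```

Because the method only reshapes frame magnitude spectra and reuses the noisy phase, it mainly cleans up slow amplitude structure, which is consistent with the finding above that NR helped AM discrimination but left FM discrimination largely unchanged.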
Collapse
Affiliation(s)
- D Timothy Ives
- Ecole normale supérieure, Institut d'Etude de la Cognition, Paris Sciences et Lettres, 75005, Paris, France.
| | | | | | | | | | | |
Collapse
|
49
|
Jennings SG, Strickland EA. Evaluating the effects of olivocochlear feedback on psychophysical measures of frequency selectivity. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2012; 132:2483-96. [PMID: 23039443 PMCID: PMC3477188 DOI: 10.1121/1.4742723] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/03/2011] [Revised: 07/11/2012] [Accepted: 07/16/2012] [Indexed: 05/19/2023]
Abstract
Frequency selectivity was evaluated under two conditions designed to assess the influence of a "precursor" stimulus on auditory filter bandwidths. The standard condition consisted of a short masker, immediately followed by a short signal. The precursor condition was identical except a 100-ms sinusoid at the signal frequency (i.e., the precursor) was presented before the masker. The standard and precursor conditions were compared for measurements of psychophysical tuning curves (PTCs), and notched noise tuning characteristics. Estimates of frequency selectivity were significantly broader in the precursor condition. In the second experiment, PTCs in the standard and precursor conditions were simulated to evaluate the influence of the precursor on PTC bandwidth. The model was designed to account for the influence of additivity of masking between the masker and precursor. Model simulations were able to qualitatively account for the perceptual data when outer hair cell gain of the model was reduced in the precursor condition. These findings suggest that the precursor may have reduced cochlear gain, in addition to producing additivity of masking. This reduction in gain may be mediated by the medial olivocochlear reflex.
Collapse
Affiliation(s)
- Skyler G Jennings
- Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, Indiana 47907, USA.
| | | |
Collapse
|
50
|
Won JH, Lorenzi C, Nie K, Li X, Jameyson EM, Drennan WR, Rubinstein JT. The ability of cochlear implant users to use temporal envelope cues recovered from speech frequency modulation. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2012; 132:1113-1119. [PMID: 22894230 PMCID: PMC3427369 DOI: 10.1121/1.4726013] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/30/2011] [Revised: 04/30/2012] [Accepted: 05/06/2012] [Indexed: 06/01/2023]
Abstract
Previous studies have demonstrated that normal-hearing listeners can understand speech using the recovered "temporal envelopes," i.e., amplitude modulation (AM) cues from frequency modulation (FM). This study evaluated this mechanism in cochlear implant (CI) users for consonant identification. Stimuli containing only FM cues were created using 1, 2, 4, and 8-band FM-vocoders to determine if consonant identification performance would improve as the recovered AM cues become more available. A consistent improvement was observed as the band number decreased from 8 to 1, supporting the hypothesis that (1) the CI sound processor generates recovered AM cues from broadband FM, and (2) CI users can use the recovered AM cues to recognize speech. The correlation between the intact and the recovered AM components at the output of the sound processor was also generally higher when the band number was low, supporting the consonant identification results. Moreover, CI subjects who were better at using recovered AM cues from broadband FM cues showed better identification performance with intact (unprocessed) speech stimuli. This suggests that speech perception performance variability in CI users may be partly caused by differences in their ability to use AM cues recovered from FM speech cues.
Collapse
Affiliation(s)
- Jong Ho Won
- Department of Audiology and Speech Pathology, University of Tennessee Health Science Center, Knoxville, Tennessee 37996, USA.
| | | | | | | | | | | | | |
Collapse
|