1. Regev J, Relaño-Iborra H, Zaar J, Dau T. Disentangling the effects of hearing loss and age on amplitude modulation frequency selectivity. J Acoust Soc Am 2024; 155:2589-2602. [PMID: 38607268] [DOI: 10.1121/10.0025541]
Abstract
The processing and perception of amplitude modulation (AM) in the auditory system reflect a frequency-selective process, often described as a modulation filterbank. Previous studies on perceptual AM masking reported similar results for older listeners with hearing impairment (HI listeners) and young listeners with normal hearing (NH listeners), suggesting no effects of age or hearing loss on AM frequency selectivity. However, recent evidence has shown that age, independently of hearing loss, adversely affects AM frequency selectivity. Hence, this study aimed to disentangle the effects of hearing loss and age. A simultaneous AM masking paradigm was employed, using a sinusoidal carrier at 2.8 kHz, narrowband noise modulation maskers, and target modulation frequencies of 4, 16, 64, and 128 Hz. The results obtained from young (n = 3, 24-30 years of age) and older (n = 10, 63-77 years of age) HI listeners were compared to previously obtained data from young and older NH listeners. Notably, the HI listeners generally exhibited lower (unmasked) AM detection thresholds and greater AM frequency selectivity than their NH counterparts in both age groups. Overall, the results suggest that age negatively affects AM frequency selectivity for both NH and HI listeners, whereas hearing loss improves AM detection and AM selectivity, likely due to the loss of peripheral compression.
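For readers who want to experiment with the paradigm, a minimal sketch of the stimulus follows (Python with numpy/scipy assumed; the carrier frequency and target modulation frequencies come from the abstract, while the modulation depths, masker bandwidth, and duration are illustrative placeholders rather than the study's exact settings):

```python
import numpy as np
from scipy.signal import butter, sosfilt

def am_masking_stimulus(f_target=16.0, m_target=0.3, masker_bw_oct=1.0,
                        masker_rms=0.3, fc=2800.0, fs=44100, dur=1.0, seed=0):
    """Sinusoidal 2.8-kHz carrier with a sinusoidal target AM plus a
    narrowband-noise masker modulation centered on the target frequency.
    Depths, bandwidth, and duration are illustrative placeholders."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(dur * fs)) / fs
    # Target modulator: sinusoidal AM at the target modulation frequency.
    target_mod = m_target * np.sin(2 * np.pi * f_target * t)
    # Masker modulator: Gaussian noise band-limited around f_target.
    lo = f_target * 2 ** (-masker_bw_oct / 2)
    hi = f_target * 2 ** (masker_bw_oct / 2)
    sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
    masker_mod = sosfilt(sos, rng.standard_normal(t.size))
    masker_mod *= masker_rms / np.sqrt(np.mean(masker_mod ** 2))
    # Apply both modulators to the carrier; clip to keep the envelope positive.
    envelope = np.clip(1.0 + target_mod + masker_mod, 0.0, None)
    return envelope * np.sin(2 * np.pi * fc * t)
```

In the actual experiment, detection thresholds would be tracked by adaptively varying m_target in a forced-choice procedure.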
Affiliations
- Jonathan Regev: Hearing Systems Section, Department of Health Technology, Technical University of Denmark, Kongens Lyngby, 2800, Denmark
- Helia Relaño-Iborra: Hearing Systems Section, Department of Health Technology, Technical University of Denmark, Kongens Lyngby, 2800, Denmark
- Johannes Zaar: Hearing Systems Section, Department of Health Technology, Technical University of Denmark, Kongens Lyngby, 2800, Denmark; Eriksholm Research Centre, Snekkersten, 3070, Denmark
- Torsten Dau: Hearing Systems Section, Department of Health Technology, Technical University of Denmark, Kongens Lyngby, 2800, Denmark; Copenhagen Hearing and Balance Center, Copenhagen University Hospital, Rigshospitalet, Copenhagen, 2100, Denmark
2. Viswanathan V, Heinz MG, Shinn-Cunningham BG. Impact of reduced spectral resolution on temporal-coherence-based source segregation. bioRxiv 2024:2024.03.11.584489 [Preprint]. [PMID: 38586037] [PMCID: PMC10998286] [DOI: 10.1101/2024.03.11.584489]
Abstract
Hearing-impaired listeners struggle to understand speech in noise, even when using cochlear implants (CIs) or hearing aids. Successful listening in noisy environments depends on the brain's ability to organize a mixture of sound sources into distinct perceptual streams (i.e., source segregation). In normal-hearing listeners, temporal coherence of sound fluctuations across frequency channels supports this process by promoting the grouping of elements belonging to a single acoustic source. We hypothesized that reduced spectral resolution, a hallmark of both electric/CI hearing (from current spread) and acoustic hearing with sensorineural hearing loss (from broadened tuning), degrades segregation based on temporal coherence. This is because reduced frequency resolution decreases the likelihood that a single sound source dominates the activity driving any specific channel; concomitantly, it increases the correlation in activity across channels. Consistent with our hypothesis, predictions from a physiologically plausible model of temporal-coherence-based segregation suggest that CI current spread reduces comodulation masking release (CMR; a correlate of temporal-coherence processing) and speech intelligibility in noise. These predictions are consistent with our behavioral data from simulated CI listening. Our model also predicts smaller CMR with increasing levels of outer-hair-cell damage. These results suggest that reduced spectral resolution relative to normal hearing impairs temporal-coherence-based segregation and degrades speech-in-noise outcomes.
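The mechanistic claim (broader channels receive contributions from more sources, so their envelopes become more correlated) is easy to demonstrate numerically. Below is a minimal sketch under stated assumptions (numpy/scipy; the filter shapes, center frequencies, and bandwidths are illustrative and are not the paper's physiological model):

```python
import numpy as np
from scipy.signal import butter, sosfilt, hilbert

def channel_envelope_correlation(channel_bw_hz, fs=16000, dur=2.0, seed=0):
    """Correlation between the Hilbert envelopes of two bandpass 'channels'
    (1000 and 1300 Hz) driven by a mixture of two independent narrowband
    sources. Wider channels pick up both sources, raising the correlation."""
    rng = np.random.default_rng(seed)
    n = int(dur * fs)
    def narrowband_source(f0):
        sos = butter(4, [f0 - 50, f0 + 50], btype="bandpass", fs=fs, output="sos")
        return sosfilt(sos, rng.standard_normal(n))
    mixture = narrowband_source(1000.0) + narrowband_source(1300.0)
    envs = []
    for cf in (1000.0, 1300.0):
        sos = butter(2, [cf - channel_bw_hz / 2, cf + channel_bw_hz / 2],
                     btype="bandpass", fs=fs, output="sos")
        envs.append(np.abs(hilbert(sosfilt(sos, mixture))))
    return np.corrcoef(envs[0], envs[1])[0, 1]

print(channel_envelope_correlation(150.0))  # sharp tuning: low correlation
print(channel_envelope_correlation(800.0))  # broad tuning: higher correlation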
Affiliations
- Vibha Viswanathan: Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213
- Michael G. Heinz: Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, IN 47907
3. Regev J, Zaar J, Relaño-Iborra H, Dau T. Age-related reduction of amplitude modulation frequency selectivity. J Acoust Soc Am 2023; 153:2298. [PMID: 37092934] [DOI: 10.1121/10.0017835]
Abstract
The perception of amplitude modulation (AM) has been characterized as a frequency-selective process in the temporal envelope domain and has been simulated in computational models of auditory processing and perception using a modulation filterbank. Such AM frequency-selective processing has been argued to be critical for the perception of complex sounds, including speech. This study investigated the effects of age on behavioral AM frequency selectivity in young (n = 11, 22-29 years) versus older (n = 10, 57-77 years) listeners with normal hearing, using a simultaneous AM masking paradigm with a sinusoidal carrier (2.8 kHz), target modulation frequencies of 4, 16, 64, and 128 Hz, and narrowband-noise modulation maskers. AM frequency selectivity was reduced by a factor of up to 2 in the older listeners. While the observed AM selectivity co-varied with unmasked AM detection sensitivity, the age-related broadening of the masked threshold patterns persisted even when AM sensitivity was similar across groups for an extended stimulus duration. These results may provide a valuable basis for further investigations of the effects of age and reduced AM frequency selectivity on complex sound perception, as well as of the interaction between age and hearing impairment in AM processing and perception.
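A broadening factor of this kind can be estimated by fitting a filter shape to the masked-threshold patterns (threshold elevation versus masker center frequency). The sketch below is purely illustrative: the Gaussian-on-log-frequency shape and the threshold values are invented for demonstration and are not the study's fitting procedure or data (scipy assumed):

```python
import numpy as np
from scipy.optimize import curve_fit

def log_gaussian(f_masker, peak_db, bw_oct, f_target=16.0):
    """Masked-threshold elevation (dB) vs. masker center frequency, modeled
    as a Gaussian on a log2 modulation-frequency axis."""
    x = np.log2(f_masker / f_target)
    return peak_db * np.exp(-0.5 * (x / bw_oct) ** 2)

# Invented threshold elevations (dB) at masker center frequencies (Hz).
f_masker = np.array([4.0, 8.0, 16.0, 32.0, 64.0])
patterns = {"young": np.array([1.0, 5.0, 12.0, 5.0, 1.0]),
            "older": np.array([3.0, 8.0, 12.0, 8.0, 3.0])}

bw = {}
for group, y in patterns.items():
    (peak_db, bw[group]), _ = curve_fit(log_gaussian, f_masker, y, p0=(10.0, 1.0))
    print(f"{group}: peak = {peak_db:.1f} dB, bandwidth = {bw[group]:.2f} octaves")
print(f"broadening factor (older/young): {bw['older'] / bw['young']:.2f}")
```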
Affiliations
- Jonathan Regev: Hearing Systems Section, Department of Health Technology, Technical University of Denmark, Kongens Lyngby, 2800, Denmark
- Johannes Zaar: Eriksholm Research Centre, Snekkersten, 3070, Denmark
- Helia Relaño-Iborra: Hearing Systems Section, Department of Health Technology, Technical University of Denmark, Kongens Lyngby, 2800, Denmark
- Torsten Dau: Hearing Systems Section, Department of Health Technology, Technical University of Denmark, Kongens Lyngby, 2800, Denmark
4. Temporal coherence structure rapidly shapes neuronal interactions. Nat Commun 2017; 8:13900. [PMID: 28054545] [PMCID: PMC5228385] [DOI: 10.1038/ncomms13900]
Abstract
Perception of segregated sources is essential for navigating cluttered acoustic environments. A basic mechanism to implement this process is the temporal coherence principle: a signal is perceived as emitted from a single source only when all of its features are temporally modulated coherently, causing them to bind perceptually. Here we report on neural correlates of this process as rapidly reshaped interactions in primary auditory cortex, measured in three different ways: as changes in response rates, as adaptations of spectrotemporal receptive fields following stimulation by temporally coherent and incoherent tone sequences, and as changes in spiking correlations during the tone sequences. Responses, sensitivity, and presumed connectivity were rapidly enhanced by synchronous stimuli and suppressed by alternating (asynchronous) sounds, but only when the animals engaged in task performance and were attentive to the stimuli. Temporal coherence and attention are therefore both important factors in auditory scene analysis.

One can easily identify whether multiple sounds originate from a single source, yet the neural mechanisms underlying this process are unknown. Here the authors show that temporally coherent sounds elicit changes in the receptive-field dynamics of auditory cortical neurons in ferrets, but only when the animals are paying attention.
5. Speech onset enhancement improves intelligibility in adverse listening conditions for cochlear implant users. Hear Res 2016; 342:13-22. [DOI: 10.1016/j.heares.2016.09.002]
6. Teki S, Barascud N, Picard S, Payne C, Griffiths TD, Chait M. Neural correlates of auditory figure-ground segregation based on temporal coherence. Cereb Cortex 2016; 26:3669-3680. [PMID: 27325682] [PMCID: PMC5004755] [DOI: 10.1093/cercor/bhw173]
Abstract
To make sense of natural acoustic environments, listeners must parse complex mixtures of sounds that vary in frequency, space, and time. Emerging work suggests that, in addition to the well-studied spectral cues for segregation, sensitivity to temporal coherence (the coincidence of sound elements in and across time) is also critical for the perceptual organization of acoustic scenes. Here, we examine pre-attentive, stimulus-driven neural processes underlying auditory figure-ground segregation using stimuli that capture the challenges of listening in complex scenes where segregation cannot be achieved based on spectral cues alone. The signals ("stochastic figure-ground"; SFG) comprised a sequence of brief broadband chords containing random pure-tone components that varied from one chord to another. Occasional tone repetitions across chords are perceived as "figures" popping out of a stochastic "ground." Magnetoencephalography (MEG) measurements in naïve, distracted human subjects revealed robust evoked responses, commencing about 150 ms after figure onset, that reflect the emergence of the "figure" from the randomly varying "ground." Neural sources underlying this bottom-up-driven figure-ground segregation were localized to the planum temporale and the intraparietal sulcus, demonstrating that the latter area, outside the "classic" auditory system, is also involved in the early stages of auditory scene analysis.
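The SFG stimulus is straightforward to synthesize. A minimal sketch follows (numpy assumed; chord duration, tone pool, and figure size are illustrative choices consistent with the abstract's description, not the study's exact parameters):

```python
import numpy as np

def sfg_stimulus(n_chords=40, chord_dur=0.05, tones_per_chord=10,
                 figure_size=4, figure_onset=20, fs=16000, seed=0):
    """Stochastic figure-ground: each chord holds random pure tones ('ground');
    from chord index figure_onset onward, figure_size frequencies repeat
    across chords ('figure'). Returns the waveform."""
    rng = np.random.default_rng(seed)
    pool = 2 ** np.linspace(np.log2(200), np.log2(7000), 60)  # log-spaced tone pool
    figure_freqs = rng.choice(pool, figure_size, replace=False)
    n = int(chord_dur * fs)
    t = np.arange(n) / fs
    ramp = np.minimum(1.0, np.minimum(t, t[::-1]) / 0.005)  # 5-ms on/off ramps
    chords = []
    for k in range(n_chords):
        freqs = rng.choice(pool, tones_per_chord, replace=False)
        if k >= figure_onset:  # figure components repeat across chords
            freqs = np.concatenate([figure_freqs, freqs[:-figure_size]])
        chord = sum(np.sin(2 * np.pi * f0 * t) for f0 in freqs)
        chords.append(ramp * chord / tones_per_chord)
    return np.concatenate(chords)
```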
Affiliations
- Sundeep Teki: Wellcome Trust Centre for Neuroimaging, University College London, London WC1N 3BG, UK; Auditory Cognition Group, Institute of Neuroscience, Newcastle University, Newcastle upon Tyne NE2 4HH, UK (current address: Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford OX1 3QX, UK)
- Nicolas Barascud: Wellcome Trust Centre for Neuroimaging, University College London, London WC1N 3BG, UK; Ear Institute, University College London, London WC1X 8EE, UK
- Samuel Picard: Ear Institute, University College London, London WC1X 8EE, UK
- Timothy D. Griffiths: Wellcome Trust Centre for Neuroimaging, University College London, London WC1N 3BG, UK; Auditory Cognition Group, Institute of Neuroscience, Newcastle University, Newcastle upon Tyne NE2 4HH, UK
- Maria Chait: Ear Institute, University College London, London WC1X 8EE, UK
7. Chabot-Leclerc A, MacDonald EN, Dau T. Predicting binaural speech intelligibility using the signal-to-noise ratio in the envelope power spectrum domain. J Acoust Soc Am 2016; 140:192. [PMID: 27475146] [DOI: 10.1121/1.4954254]
Abstract
This study proposes a binaural extension to the multi-resolution speech-based envelope power spectrum model (mr-sEPSM) [Jørgensen, Ewert, and Dau (2013). J. Acoust. Soc. Am. 134, 436-446]. It consists of a combination of better-ear (BE) and binaural unmasking processes, implemented as two monaural realizations of the mr-sEPSM combined with a short-term equalization-cancellation process, and uses the signal-to-noise ratio in the envelope domain (SNRenv) as the decision metric. The model requires only two parameters to be fitted per speech material and does not require an explicit frequency weighting. It was validated against three data sets from the literature, which covered the effects of the number of maskers, the masker type [speech-shaped noise (SSN), speech-modulated SSN, babble, and reversed speech], the masker azimuth(s), reverberation applied to the target and masker, and the interaural time difference of the target and masker. The Pearson correlation coefficient between the simulated speech reception thresholds and the data across all experiments was 0.91. A model version that considered only BE processing performed similarly to the complete model (correlation coefficient of 0.86), suggesting that BE processing may be sufficient to predict intelligibility in most realistic conditions.
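The decision metric can be caricatured in a single band. The sketch below collapses the mr-sEPSM's gammatone front end, modulation filterbank, and multi-resolution segmentation into one envelope-power comparison, so it illustrates only the SNRenv idea (numpy/scipy assumed):

```python
import numpy as np
from scipy.signal import hilbert

def snr_env_db(mixture, noise_alone):
    """Envelope-domain SNR for one band: AC power of the mixture envelope in
    excess of the noise-alone envelope AC power, relative to the latter.
    Envelope power is the variance of the DC-normalized Hilbert envelope."""
    def env_ac_power(x):
        env = np.abs(hilbert(x))
        env = env / np.mean(env)          # normalize by the envelope DC
        return np.mean((env - 1.0) ** 2)  # AC power of the normalized envelope
    p_mix = env_ac_power(mixture)
    p_noise = env_ac_power(noise_alone)
    snr_env = max(p_mix - p_noise, 1e-3 * p_noise) / p_noise  # floor the excess
    return 10 * np.log10(snr_env)
```

In the full model, this quantity is computed per audio channel, per modulation filter, and per time segment, then combined across channels before being mapped to intelligibility.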
Affiliations
- Alexandre Chabot-Leclerc: Hearing Systems Group, Department of Electrical Engineering, Technical University of Denmark, DK-2800 Kongens Lyngby, Denmark
- Ewen N MacDonald: Hearing Systems Group, Department of Electrical Engineering, Technical University of Denmark, DK-2800 Kongens Lyngby, Denmark
- Torsten Dau: Hearing Systems Group, Department of Electrical Engineering, Technical University of Denmark, DK-2800 Kongens Lyngby, Denmark
8. Maddox RK, Atilgan H, Bizley JK, Lee AKC. Auditory selective attention is enhanced by a task-irrelevant temporally coherent visual stimulus in human listeners. eLife 2015; 4:e04995. [PMID: 25654748] [PMCID: PMC4337603] [DOI: 10.7554/eLife.04995]
Abstract
In noisy settings, listening is aided by correlated dynamic visual cues gleaned from a talker's face, an improvement often attributed to visually reinforced linguistic information. In this study, we aimed to test the effect of audio-visual temporal coherence alone on selective listening, free of linguistic confounds. We presented listeners with competing auditory streams whose amplitudes varied independently, together with a visual stimulus whose radius varied over time, and manipulated the cross-modal temporal relationships. Performance improved when the auditory target's timecourse matched that of the visual stimulus. Because the coherence was between task-irrelevant stimulus features, the observed improvement likely stemmed from the integration of the auditory and visual streams into cross-modal objects, enabling listeners to better attend the target. These findings suggest that in everyday conditions, where listeners can often see the source of a sound, temporal cues provided by vision can help listeners select one sound source from a mixture.
Affiliations
- Ross K Maddox: Institute for Learning and Brain Sciences, University of Washington, Seattle, United States
- Huriye Atilgan: Ear Institute, University College London, London, United Kingdom
- Adrian KC Lee: Institute for Learning and Brain Sciences, University of Washington, Seattle, United States; Department of Speech and Hearing Sciences, University of Washington, Seattle, United States
9. Krishnan L, Elhilali M, Shamma S. Segregating complex sound sources through temporal coherence. PLoS Comput Biol 2014; 10:e1003985. [PMID: 25521593] [PMCID: PMC4270434] [DOI: 10.1371/journal.pcbi.1003985]
Abstract
A new approach to the segregation of monaural sound mixtures is presented, based on the principle of temporal coherence and using auditory cortical representations. Temporal coherence is the notion that perceived sources emit coherently modulated features that evoke highly coincident neural response patterns. By clustering the feature channels with coincident responses and reconstructing their input, one may segregate the underlying source from simultaneously interfering signals that are uncorrelated with it. The proposed algorithm requires no prior information about, or training on, the sources. It can, however, gracefully incorporate cognitive functions and influences, such as memories of a target source or attention to a specific set of its attributes, so as to segregate it from its background. Aside from its unusual structure and computational innovations, the proposed model provides testable hypotheses about the physiological mechanisms of this ubiquitous and remarkable perceptual ability, and about its psychophysical manifestations in navigating complex sensory environments.

Humans and many animals can effortlessly navigate complex sensory environments, segregating and attending to one desired target source while suppressing distracting and interfering sources. In this paper, we present an algorithmic model that can accomplish this task with no prior information about, or training on, complex signals such as speech mixtures, speech in noise, and music. The model accounts for this ability by relying solely on the temporal coherence principle: the notion that perceived sources emit coherently modulated features that evoke coincident cortical response patterns. It further demonstrates how basic cortical mechanisms common to all sensory systems can implement the necessary representations, as well as the adaptive computations needed to maintain continuity by tracking the slowly changing characteristics of different sources in a scene.
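The core computation (cluster feature channels with coincident responses, then reconstruct) can be reduced to a skeleton. This sketch omits the cortical multi-rate representation and the model's actual clustering machinery; the anchor-channel selection and correlation threshold are illustrative assumptions (numpy):

```python
import numpy as np

def coherence_mask(channel_envs, anchor, threshold=0.5):
    """Given a (channels x time) array of channel envelopes, keep the channels
    whose envelope correlates with an anchor channel (e.g., one dominated by
    the attended source) above threshold; zero the rest. Skeleton of
    temporal-coherence-based segregation."""
    corr = np.array([np.corrcoef(env, channel_envs[anchor])[0, 1]
                     for env in channel_envs])
    mask = (corr >= threshold).astype(float)
    return mask[:, None] * channel_envs  # masked representation to resynthesize
```

In the full model, the anchor role would be played by attended attributes or a memorized target, and the masked representation would be inverted back to a waveform.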
Affiliations
- Lakshmi Krishnan: Department of Electrical and Computer Engineering, University of Maryland, College Park, Maryland, United States of America
- Mounya Elhilali: Department of Electrical and Computer Engineering, The Johns Hopkins University, Baltimore, Maryland, United States of America
- Shihab Shamma: Department of Electrical and Computer Engineering, University of Maryland, College Park, Maryland, United States of America; Département d'Études Cognitives, École Normale Supérieure, Paris, France
10. May T, Dau T. Computational speech segregation based on an auditory-inspired modulation analysis. J Acoust Soc Am 2014; 136:3350. [PMID: 25480079] [DOI: 10.1121/1.4901711]
Abstract
A monaural speech segregation system is presented that estimates the ideal binary mask from noisy speech based on supervised learning of amplitude modulation spectrogram (AMS) features. Instead of linearly scaled modulation filters with constant absolute bandwidth, an auditory-inspired modulation filterbank with logarithmically scaled filters is employed. To reduce the dependency of the AMS features on the overall background noise level, a feature normalization stage is applied. In addition, a spectro-temporal integration stage is incorporated to exploit the context information about speech activity present in neighboring time-frequency units. To evaluate the generalization of the system to unseen acoustic conditions, it is trained on a limited set of low signal-to-noise ratio (SNR) conditions but tested over a wide range of SNRs up to 20 dB. A systematic evaluation demonstrates that auditory-inspired modulation processing can substantially improve mask estimation accuracy in the presence of both stationary and fluctuating interferers.
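The feature front end can be sketched for a single frequency channel as follows (numpy/scipy assumed; the paper's exact modulation filter shapes, channel counts, and normalization stage are in the full text, so the constant-Q one-octave filters below are an illustrative stand-in):

```python
import numpy as np
from scipy.signal import butter, sosfilt, hilbert

def ams_features(subband, fs=16000, env_fs=400, n_mod=8):
    """AMS features for one frequency channel: extract the envelope,
    downsample it, and pass it through logarithmically spaced modulation
    bandpass filters (2-128 Hz), returning per-filter power."""
    env = np.abs(hilbert(subband))
    env = env[:: fs // env_fs]                         # crude downsampling
    centers = 2 ** np.linspace(1, 7, n_mod)            # 2 ... 128 Hz, log-spaced
    feats = []
    for fc in centers:
        lo, hi = fc / 2 ** 0.5, fc * 2 ** 0.5          # one-octave bands (Q ~ 1.4)
        sos = butter(2, [lo, hi], btype="bandpass", fs=env_fs, output="sos")
        feats.append(np.mean(sosfilt(sos, env) ** 2))  # modulation-band power
    return np.array(feats)
```

The log spacing is the point of contrast with conventional AMS features: low modulation rates get narrow filters and high rates get proportionally wider ones, mirroring auditory modulation tuning.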
Affiliations
- Tobias May: Centre for Applied Hearing Research, Department of Electrical Engineering, Technical University of Denmark, DK-2800 Kongens Lyngby, Denmark
- Torsten Dau: Centre for Applied Hearing Research, Department of Electrical Engineering, Technical University of Denmark, DK-2800 Kongens Lyngby, Denmark
11. Christiansen SK, Oxenham AJ. Assessing the effects of temporal coherence on auditory stream formation through comodulation masking release. J Acoust Soc Am 2014; 135:3520-3529. [PMID: 24907815] [PMCID: PMC4048442] [DOI: 10.1121/1.4872300]
Abstract
Recent studies of auditory streaming have suggested that repeated synchronous onsets and offsets over time, referred to as "temporal coherence," provide a strong grouping cue between acoustic components, even when they are spectrally remote. This study uses a measure of auditory stream formation, based on comodulation masking release (CMR), to assess the conditions under which a loss of temporal coherence across frequency can lead to auditory stream segregation. The measure relies on the assumption that the CMR produced by flanking bands remote from the masker and target frequency occurs only if the masking and flanking bands form part of the same perceptual stream. The masking and flanking bands consisted of sequences of narrowband noise bursts, and the temporal coherence between the masking and flanking bursts was manipulated in two ways: (a) by introducing a fixed temporal offset between the flanking and masking bands, varied from zero to 60 ms, and (b) by presenting the flanking and masking bursts at different temporal rates, so that the asynchronies varied from burst to burst. The results showed reduced CMR in all conditions where the flanking and masking bands were temporally incoherent, in line with the predictions of the temporal coherence hypothesis.
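The coherence manipulation is simple to reproduce in outline. A minimal sketch follows (numpy/scipy assumed; center frequencies, bandwidths, burst timing, and levels are illustrative rather than the study's values):

```python
import numpy as np
from scipy.signal import butter, sosfilt

def burst_sequence(fc, bw, n_bursts, burst_dur, gap, offset=0.0, fs=16000, seed=0):
    """Sequence of narrowband noise bursts centered at fc. All bursts share the
    same on/off gating; 'offset' delays the whole sequence (in seconds), which
    breaks the temporal coherence between masker and flanking bands."""
    rng = np.random.default_rng(seed)
    period = burst_dur + gap
    total = int((offset + n_bursts * period) * fs)
    sos = butter(4, [fc - bw / 2, fc + bw / 2], btype="bandpass", fs=fs, output="sos")
    x = np.zeros(total)
    n = int(burst_dur * fs)
    for k in range(n_bursts):
        start = int((offset + k * period) * fs)
        x[start:start + n] = sosfilt(sos, rng.standard_normal(n))
    return x

masker = burst_sequence(fc=1000, bw=20, n_bursts=10, burst_dur=0.1, gap=0.05)
flanker = burst_sequence(fc=1800, bw=20, n_bursts=10, burst_dur=0.1, gap=0.05,
                         offset=0.06, seed=1)  # 60-ms offset: incoherent bands
```

The two sequences would be zero-padded to a common length and mixed with the target tone before presentation; setting offset to zero restores temporal coherence and should restore the CMR.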
Affiliations
- Andrew J Oxenham: Departments of Psychology and Otolaryngology, University of Minnesota, Minneapolis, Minnesota 55455