1
Zaar J, Simonsen LB, Laugesen S. A spectro-temporal modulation test for predicting speech reception in hearing-impaired listeners with hearing aids. Hear Res 2024; 443:108949. [PMID: 38281473] [DOI: 10.1016/j.heares.2024.108949]
Abstract
Spectro-temporal modulation (STM) detection sensitivity has been shown to be associated with speech-in-noise reception in hearing-impaired (HI) individuals. Based on previous research, a recent study [Zaar, Simonsen, Dau, and Laugesen (2023). Hear Res 427:108650] introduced an STM test paradigm with audibility compensation, employing STM stimulus variants using noise and complex tones as carrier signals. The study demonstrated that the test was suitable for the target population of elderly individuals with moderate-to-severe hearing loss and showed promising predictions of speech-reception thresholds (SRTs) measured in a realistic setup with spatially distributed speech and noise maskers and linear audibility compensation. The present study further investigated the suggested STM test with respect to (i) test-retest variability for the most promising STM stimulus variants, (ii) its predictive power with respect to realistic speech-in-noise reception with non-linear hearing-aid amplification, (iii) its connection to effects of directionality and noise reduction (DIR+NR) hearing-aid processing, and (iv) its relation to DIR+NR preference. Thirty elderly HI participants were tested in a combined laboratory and field study, collecting STM thresholds with a complex-tone-based and a noise-based STM stimulus design, SRTs with spatially distributed speech and noise maskers using hearing aids with non-linear amplification and two different levels of DIR+NR, as well as subjective reports and preference ratings obtained in two field periods with the two DIR+NR hearing-aid settings.
The results indicate that the noise-carrier-based STM test variant (i) showed optimal test-retest properties, (ii) yielded a highly significant correlation with SRTs (R2 = 0.61), exceeding and complementing the predictive power of the audiogram, (iii) yielded a significant correlation (R2 = 0.51) with the DIR+NR-induced SRT benefit, and (iv) showed no significant correlation with subjective preference for DIR+NR settings in the field. Overall, the suggested STM test represents a valuable tool for diagnosing the speech-reception problems that remain once hearing-aid amplification has been provided, and the resulting need for and benefit from DIR+NR hearing-aid processing.
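As an illustration of the regression metric reported above, the following sketch fits SRTs to STM thresholds by least squares and computes the variance explained (R^2). All numbers here are synthetic stand-ins (the slope, offset, and noise level are invented); only the R^2 computation itself reflects the kind of analysis described in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data for 30 listeners: STM detection thresholds (dB) and
# SRTs (dB SNR). The linear relation and noise level are invented purely
# to illustrate the computation, not taken from the study.
stm_thresholds = rng.uniform(-15.0, 0.0, 30)
srt = 0.5 * stm_thresholds - 3.0 + rng.normal(0.0, 1.2, 30)

# Least-squares fit and variance explained (R^2) of the single predictor
slope, intercept = np.polyfit(stm_thresholds, srt, 1)
predicted = slope * stm_thresholds + intercept
ss_res = np.sum((srt - predicted) ** 2)
ss_tot = np.sum((srt - srt.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot
```

In the study itself the reported R2 values additionally account for, and are compared against, audiogram-based predictors; this sketch covers only the single-predictor case.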
Affiliation(s)
- Johannes Zaar
- Eriksholm Research Centre, DK-3070 Snekkersten, Denmark; Hearing Systems Section, Department of Health Technology, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark
- Lisbeth Birkelund Simonsen
- Hearing Systems Section, Department of Health Technology, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark; Interacoustics Research Unit, DK-2800 Kgs. Lyngby, Denmark
- Søren Laugesen
- Interacoustics Research Unit, DK-2800 Kgs. Lyngby, Denmark
2
Ueda K, Hashimoto M, Takeichi H, Wakamiya K. Interrupted mosaic speech revisited: Gain and loss in intelligibility by stretching. J Acoust Soc Am 2024; 155:1767-1779. [PMID: 38441439] [DOI: 10.1121/10.0025132]
Abstract
Our previous investigation of the effect of stretching spectrotemporally degraded and temporally interrupted speech stimuli showed remarkable intelligibility gains [Ueda, Takeichi, and Wakamiya (2022). J. Acoust. Soc. Am. 152(2), 970-980]. In that study, however, gap durations and temporal resolution were confounded. In the current investigation, we therefore measured the intelligibility of so-called mosaic speech while dissociating the effects of interruption and temporal resolution. The intelligibility of mosaic speech (20 frequency bands and 20 ms segment duration) declined from 95% to 78% and 33% when it was interrupted with 20 and 80 ms gaps, respectively. Intelligibility improved, however, to 92% and 54% (14% and 21% gains for the 20 and 80 ms gaps, respectively) when the mosaic segments were stretched to fill the silent gaps (n = 21). By contrast, intelligibility dropped to a minimum of 9% (7% loss) when stretching stimuli interrupted with 160 ms gaps. Explanations based on auditory grouping, modulation unmasking, or phonemic restoration may account for the intelligibility improvement with stretching, but not for the loss. The probability summation model accounted for the "U"-shaped intelligibility curves and the gain and loss of intelligibility, suggesting that perceptual unit length and speech rate may affect the intelligibility of spectrotemporally degraded speech stimuli.
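The probability-summation account mentioned above can be illustrated with a minimal sketch. The rule below (recognition succeeds if at least one of n independent perceptual units is detected) is a generic textbook form, not the authors' fitted model; the trade-off noted in the comment hints at how stretching can produce both gains and losses.

```python
def prob_summation(p_unit, n_units):
    """Probability of at least one success across n independent perceptual
    units, each detected with probability p_unit: P = 1 - (1 - p)^n.
    A generic probability-summation rule; the paper's actual model and
    parameter values are not reproduced here."""
    return 1.0 - (1.0 - p_unit) ** n_units

# Stretching segments into gaps lengthens each unit (raising p_unit) while
# reducing how many units fit into an utterance (lowering n_units); the net
# effect can be a gain or a loss, consistent with a "U"-shaped curve.
example = prob_summation(0.5, 3)
```

Because P rises with both p_unit and n_units, any manipulation that trades one against the other can move intelligibility in either direction.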
Affiliation(s)
- Kazuo Ueda
- Department of Acoustic Design, Faculty of Design/Research Center for Applied Perceptual Science/Research and Development Center for Five-Sense Devices, Kyushu University, 4-9-1 Shiobaru, Minami-ku, Fukuoka 815-8540, Japan
- Masashi Hashimoto
- Department of Acoustic Design, Faculty of Design, Kyushu University, 4-9-1 Shiobaru, Minami-ku, Fukuoka 815-8540, Japan
- Hiroshige Takeichi
- Open Systems Information Science Team, Advanced Data Science Project (ADSP), RIKEN Information R&D and Strategy Headquarters (R-IH), RIKEN, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Kohei Wakamiya
- Department of Acoustic Design, Faculty of Design, Kyushu University, 4-9-1 Shiobaru, Minami-ku, Fukuoka 815-8540, Japan
3
Mesiano PA, Zaar J, Bramsløw L, Relaño-Iborra H, Dau T. The Role of Average Fundamental Frequency Difference on the Intelligibility of Real-Life Competing Sentences. J Speech Lang Hear Res 2023:1-14. [PMID: 37390502] [DOI: 10.1044/2023_jslhr-22-00219]
Abstract
PURPOSE The average fundamental frequency separation (∆fo) between two competing voices has been shown to provide an important cue for target-speech intelligibility. However, some of the previous investigations used speech materials with linguistic properties and fo characteristics that may not be typical of realistic acoustic scenarios. This study investigated to what extent the effect of ∆fo generalizes to more real-life speech.
METHODS Real-life sentences and a well-controlled method for manipulating the acoustic stimuli were employed. Fifteen young normal-hearing native Danish listeners were tested in a two-competing-voices sentence recognition task at several target-to-masker ratios (TMRs) and ∆fos.
RESULTS Compared to previous studies that addressed the same experimental scenario with less realistic speech materials, the present results showed only a moderate effect of ∆fo at negative TMRs and a negligible effect at positive TMRs. An analysis of the employed stimuli showed that a large ∆fo effect on the target speech intelligibility is only observed when the competing sentences have highly synchronous fo trajectories, which is typical of the artificial speech materials employed in previous studies.
CONCLUSION Overall, the present results suggest a relatively small effect of ∆fo on the intelligibility of real-life speech, as compared to previously employed artificial speech, in two-competing-sentences conditions.
Affiliation(s)
- Paolo A Mesiano
- Department of Health Technology, Technical University of Denmark, Kongens Lyngby
- Johannes Zaar
- Department of Health Technology, Technical University of Denmark, Kongens Lyngby
- Eriksholm Research Centre, Helsingør, Denmark
- Helia Relaño-Iborra
- Department of Health Technology, Technical University of Denmark, Kongens Lyngby
- Department of Applied Mathematics and Computer Science, Technical University of Denmark, Kongens Lyngby
- Torsten Dau
- Department of Health Technology, Technical University of Denmark, Kongens Lyngby
4
Windle R, Dillon H, Heinrich A. A review of auditory processing and cognitive change during normal ageing, and the implications for setting hearing aids for older adults. Front Neurol 2023; 14:1122420. [PMID: 37409017] [PMCID: PMC10318159] [DOI: 10.3389/fneur.2023.1122420]
Abstract
Throughout our adult lives there is a decline in peripheral hearing, auditory processing and elements of cognition that support listening ability. Audiometry provides no information about the status of auditory processing and cognition, and older adults often struggle with complex listening situations, such as speech in noise perception, even if their peripheral hearing appears normal. Hearing aids can address some aspects of peripheral hearing impairment and improve signal-to-noise ratios. However, they cannot directly enhance central processes and may introduce distortion to sound that might act to undermine listening ability. This review paper highlights the need to consider the distortion introduced by hearing aids, specifically when considering normally-ageing older adults. We focus on patients with age-related hearing loss because they represent the vast majority of the population attending audiology clinics. We believe that it is important to recognize that the combination of peripheral and central, auditory and cognitive decline make older adults some of the most complex patients seen in audiology services, so they should not be treated as "standard" despite the high prevalence of age-related hearing loss. We argue that a primary concern should be to avoid hearing aid settings that introduce distortion to speech envelope cues, which is not a new concept. The primary cause of distortion is the speed and range of change to hearing aid amplification (i.e., compression). We argue that slow-acting compression should be considered as a default for some users and that other advanced features should be reconsidered as they may also introduce distortion that some users may not be able to tolerate. We discuss how this can be incorporated into a pragmatic approach to hearing aid fitting that does not require increased loading on audiology services.
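The envelope-distortion argument above can be made concrete with a toy single-band compressor. This is an illustrative sketch, not any hearing-aid algorithm or fitting prescription: the gain follows a smoothed envelope, and the time constant tau controls how strongly speech-rate modulations are flattened.

```python
import numpy as np

def compress(x, fs, ratio=3.0, tau=0.005):
    """Single-band dynamic-range compressor sketch. The gain is derived
    from an envelope smoothed with time constant tau: a short tau
    (fast-acting) lets the gain chase the speech envelope, flattening the
    modulations that carry intelligibility cues; a long tau (slow-acting)
    largely preserves them. Parameters are illustrative only."""
    alpha = np.exp(-1.0 / (tau * fs))
    env = np.empty_like(x)
    level = 0.0
    for i, v in enumerate(np.abs(x)):
        level = alpha * level + (1.0 - alpha) * v  # one-pole envelope follower
        env[i] = level
    env = np.maximum(env, 1e-6)
    return x * env ** (1.0 / ratio - 1.0)  # compressive gain g = env^(1/R - 1)

# A 4-Hz amplitude-modulated tone loses far more modulation depth under
# fast-acting (tau = 5 ms) than under slow-acting (tau = 500 ms) compression.
fs = 8000
t = np.arange(fs) / fs
x = (1.0 + 0.8 * np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 500 * t)

def mod_depth(y):
    # Peak envelope per 25-ms block (skipping the onset transient),
    # expressed as a modulation-depth index (max - min) / (max + min)
    blocks = np.abs(y[fs // 4:]).reshape(-1, fs // 40)
    peaks = blocks.max(axis=1)
    return (peaks.max() - peaks.min()) / (peaks.max() + peaks.min())

depth_fast = mod_depth(compress(x, fs, tau=0.005))
depth_slow = mod_depth(compress(x, fs, tau=0.5))
```

With these settings the fast-acting compressor yields a visibly shallower output envelope than the slow-acting one, which is the distortion mechanism the review argues should be avoided for vulnerable listeners.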
Affiliation(s)
- Richard Windle
- Audiology Department, Royal Berkshire NHS Foundation Trust, Reading, United Kingdom
- Harvey Dillon
- NIHR Manchester Biomedical Research Centre, Manchester, United Kingdom
- Department of Linguistics, Macquarie University, North Ryde, NSW, Australia
- Antje Heinrich
- NIHR Manchester Biomedical Research Centre, Manchester, United Kingdom
- Division of Human Communication, Development and Hearing, School of Health Sciences, University of Manchester, Manchester, United Kingdom
5
Shen Y, Langley L. Spectral weighting for sentence recognition in steady-state and amplitude-modulated noise. JASA Express Lett 2023; 3:2887651. [PMID: 37125871] [PMCID: PMC10155216] [DOI: 10.1121/10.0017934]
Abstract
Spectral weights in octave-frequency bands from 0.25 to 4 kHz were estimated for speech-in-noise recognition using two sentence materials (i.e., the IEEE and AzBio sentences). The masking noise was either unmodulated or sinusoidally amplitude-modulated at 8 Hz. The estimated spectral weights did not vary significantly across two test sessions and were similar for the two sentence materials. Amplitude-modulating the masker increased the weight at 2 kHz and decreased the weight at 0.25 kHz, which may support an upward shift in spectral weights for temporally fluctuating maskers.
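A common way to estimate such spectral weights is the correlational (reverse-correlation) method: perturb each band's SNR independently from trial to trial and correlate the perturbations with the listener's binary response. The sketch below applies that idea to a synthetic observer; the "true" weights and all numbers are invented for illustration and are not the study's data.

```python
import numpy as np

rng = np.random.default_rng(1)
n_trials = 2000
bands = ["0.25", "0.5", "1", "2", "4"]  # octave bands (kHz)

# Synthetic observer: band-wise SNR perturbations drive a noisy decision.
# The "true" weights are invented to illustrate the estimation method.
true_w = np.array([0.05, 0.15, 0.30, 0.35, 0.15])
snr_perturb = rng.normal(0.0, 3.0, (n_trials, len(bands)))  # dB per band/trial
correct = (snr_perturb @ true_w + rng.normal(0.0, 2.0, n_trials)) > 0.0

# Point-biserial correlation between each band's perturbation and the
# binary response, normalized to sum to one, estimates relative weights.
r = np.array([np.corrcoef(snr_perturb[:, b], correct.astype(float))[0, 1]
              for b in range(len(bands))])
weights = r / r.sum()
```

With enough trials the normalized correlations recover the rank order of the underlying weights, which is the level of description (relative weight per octave band) the abstract reports.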
Affiliation(s)
- Yi Shen
- Department of Speech and Hearing Sciences, University of Washington, 1417 Northeast 42nd Street, Seattle, Washington 98105-6246
- Lauren Langley
- Department of Speech and Hearing Sciences, University of Washington, 1417 Northeast 42nd Street, Seattle, Washington 98105-6246
6
Steinmetzger K, Rosen S. No evidence for a benefit from masker harmonicity in the perception of speech in noise. J Acoust Soc Am 2023; 153:1064. [PMID: 36859153] [DOI: 10.1121/10.0017065]
Abstract
When assessing the intelligibility of speech embedded in background noise, maskers with a harmonic spectral structure have been found to be much less detrimental to performance than noise-based interferers. While spectral "glimpsing" in between the resolved masker harmonics and reduced envelope modulations of harmonic maskers have been shown to contribute, this effect has primarily been attributed to the proposed ability of the auditory system to cancel harmonic maskers from the signal mixture. Here, speech intelligibility in the presence of harmonic and inharmonic maskers with similar spectral glimpsing opportunities and envelope modulation spectra was assessed to test the theory of harmonic cancellation. Speech reception thresholds obtained from normal-hearing listeners revealed no effect of masker harmonicity, neither for maskers with static nor dynamic pitch contours. The results show that harmonicity, or time-domain periodicity, as such, does not aid the segregation of speech and masker. Contrary to what might be assumed, this also implies that the saliency of the masker pitch did not affect auditory grouping. Instead, the current data suggest that the reduced masking effectiveness of harmonic sounds is due to the regular spacing of their spectral components.
Affiliation(s)
- Kurt Steinmetzger
- Section of Biomagnetism, Department of Neurology, Heidelberg University Hospital, Im Neuenheimer Feld 400, 69120 Heidelberg, Germany
- Stuart Rosen
- Speech, Hearing and Phonetic Sciences, University College London (UCL), Chandler House, 2 Wakefield Street, London, WC1N 1PF, United Kingdom
7
Zaar J, Simonsen LB, Dau T, Laugesen S. Toward a clinically viable spectro-temporal modulation test for predicting supra-threshold speech reception in hearing-impaired listeners. Hear Res 2023; 427:108650. [PMID: 36463632] [DOI: 10.1016/j.heares.2022.108650]
Abstract
The ability of hearing-impaired listeners to detect spectro-temporal modulation (STM) has been shown to correlate with individual listeners' speech reception performance. However, the STM detection tests used in previous studies were overly challenging, especially for elderly listeners with moderate-to-severe hearing loss. Furthermore, the speech tests considered as a reference were not optimized to yield ecologically valid outcomes that represent real-life speech reception deficits. The present study investigated an STM detection measurement paradigm with individualized audibility compensation, focusing on its clinical viability and relevance as a real-life supra-threshold speech intelligibility predictor. STM thresholds were measured in 13 elderly hearing-impaired native Danish listeners using four previously established (noise-carrier-based) and two novel complex-tone-carrier-based STM stimulus variants. Speech reception thresholds (SRTs) were measured (i) in a realistic spatial speech-on-speech setup and (ii) using co-located stationary noise, both with individualized amplification. In contrast with previous related studies, the proposed measurement paradigm yielded robust STM thresholds for all listeners and conditions. The STM thresholds were positively correlated with the SRTs, whereby significant correlations were found for the realistic speech-test condition but not for the stationary-noise condition. Three STM stimulus variants (one noise-carrier-based and two complex-tone-based) yielded significant predictions of SRTs, accounting for up to 53% of the SRT variance. The results of the study could form the basis for a clinically viable STM test for quantifying supra-threshold speech reception deficits in aided hearing-impaired listeners.
Affiliation(s)
- Johannes Zaar
- Eriksholm Research Centre, DK-3070 Snekkersten, Denmark; Hearing Systems Section, Department of Health Technology, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark
- Torsten Dau
- Hearing Systems Section, Department of Health Technology, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark
- Søren Laugesen
- Interacoustics Research Unit, DK-2800 Kgs. Lyngby, Denmark
8
Prud'homme L, Lavandier M, Best V. Investigating the role of harmonic cancellation in speech-on-speech masking. Hear Res 2022; 426:108562. [PMID: 35768309] [PMCID: PMC9722527] [DOI: 10.1016/j.heares.2022.108562]
Abstract
This study investigated the role of harmonic cancellation in the intelligibility of speech in "cocktail party" situations. While there is evidence that harmonic cancellation plays a role in the segregation of simple harmonic sounds based on fundamental frequency (F0), its utility for mixtures of speech containing non-stationary F0s and unvoiced segments is unclear. Here we focused on the energetic masking of speech targets caused by competing speech maskers. Speech reception thresholds were measured using seven maskers: speech-shaped noise, monotonized and intonated harmonic complexes, monotonized speech, noise-vocoded speech, reversed speech and natural speech. These maskers enabled an estimate of how the masking potential of speech is influenced by harmonic structure, amplitude modulation and variations in F0 over time. Measured speech reception thresholds were compared to the predictions of two computational models, with and without a harmonic cancellation component. Overall, the results suggest a minor role of harmonic cancellation in reducing energetic masking in speech mixtures.
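The harmonic-cancellation idea tested above has a very simple signal-processing core: delay the input by one masker period and subtract. The sketch below is that textbook comb filter, not the computational models evaluated in the study.

```python
import numpy as np

def cancel_harmonics(x, fs, f0):
    """One-period differencing comb filter, y[n] = x[n] - x[n - T] with
    T = fs / f0: any component at an integer multiple of f0 is delayed by
    exactly one period and cancels. This is the core operation behind
    harmonic-cancellation accounts of F0-based segregation (a sketch,
    not the authors' model implementation)."""
    T = int(round(fs / f0))
    y = x.copy()
    y[T:] -= x[:-T]
    return y

# A harmonic masker at F0 = 100 Hz is nulled, while a 330-Hz target
# (not a multiple of 100 Hz) passes through.
fs = 10000
t = np.arange(fs) / fs
masker = sum(np.sin(2 * np.pi * 100 * k * t) for k in range(1, 6))
target = np.sin(2 * np.pi * 330 * t)
residual = cancel_harmonics(masker + target, fs, 100.0)
```

After the first period the masker residual is essentially zero while the target survives (with a frequency-dependent gain), which is why non-stationary F0s and unvoiced masker segments, as studied above, are the hard cases for this mechanism.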
Affiliation(s)
- Luna Prud'homme
- Univ Lyon, ENTPE, Ecole Centrale de Lyon, CNRS, LTDS, UMR5513, 69518 Vaulx-en-Velin, France
- Mathieu Lavandier
- Univ Lyon, ENTPE, Ecole Centrale de Lyon, CNRS, LTDS, UMR5513, 69518 Vaulx-en-Velin, France
- Virginia Best
- Department of Speech, Language and Hearing Sciences, Boston University, 635 Commonwealth Ave, Boston, MA 02215, USA
9
Prud'homme L, Lavandier M, Best V. A dynamic binaural harmonic-cancellation model to predict speech intelligibility against a harmonic masker varying in intonation, temporal envelope, and location. Hear Res 2022; 426:108535. [PMID: 35654633] [PMCID: PMC9684346] [DOI: 10.1016/j.heares.2022.108535]
Abstract
The aim of this study was to extend the harmonic-cancellation model proposed by Prud'homme et al. [J. Acoust. Soc. Am. 148 (2020) 3246-3254] to predict speech intelligibility against a harmonic masker, so that it takes into account binaural hearing, amplitude modulations in the masker and variations in masker fundamental frequency (F0) over time. This was done by segmenting the masker signal into time frames and combining the previous long-term harmonic-cancellation model with the binaural model proposed by Vicente and Lavandier [Hear. Res. 390 (2020) 107937]. The new model was tested on the data from two experiments involving harmonic complex maskers that varied in spatial location, temporal envelope and F0 contour. The interactions between the associated effects were accounted for in the model by varying the time frame duration and excluding the binaural unmasking computation when harmonic cancellation is active. Across both experiments, the correlation between data and model predictions was over 0.96, and the mean and largest absolute prediction errors were lower than 0.6 and 1.5 dB, respectively.
Affiliation(s)
- Luna Prud'homme
- ENTPE, Ecole Centrale de Lyon, CNRS, LTDS, UMR5513, University Lyon, Vaulx-en-Velin 69518, France
- Mathieu Lavandier
- ENTPE, Ecole Centrale de Lyon, CNRS, LTDS, UMR5513, University Lyon, Vaulx-en-Velin 69518, France
- Virginia Best
- Department of Speech, Language and Hearing Sciences, Boston University, 635 Commonwealth Ave, Boston, MA 02215, USA
10
Relaño-Iborra H, Dau T. Speech intelligibility prediction based on modulation frequency-selective processing. Hear Res 2022; 426:108610. [PMID: 36163219] [DOI: 10.1016/j.heares.2022.108610]
Abstract
Speech intelligibility models can provide insights regarding the auditory processes involved in human speech perception and communication. One successful approach to modelling speech intelligibility has been based on the analysis of the amplitude modulations present in speech as well as competing interferers. This review covers speech intelligibility models that include a modulation-frequency-selective processing stage, i.e., a modulation filterbank, as part of their front end. The speech-based envelope power spectrum model [sEPSM, Jørgensen and Dau (2011). J. Acoust. Soc. Am. 130(3), 1475-1487], several variants of the sEPSM including modifications with respect to temporal resolution, spectro-temporal processing and binaural processing, as well as the speech-based computational auditory signal processing and perception model [sCASP; Relaño-Iborra et al. J. Acoust. Soc. Am. 146(5), 3306-3317], which is based on an established auditory signal detection and masking model, are discussed. The key processing stages of these models for the prediction of speech intelligibility across a variety of acoustic conditions are addressed in relation to competing modeling approaches. The strengths and weaknesses of the modulation-based analysis are outlined and perspectives presented, particularly in connection with the challenge of predicting the consequences of individual hearing loss on speech intelligibility.
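The envelope-power analysis at the heart of the sEPSM family can be caricatured in a few lines. The sketch below is heavily simplified relative to the published model (which uses a gammatone filterbank, Hilbert envelope extraction, and a bank of modulation filters): here a single modulation band and a rectified envelope stand in for the full front end.

```python
import numpy as np

def snr_env(mix, noise, fs, mf_band=(1.0, 4.0)):
    """Envelope-power signal-to-noise ratio in one modulation band, in the
    spirit of the sEPSM front end: envelope power of the speech-plus-noise
    mixture in excess of the noise-alone envelope power, divided by the
    latter. A simplified sketch, not the published implementation."""
    def env_band_power(x):
        env = np.abs(x) - np.mean(np.abs(x))  # crude AC-coupled envelope
        spec = np.fft.rfft(env)
        freqs = np.fft.rfftfreq(env.size, 1.0 / fs)
        sel = (freqs >= mf_band[0]) & (freqs < mf_band[1])
        return np.sum(np.abs(spec[sel]) ** 2) / env.size ** 2
    p_mix = env_band_power(mix)
    p_noise = env_band_power(noise)
    return max(p_mix - p_noise, 1e-12) / max(p_noise, 1e-12)

# A 2-Hz amplitude-modulated carrier in weak noise yields a large envelope
# SNR in the 1-4 Hz modulation band; the noise alone contributes little there.
fs = 2000
t = np.arange(4 * fs) / fs
rng = np.random.default_rng(2)
noise = 0.05 * rng.standard_normal(t.size)
mix = (1.0 + 0.9 * np.sin(2 * np.pi * 2 * t)) * np.sin(2 * np.pi * 200 * t) + noise
```

In the full models this quantity is computed per audio channel and per modulation filter and then combined across channels before being mapped to intelligibility.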
Affiliation(s)
- Helia Relaño-Iborra
- Hearing Systems Section, Department of Health Technology, Technical University of Denmark, Kgs. Lyngby 2800, Denmark; Cognitive Systems Section, Department of Applied Mathematics and Computer Science, Technical University of Denmark, Kgs. Lyngby 2800, Denmark
- Torsten Dau
- Hearing Systems Section, Department of Health Technology, Technical University of Denmark, Kgs. Lyngby 2800, Denmark
11
Buss E, Miller MK, Leibold LJ. Maturation of Speech-in-Speech Recognition for Whispered and Voiced Speech. J Speech Lang Hear Res 2022; 65:3117-3128. [PMID: 35868232] [PMCID: PMC9911131] [DOI: 10.1044/2022_jslhr-21-00620]
Abstract
PURPOSE Some speech recognition data suggest that children rely less on voice pitch and harmonicity to support auditory scene analysis than adults. Two experiments evaluated development of speech-in-speech recognition using voiced speech and whispered speech, which lacks the harmonic structure of voiced speech.
METHOD Listeners were 5- to 7-year-olds and adults with normal hearing. Targets were monosyllabic words organized into three-word sets that differ in vowel content. Maskers were two-talker or one-talker streams of speech. Targets and maskers were recorded by different female talkers in both voiced and whispered speaking styles. For each masker, speech reception thresholds (SRTs) were measured in all four combinations of target and masker speech, including matched and mismatched speaking styles for the target and masker.
RESULTS Children performed more poorly than adults overall. For the two-talker masker, this age effect was smaller for the whispered target and masker than for the other three conditions. Children's SRTs in this condition were predominantly positive, suggesting that they may have relied on a wholistic listening strategy rather than segregating the target from the masker. For the one-talker masker, age effects were consistent across the four conditions. Reduced informational masking for the one-talker masker could be responsible for differences in age effects for the two maskers. A benefit of mismatching the target and masker speaking style was observed for both target styles in the two-talker masker and for the voiced targets in the one-talker masker.
CONCLUSIONS These results provide no compelling evidence that young school-age children and adults are differentially sensitive to the cues present in voiced and whispered speech. Both groups benefit from mismatches in speaking style under some conditions. These benefits could be due to a combination of reduced perceptual similarity, harmonic cancelation, and differences in energetic masking.
Affiliation(s)
- Emily Buss
- Department of Otolaryngology-Head and Neck Surgery, University of North Carolina at Chapel Hill
- Margaret K. Miller
- Center for Hearing Research, Boys Town National Research Hospital, Omaha, NE
- Lori J. Leibold
- Center for Hearing Research, Boys Town National Research Hospital, Omaha, NE
12
Abstract
Hearing in noise is a core problem in audition, and a challenge for hearing-impaired listeners, yet the underlying mechanisms are poorly understood. We explored whether harmonic frequency relations, a signature property of many communication sounds, aid hearing in noise for normal hearing listeners. We measured detection thresholds in noise for tones and speech synthesized to have harmonic or inharmonic spectra. Harmonic signals were consistently easier to detect than otherwise identical inharmonic signals. Harmonicity also improved discrimination of sounds in noise. The largest benefits were observed for two-note up-down "pitch" discrimination and melodic contour discrimination, both of which could be performed equally well with harmonic and inharmonic tones in quiet, but which showed large harmonic advantages in noise. The results show that harmonicity facilitates hearing in noise, plausibly by providing a noise-robust pitch cue that aids detection and discrimination.
13
Cortical activity evoked by voice pitch changes: a combined fNIRS and EEG study. Hear Res 2022; 420:108483. [DOI: 10.1016/j.heares.2022.108483]
14
Hernández-Pérez H, Mikiel-Hunter J, McAlpine D, Dhar S, Boothalingam S, Monaghan JJM, McMahon CM. Understanding degraded speech leads to perceptual gating of a brainstem reflex in human listeners. PLoS Biol 2021; 19:e3001439. [PMID: 34669696] [PMCID: PMC8559948] [DOI: 10.1371/journal.pbio.3001439]
Abstract
The ability to navigate "cocktail party" situations by focusing on sounds of interest over irrelevant, background sounds is often considered in terms of cortical mechanisms. However, subcortical circuits such as the pathway underlying the medial olivocochlear (MOC) reflex modulate the activity of the inner ear itself, supporting the extraction of salient features from the auditory scene prior to any cortical processing. To understand the contribution of auditory subcortical nuclei and the cochlea in complex listening tasks, we made physiological recordings along the auditory pathway while listeners engaged in detecting non(sense) words in lists of words. Both naturally spoken speech and intrinsically noisy, vocoded speech (filtering that mimics processing by a cochlear implant, CI) significantly activated the MOC reflex, but this was not the case for speech in background noise, which more engaged midbrain and cortical resources. A model of the initial stages of auditory processing reproduced specific effects of each form of speech degradation, providing a rationale for goal-directed gating of the MOC reflex based on enhancing the representation of the energy envelope of the acoustic waveform. Our data reveal the coexistence of two strategies in the auditory system that may facilitate speech understanding in situations where the signal is either intrinsically degraded or masked by extrinsic acoustic energy. Whereas intrinsically degraded streams recruit the MOC reflex to improve the peripheral representation of speech cues, extrinsically masked streams rely more on higher auditory centres to denoise signals.
Affiliation(s)
- Heivet Hernández-Pérez
- Department of Linguistics, The Australian Hearing Hub, Macquarie University, Sydney, Australia
- Jason Mikiel-Hunter
- Department of Linguistics, The Australian Hearing Hub, Macquarie University, Sydney, Australia
- David McAlpine
- Department of Linguistics, The Australian Hearing Hub, Macquarie University, Sydney, Australia
- Sumitrajit Dhar
- Department of Communication Sciences and Disorders, Northwestern University, Evanston, Illinois, United States of America
- Sriram Boothalingam
- University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Jessica J. M. Monaghan
- Department of Linguistics, The Australian Hearing Hub, Macquarie University, Sydney, Australia
- National Acoustic Laboratories, Sydney, Australia
- Catherine M. McMahon
- Department of Linguistics, The Australian Hearing Hub, Macquarie University, Sydney, Australia
15
Homma NY, Bajo VM. Lemniscal Corticothalamic Feedback in Auditory Scene Analysis. Front Neurosci 2021; 15:723893. [PMID: 34489635] [PMCID: PMC8417129] [DOI: 10.3389/fnins.2021.723893]
Abstract
Sound information is transmitted from the ear to the central auditory stations of the brain via several nuclei. In addition to these ascending pathways, there are descending projections that can influence information processing at each of these nuclei. A major descending pathway in the auditory system is the feedback projection from layer VI of the primary auditory cortex (A1) to the ventral division of the medial geniculate body (MGBv) in the thalamus. The corticothalamic axons have small glutamatergic terminals that can modulate thalamic processing and thalamocortical information transmission. Corticothalamic neurons also provide input to the GABAergic neurons of the thalamic reticular nucleus (TRN), which receives collaterals from the ascending thalamic axons. The balance of corticothalamic and TRN inputs has been shown to refine the frequency tuning, firing patterns, and gating of MGBv neurons. The thalamus is therefore not merely a relay stage in the chain of auditory nuclei but participates in complex aspects of sound processing, including top-down modulation. In this review, we aim (i) to examine how lemniscal corticothalamic feedback modulates responses in MGBv neurons and (ii) to explore how this feedback contributes to auditory scene analysis, particularly frequency and harmonic perception. Finally, we discuss potential implications of corticothalamic feedback for music and speech perception, where precise spectral and temporal processing is essential.
Affiliation(s)
- Natsumi Y. Homma, Center for Integrative Neuroscience, University of California, San Francisco, San Francisco, CA, United States; Coleman Memorial Laboratory, Department of Otolaryngology – Head and Neck Surgery, University of California, San Francisco, San Francisco, CA, United States
- Victoria M. Bajo, Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, United Kingdom
16
de Cheveigné A. Harmonic Cancellation-A Fundamental of Auditory Scene Analysis. Trends Hear 2021; 25:23312165211041422. [PMID: 34698574] [PMCID: PMC8552394] [DOI: 10.1177/23312165211041422]
Abstract
This paper reviews the hypothesis of harmonic cancellation, according to which an interfering sound is suppressed or canceled on the basis of its harmonicity (or periodicity in the time domain) for the purpose of auditory scene analysis. It defines the concept, discusses theoretical arguments in its favor, and reviews experimental results that support or contradict it. If correct, the hypothesis may draw on time-domain processing of temporally accurate neural representations within the brainstem, as required also by the classic equalization-cancellation model of binaural unmasking. The hypothesis predicts that a target sound corrupted by interference will be easier to hear if the interference is harmonic than inharmonic, all else being equal. This prediction is borne out in a number of behavioral studies, but not all. The paper reviews those results with the aim of understanding the inconsistencies and reaching a reliable conclusion for, or against, the hypothesis of harmonic cancellation within the auditory system.
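The cancellation idea can be sketched as a time-domain comb filter: subtracting a copy of the signal delayed by exactly one period of the interferer nulls the interferer's F0 and all of its harmonics. The following is a minimal, hypothetical illustration of the concept with synthetic tones and made-up parameter values, not an implementation of any model discussed in the review.

```python
import numpy as np

def harmonic_cancellation(x, f0, fs):
    """Suppress a periodic interferer with fundamental f0 by subtracting a
    one-period-delayed copy of the signal: y[n] = x[n] - x[n - T]."""
    period = int(round(fs / f0))           # interferer period in samples
    y = np.zeros_like(x)
    y[period:] = x[period:] - x[:-period]
    return y

fs = 16000
t = np.arange(fs) / fs
# Harmonic masker (F0 = 100 Hz) plus a weak inharmonic target at 333 Hz.
masker = np.sin(2 * np.pi * 100 * t) + 0.5 * np.sin(2 * np.pi * 200 * t)
target = 0.1 * np.sin(2 * np.pi * 333 * t)
out = harmonic_cancellation(masker + target, f0=100, fs=fs)
# Every harmonic of 100 Hz is delayed by an integer number of cycles and
# cancels, while the 333 Hz target passes through (rescaled, phase-shifted).
```

Because the 100 Hz period is an integer number of samples here, the masker cancels exactly; real F0s would require fractional-delay filtering or frequency-domain cancellation.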
Affiliation(s)
- Alain de Cheveigné, Laboratoire des systèmes perceptifs, CNRS, Paris, France; Département d'études cognitives, École normale supérieure, PSL University, Paris, France; UCL Ear Institute, London, UK
17
Prud'homme L, Lavandier M, Best V. A harmonic-cancellation-based model to predict speech intelligibility against a harmonic masker. J Acoust Soc Am 2020; 148:3246. [PMID: 33261378] [PMCID: PMC8097714] [DOI: 10.1121/10.0002492]
Abstract
This work aims to predict speech intelligibility against harmonic maskers. Unlike noise maskers, harmonic maskers (including speech) have a harmonic structure that may allow for a release from masking based on fundamental frequency (F0). Mechanisms, such as spectral glimpsing and harmonic cancellation, have been proposed to explain F0 segregation, but their relative contributions and ability to predict behavioral data have not been explored. A speech intelligibility model was developed that includes both spectral glimpsing and harmonic cancellation. The model was used to fit the data of two experiments from Deroche, Culling, Chatterjee, and Limb [J. Acoust. Soc. Am. 135, 2873-2884 (2014)], in which speech reception thresholds were measured for stationary harmonic maskers varying in their F0 and degree of harmonicity. Key model parameters (jitter in the masker F0, shape of the cancellation filter, frequency limit for cancellation, and signal-to-noise ratio ceiling) were optimized by maximizing the correspondence between the predictions and data. The model was able to accurately describe the effects associated with varying the masker F0 and harmonicity. Across both experiments, the correlation between data and predictions was 0.99, and the mean and largest absolute prediction errors were lower than 0.5 and 1 dB, respectively.
Affiliation(s)
- Luna Prud'homme, Univ Lyon, ENTPE, Laboratoire Génie Civil et Bâtiment, Rue Maurice Audin, 69518 Vaulx-en-Velin, France
- Mathieu Lavandier, Univ Lyon, ENTPE, Laboratoire Génie Civil et Bâtiment, Rue Maurice Audin, 69518 Vaulx-en-Velin, France
- Virginia Best, Department of Speech, Language and Hearing Sciences, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA
18
Steinmetzger K, Shen Z, Riedel H, Rupp A. Auditory cortex activity measured using functional near-infrared spectroscopy (fNIRS) appears to be susceptible to masking by cortical blood stealing. Hear Res 2020; 396:108069. [DOI: 10.1016/j.heares.2020.108069]
19
Steinmetzger K, Zaar J, Relaño-Iborra H, Rosen S, Dau T. Predicting the effects of periodicity on the intelligibility of masked speech: An evaluation of different modelling approaches and their limitations. J Acoust Soc Am 2019; 146:2562. [PMID: 31671986] [DOI: 10.1121/1.5129050]
Abstract
Four existing speech intelligibility models with different theoretical assumptions were used to predict previously published behavioural data. Those data showed that complex tones with pitch-related periodicity are far less effective maskers of speech than aperiodic noise. This so-called masker-periodicity benefit (MPB) far exceeded the fluctuating-masker benefit (FMB) obtained from slow masker envelope fluctuations. In contrast, the normal-hearing listeners hardly benefitted from periodicity in the target speech. All tested models consistently underestimated MPB and FMB, while most of them also overestimated the intelligibility of vocoded speech. To understand these shortcomings, the internal signal representations of the models were analysed in detail. The best-performing model, the correlation-based version of the speech-based envelope power spectrum model (sEPSMcorr), combined an auditory processing front end with a modulation filterbank and a correlation-based back end. This model was then modified to further improve the predictions. The resulting second version of the sEPSMcorr outperformed the original model with all tested maskers and accounted for about half the MPB, which can be attributed to reduced modulation masking caused by the periodic maskers. However, as the sEPSMcorr2 failed to account for the other half of the MPB, the results also indicate that future models should consider the contribution of pitch-related effects, such as enhanced stream segregation, to further improve their predictive power.
Affiliation(s)
- Kurt Steinmetzger, Speech, Hearing and Phonetic Sciences, University College London, Chandler House, 2 Wakefield Street, London WC1N 1PF, United Kingdom
- Johannes Zaar, Hearing Systems Section, Department of Health Technology, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark
- Helia Relaño-Iborra, Hearing Systems Section, Department of Health Technology, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark
- Stuart Rosen, Speech, Hearing and Phonetic Sciences, University College London, Chandler House, 2 Wakefield Street, London WC1N 1PF, United Kingdom
- Torsten Dau, Hearing Systems Section, Department of Health Technology, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark
20
Biberger T, Ewert SD. The effect of room acoustical parameters on speech reception thresholds and spatial release from masking. J Acoust Soc Am 2019; 146:2188. [PMID: 31671969] [DOI: 10.1121/1.5126694]
Abstract
In daily life, speech intelligibility is affected by masking caused by interferers and by reverberation. For a frontal target speaker and two interfering sources placed symmetrically to either side, spatial release from masking (SRM) is observed in comparison to frontal interferers. In this case, the auditory system can make use of temporally fluctuating interaural time/phase and level differences promoting binaural unmasking (BU) and better-ear glimpsing (BEG). Reverberation affects the waveforms of the target and maskers, and the interaural differences, depending on the spatial configuration and the room acoustical properties. In this study, the effects of room acoustics, the temporal structure of the interferers, and target-masker positions on speech reception thresholds and SRM were assessed. The results were compared to an optimal better-ear glimpsing strategy to help disentangle energetic masking, including effects of BU and BEG, as well as informational masking (IM). In anechoic and moderately reverberant conditions, BU and BEG contributed to the SRM of fluctuating speech-like maskers, whereas BU did not contribute in highly reverberant conditions. In highly reverberant rooms, an SRM of up to 3 dB was observed for speech maskers, including effects of release from IM based on binaural cues.
Affiliation(s)
- Thomas Biberger, Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, 26111 Oldenburg, Germany
- Stephan D Ewert, Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, 26111 Oldenburg, Germany
21
Abstract
OBJECTIVES: This study aimed to evaluate the informational component of speech-on-speech masking. Speech perception in the presence of a competing talker involves not only informational masking (IM) but also a number of masking processes involving interaction of masker and target energy in the auditory periphery. Such peripherally generated masking can be eliminated by presenting the target and masker in opposite ears (dichotically). However, this also reduces IM by providing listeners with lateralization cues that support spatial release from masking (SRM). In tonal sequences, IM can be isolated by rapidly switching the lateralization of dichotic target and masker streams across the ears, presumably producing ambiguous spatial percepts that interfere with SRM. However, it is not clear whether this technique works with speech materials.
DESIGN: Speech reception thresholds (SRTs) were measured in 17 young normal-hearing adults for sentences produced by a female talker in the presence of a competing male talker under three different conditions: diotic (target and masker in both ears), dichotic, and dichotic but switching the target and masker streams across the ears. Because switching rate and signal coherence were expected to influence the amount of IM observed, these two factors varied across conditions. When switches occurred, they were either at word boundaries or periodic (every 116 msec) and either with or without a brief gap (84 msec) at every switch point. In addition, SRTs were measured in a quiet condition to rule out audibility as a limiting factor.
RESULTS: SRTs were poorer for the four switching dichotic conditions than for the nonswitching dichotic condition, but better than for the diotic condition. Periodic switches without gaps resulted in the worst SRTs compared to the other switch conditions, thus maximizing IM.
CONCLUSIONS: These findings suggest that periodically switching the target and masker streams across the ears (without gaps) was the most efficient in disrupting SRM. Thus, this approach can be used in experiments that seek a relatively pure measure of IM, and could be readily extended to translational research.
22
Shen Y, Pearson DV. Efficiency in glimpsing vowel sequences in fluctuating maskers: Effects of temporal fine structure and temporal regularity. J Acoust Soc Am 2019; 145:2518. [PMID: 31046353] [PMCID: PMC6491349] [DOI: 10.1121/1.5098949]
Abstract
Listeners' efficiency in glimpsing the target speech in amplitude-modulated maskers may depend on whether the target is perceptually segregated from the masker and on the temporal predictability of the target. Using synthesized vowel sequences as the target, recognition of vowel sequences in simultaneous amplitude-modulated noise maskers was measured as the signal-to-noise ratio (SNR) and the masker modulation rate were systematically varied. In Experiment I (Exp. I), the temporal fine structure of the target was degraded by synthesizing the vowels using iterated rippled noise as the glottal source. In Experiment II (Exp. II), the vowel sequences were constructed so that they were not isochronous, but instead contained randomized intervals between adjacent vowels. Results were compared to the predictions from a dip-listening model based on short-term SNR. The results show no significant facilitative effect of temporal fine structure cues on vowel recognition (Exp. I). The model predictions significantly overestimated vowel-recognition performance in amplitude-modulated maskers when the temporal regularity of the target was degraded (Exp. II), suggesting the influence of temporal regularity on glimpsing efficiency. Furthermore, the overestimations by the model were greater at lower SNRs and selective to moderate masker modulation rates (between 2 and 6 Hz).
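The dip-listening account can be made concrete with a toy short-term-SNR glimpsing metric: frame the signals, compute the per-frame target-to-masker ratio, and count the frames above a criterion. The window length and criterion below are illustrative assumptions, not the parameters of the model evaluated in the study.

```python
import numpy as np

def glimpse_proportion(target, masker, fs, win_ms=20.0, criterion_db=0.0):
    """Toy dip-listening metric: the fraction of short-time frames in which
    the local target-to-masker ratio exceeds a criterion."""
    win = int(fs * win_ms / 1000)
    n = min(len(target), len(masker)) // win * win
    t_frames = target[:n].reshape(-1, win)
    m_frames = masker[:n].reshape(-1, win)
    eps = 1e-12
    snr_db = (10 * np.log10((t_frames ** 2).sum(axis=1) + eps)
              - 10 * np.log10((m_frames ** 2).sum(axis=1) + eps))
    return float(np.mean(snr_db > criterion_db))

fs = 16000
t = np.arange(fs) / fs
target = np.sin(2 * np.pi * 440 * t)
rng = np.random.default_rng(0)
# A masker amplitude-modulated at 4 Hz: its envelope dips expose the target.
masker = rng.standard_normal(fs) * (1 + np.sin(2 * np.pi * 4 * t))
p = glimpse_proportion(target, masker, fs)
```

Raising the target level raises every frame's SNR by the same amount, so this glimpse proportion grows monotonically with target level.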
Affiliation(s)
- Yi Shen, Department of Speech and Hearing Sciences, Indiana University Bloomington, Bloomington, Indiana 47405, USA
- Dylan V Pearson, Department of Speech and Hearing Sciences, Indiana University Bloomington, Bloomington, Indiana 47405, USA
23
Training of Speech Perception in Noise in Pre-Lingual Hearing Impaired Adults With Cochlear Implants Compared With Normal Hearing Adults. Otol Neurotol 2019; 40:e316-e325. [DOI: 10.1097/mao.0000000000002128]
24
Bologna WJ, Vaden KI, Ahlstrom JB, Dubno JR. Age effects on the contributions of envelope and periodicity cues to recognition of interrupted speech in quiet and with a competing talker. J Acoust Soc Am 2019; 145:EL173. [PMID: 31067962] [PMCID: PMC7112707] [DOI: 10.1121/1.5091664]
Abstract
Envelope and periodicity cues may provide redundant, additive, or synergistic benefits to speech recognition. The contributions of these cues may change under different listening conditions and may differ for younger and older adults. To address these questions, younger and older adults with normal hearing listened to interrupted sentences containing different combinations of envelope and periodicity cues in quiet and with a competing talker. Envelope and periodicity cues improved speech recognition for both groups, and their benefits were additive when both cues were available. Envelope cues were particularly important for older adults and for sentences with a competing talker.
Affiliation(s)
- William J Bologna, Department of Otolaryngology-Head and Neck Surgery, Medical University of South Carolina, 135 Rutledge Avenue, MSC 550, Charleston, South Carolina 29425-5500
- Kenneth I Vaden, Department of Otolaryngology-Head and Neck Surgery, Medical University of South Carolina, 135 Rutledge Avenue, MSC 550, Charleston, South Carolina 29425-5500
- Jayne B Ahlstrom, Department of Otolaryngology-Head and Neck Surgery, Medical University of South Carolina, 135 Rutledge Avenue, MSC 550, Charleston, South Carolina 29425-5500
- Judy R Dubno, Department of Otolaryngology-Head and Neck Surgery, Medical University of South Carolina, 135 Rutledge Avenue, MSC 550, Charleston, South Carolina 29425-5500
25
Deroche MLD, Gracco VL. Segregation of voices with single or double fundamental frequencies. J Acoust Soc Am 2019; 145:847. [PMID: 30823786] [DOI: 10.1121/1.5090107]
Abstract
In cocktail-party situations, listeners can use the fundamental frequency (F0) of a voice to segregate it from competitors, but other cues in speech could help, such as co-modulation of envelopes across frequency or more complex cues related to the semantic/syntactic content of the utterances. For simplicity, this (non-pitch) form of grouping is referred to as "articulatory." By creating a new type of speech with two steady F0s, we examined how these two forms of segregation compete: articulatory grouping would bind the partials of a double-F0 source together, whereas harmonic segregation would tend to split them into two subsets. In experiment 1, maskers were two same-male sentences. Speech reception thresholds were high in this task (in the vicinity of 0 dB), and harmonic segregation behaved as though double-F0 stimuli were two independent sources. This was not the case in experiment 2, where maskers were speech-shaped complexes (buzzes). First, double-F0 targets were immune to the masking of a single-F0 buzz matching one of the two target F0s. Second, double-F0 buzzes were particularly effective at masking a single-F0 target matching one of the two buzz F0s. In conclusion, the strength of F0 segregation appears to depend on whether the masker is speech or not.
Affiliation(s)
- Mickael L D Deroche, Centre for Research on Brain, Language and Music, McGill University, 3640 rue de la Montagne, Montreal, H3G 2A8, Canada
- Vincent L Gracco, Haskins Laboratories, 300 George Street, New Haven, Connecticut 06511, USA
26
Frequency specificity of amplitude envelope patterns in noise-vocoded speech. Hear Res 2018; 367:169-181. [DOI: 10.1016/j.heares.2018.06.005]
27
Steinmetzger K, Rosen S. The role of envelope periodicity in the perception of masked speech with simulated and real cochlear implants. J Acoust Soc Am 2018; 144:885. [PMID: 30180719] [DOI: 10.1121/1.5049584]
Abstract
In normal hearing, complex tones with pitch-related periodic envelope modulations are far less effective maskers of speech than aperiodic noise. Here, it is shown that this masker-periodicity benefit is diminished in noise-vocoder simulations of cochlear implants (CIs) and further reduced with real CIs. Nevertheless, both listener groups still benefitted significantly from masker periodicity, despite the lack of salient spectral pitch cues. The main reason for the smaller effect observed in CI users is thought to be an even stronger channel interaction than in the CI simulations, which smears out the random envelope modulations that are characteristic for aperiodic sounds. In contrast, neither interferers that were amplitude-modulated at a rate of 10 Hz nor maskers with envelopes specifically designed to reveal the target speech enabled a masking release in CI users. Hence, even at the high signal-to-noise ratios at which they were tested, CI users can still exploit pitch cues transmitted by the temporal envelope of a non-speech masker, whereas slow amplitude modulations of the masker envelope are no longer helpful.
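A noise-vocoder simulation of CI processing of the kind referenced above can be sketched in a few lines: split the signal into analysis bands, extract each band's envelope, and reimpose it on band-limited noise. This is an illustrative sketch (FFT-based filtering, unsmoothed Hilbert envelopes, arbitrary band edges), not the vocoder used in the study.

```python
import numpy as np

def _analytic(sig):
    # FFT-based analytic signal (a Hilbert transform), used for envelopes.
    n = len(sig)
    spec = np.fft.fft(sig)
    h = np.zeros(n)
    h[0] = 1.0
    h[1:(n + 1) // 2] = 2.0
    if n % 2 == 0:
        h[n // 2] = 1.0
    return np.fft.ifft(spec * h)

def noise_vocode(x, fs, edges, rng):
    """Crude noise vocoder: FFT band filtering, Hilbert envelopes, and
    envelope-modulated noise carriers. The band edges are arbitrary."""
    n = len(x)
    freqs = np.fft.rfftfreq(n, 1 / fs)
    spec = np.fft.rfft(x)
    noise_spec = np.fft.rfft(rng.standard_normal(n))
    out = np.zeros(n)
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (freqs >= lo) & (freqs < hi)
        band = np.fft.irfft(spec * mask, n)           # band-limited input
        carrier = np.fft.irfft(noise_spec * mask, n)  # band-limited noise
        out += np.abs(_analytic(band)) * carrier      # envelope x carrier
    return out

fs = 16000
t = np.arange(fs) / fs
# A crude "speech-like" input: a 150 Hz tone with a 3 Hz envelope.
speech_like = np.sin(2 * np.pi * 150 * t) * (1 + np.sin(2 * np.pi * 3 * t))
rng = np.random.default_rng(1)
vocoded = noise_vocode(speech_like, fs, edges=[100, 500, 1000, 2000, 4000], rng=rng)
```

The output keeps the slow envelope of each band while replacing its temporal fine structure with noise, which is why pitch-related periodicity cues are degraded in such simulations.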
Affiliation(s)
- Kurt Steinmetzger, Speech, Hearing and Phonetic Sciences, University College London, Chandler House, 2 Wakefield Street, London, WC1N 1PF, United Kingdom
- Stuart Rosen, Speech, Hearing and Phonetic Sciences, University College London, Chandler House, 2 Wakefield Street, London, WC1N 1PF, United Kingdom
28
Popham S, Boebinger D, Ellis DPW, Kawahara H, McDermott JH. Inharmonic speech reveals the role of harmonicity in the cocktail party problem. Nat Commun 2018; 9:2122. [PMID: 29844313] [PMCID: PMC5974276] [DOI: 10.1038/s41467-018-04551-8]
Abstract
The "cocktail party problem" requires us to discern individual sound sources from mixtures of sources. The brain must use knowledge of natural sound regularities for this purpose. One much-discussed regularity is the tendency for frequencies to be harmonically related (integer multiples of a fundamental frequency). To test the role of harmonicity in real-world sound segregation, we developed speech analysis/synthesis tools to perturb the carrier frequencies of speech, disrupting harmonic frequency relations while maintaining the spectrotemporal envelope that determines phonemic content. We find that violations of harmonicity cause individual frequencies of speech to segregate from each other, impair the intelligibility of concurrent utterances despite leaving intelligibility of single utterances intact, and cause listeners to lose track of target talkers. However, additional segregation deficits result from replacing harmonic frequencies with noise (simulating whispering), suggesting additional grouping cues enabled by voiced speech excitation. Our results demonstrate acoustic grouping cues in real-world sound segregation.
Affiliation(s)
- Sara Popham, Department of Brain and Cognitive Sciences, MIT, Cambridge, MA, 02139, USA; Helen Wills Neuroscience Institute, UC Berkeley, Berkeley, CA, 94720, USA
- Dana Boebinger, Department of Brain and Cognitive Sciences, MIT, Cambridge, MA, 02139, USA; Program in Speech and Hearing Sciences, Harvard University, Cambridge, MA, 02138, USA
- Josh H McDermott, Department of Brain and Cognitive Sciences, MIT, Cambridge, MA, 02139, USA; Program in Speech and Hearing Sciences, Harvard University, Cambridge, MA, 02138, USA
29
Nagaraj NK, Magimairaj BM. Role of working memory and lexical knowledge in perceptual restoration of interrupted speech. J Acoust Soc Am 2017; 142:3756. [PMID: 29289104] [DOI: 10.1121/1.5018429]
Abstract
The role of working memory (WM) capacity and lexical knowledge in perceptual restoration (PR) of missing speech was investigated using the interrupted speech perception paradigm. Speech identification ability, which indexed PR, was measured using low-context sentences periodically interrupted at 1.5 Hz. PR was measured for silent gated, low-frequency speech noise filled, and low-frequency fine-structure and envelope filled interrupted conditions. WM capacity was measured using verbal and visuospatial span tasks. Lexical knowledge was assessed using both receptive vocabulary and meaning from context tests. Results showed that PR was better for speech noise filled condition than other conditions tested. Both receptive vocabulary and verbal WM capacity explained unique variance in PR for the speech noise filled condition, but were unrelated to performance in the silent gated condition. It was only receptive vocabulary that uniquely predicted PR for fine-structure and envelope filled conditions. These findings suggest that the contribution of lexical knowledge and verbal WM during PR depends crucially on the information content that replaced the silent intervals. When perceptual continuity was partially restored by filler speech noise, both lexical knowledge and verbal WM capacity facilitated PR. Importantly, for fine-structure and envelope filled interrupted conditions, lexical knowledge was crucial for PR.
Affiliation(s)
- Naveen K Nagaraj, Cognitive Hearing Science Lab, University of Arkansas for Medical Sciences and University of Arkansas at Little Rock, Little Rock, Arkansas 72204, USA
- Beula M Magimairaj, Cognition and Language Lab, Communication Sciences and Disorders, University of Central Arkansas, Conway, Arkansas 72035, USA
30
Xu Y, Chen M, LaFaire P, Tan X, Richter CP. Distorting temporal fine structure by phase shifting and its effects on speech intelligibility and neural phase locking. Sci Rep 2017; 7:13387. [PMID: 29042580] [PMCID: PMC5645416] [DOI: 10.1038/s41598-017-12975-3]
Abstract
Envelope (E) and temporal fine structure (TFS) are important features of acoustic signals and their corresponding perceptual function has been investigated with various listening tasks. To further understand the underlying neural processing of TFS, experiments in humans and animals were conducted to demonstrate the effects of modifying the TFS in natural speech sentences on both speech recognition and neural coding. The TFS of natural speech sentences was modified by distorting the phase and maintaining the magnitude. Speech intelligibility was then tested for normal-hearing listeners using the intact and reconstructed sentences presented in quiet and against background noise. Sentences with modified TFS were then used to evoke neural activity in auditory neurons of the inferior colliculus in guinea pigs. Our study demonstrated that speech intelligibility in humans relied on the periodic cues of speech TFS in both quiet and noisy listening conditions. Furthermore, recordings of neural activity from the guinea pig inferior colliculus have shown that individual auditory neurons exhibit phase locking patterns to the periodic cues of speech TFS that disappear when reconstructed sounds do not show periodic patterns anymore. Thus, the periodic cues of TFS are essential for speech intelligibility and are encoded in auditory neurons by phase locking.
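The stimulus manipulation (distorting phase while maintaining the magnitude spectrum) can be sketched with a whole-signal FFT; the study applied controlled phase shifts rather than the fully random phases used in this hypothetical illustration.

```python
import numpy as np

def randomize_phase(x, rng):
    """Distort temporal fine structure by replacing the spectral phase with
    random values while keeping the magnitude spectrum intact."""
    spec = np.fft.rfft(x)
    phase = rng.uniform(-np.pi, np.pi, len(spec))
    phase[0] = 0.0   # keep the DC bin real
    phase[-1] = 0.0  # keep the Nyquist bin real (even-length signals)
    return np.fft.irfft(np.abs(spec) * np.exp(1j * phase), len(x))

fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 220 * t) + 0.3 * np.sin(2 * np.pi * 440 * t)
rng = np.random.default_rng(0)
y = randomize_phase(x, rng)
# y has the same magnitude spectrum as x but a completely different
# waveform, i.e. its temporal fine structure is destroyed.
```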
Affiliation(s)
- Yingyue Xu, Northwestern University, Department of Otolaryngology, 320 E. Superior Street, Searle 12-561, Chicago, IL, 60611, USA
- Maxin Chen, Northwestern University, Department of Biomedical Engineering, 2145 Sheridan Road, Tech E310, Evanston, IL, 60208, USA
- Petrina LaFaire, Northwestern University, Department of Otolaryngology, 320 E. Superior Street, Searle 12-561, Chicago, IL, 60611, USA
- Xiaodong Tan, Northwestern University, Department of Otolaryngology, 320 E. Superior Street, Searle 12-561, Chicago, IL, 60611, USA
- Claus-Peter Richter, Northwestern University, Department of Otolaryngology, 320 E. Superior Street, Searle 12-561, Chicago, IL, 60611, USA; Northwestern University, The Hugh Knowles Center, Department of Communication Sciences and Disorders, 2240 Campus Drive, Evanston, IL, 60208, USA
31
Leclère T, Lavandier M, Deroche MLD. The intelligibility of speech in a harmonic masker varying in fundamental frequency contour, broadband temporal envelope, and spatial location. Hear Res 2017; 350:1-10. [PMID: 28390253] [DOI: 10.1016/j.heares.2017.03.012]
Abstract
Differences in fundamental frequency (F0), modulations in the masker envelope, and differences in spatial location between a speech target and a masker can improve speech intelligibility in cocktail-party situations. These cues have been thoroughly investigated independently and associated with unmasking mechanisms: F0 segregation, temporal dip listening and spatial unmasking, respectively. Two experiments were conducted to examine whether F0 segregation interacts with spatial unmasking (experiment 1) or temporal modulations in the masker envelope (experiment 2) by measuring speech reception thresholds for a monotonized or an intonated voice against eight types of harmonic complex masker. In experiment 1, the masker varied in F0 contour (monotonized or intonated), mean F0 (0 or 3 semitones above that of the target) and spatial location (co-located or separated from the target). In experiment 2, the masker varied in F0 contour, mean F0 and broadband temporal envelope (stationary or 1-voice modulated). The benefits associated with spatial separation and F0 differences added up linearly in almost all conditions, whereas modulations in the masker envelope improved speech intelligibility only in the presence of intonated maskers. In addition, in both experiments F0 segregation benefited considerably from natural variations in the F0 pattern of the target voice, but was largely disrupted by those of the masker.
Affiliation(s)
- Thibaud Leclère, Univ Lyon, ENTPE, Laboratoire Génie Civil et Bâtiment, Rue M. Audin, F-69518, Vaulx-en-Velin Cedex, France
- Mathieu Lavandier, Univ Lyon, ENTPE, Laboratoire Génie Civil et Bâtiment, Rue M. Audin, F-69518, Vaulx-en-Velin Cedex, France
- Mickael L D Deroche, Center for Research on Brain, Language and Music, McGill University, Rabinovitch House, 3640 rue de la Montagne, Montreal, Quebec H3G 2A8, Canada
32
Steinmetzger K, Rosen S. Effects of acoustic periodicity and intelligibility on the neural oscillations in response to speech. Neuropsychologia 2017; 95:173-181. [DOI: 10.1016/j.neuropsychologia.2016.12.003] [Received: 10/14/2015] [Revised: 09/07/2016] [Accepted: 12/05/2016] [Indexed: 10/20/2022]
33
Steinmetzger K, Rosen S. Effects of acoustic periodicity, intelligibility, and pre-stimulus alpha power on the event-related potentials in response to speech. Brain Lang 2017; 164:1-8. [PMID: 27690124] [DOI: 10.1016/j.bandl.2016.09.008] [Received: 05/02/2016] [Revised: 08/04/2016] [Accepted: 09/19/2016] [Indexed: 06/06/2023]
Abstract
Magneto- and electroencephalographic (M/EEG) signals in response to acoustically degraded speech have been examined by several recent studies. Interpreting the results unambiguously is complicated by the fact that speech signal manipulations affect acoustics and intelligibility alike. To separate out these two factors, the current EEG study altered the acoustic properties of the stimuli and sorted the trials according to the correctness of the listeners' spoken responses. Firstly, more periodicity (i.e. voicing) rendered the event-related potentials (ERPs) more negative during the first second after sentence onset, indicating a greater cortical sensitivity to auditory input with a pitch. Secondly, a larger contingent negative variation (CNV) was observed during sentence presentation when the subjects could subsequently repeat more words correctly. Additionally, slow alpha power (7-10 Hz) was increased before the sentences with the fewest correctly repeated words, which may indicate that the subjects were not focused on the upcoming task.
Affiliation(s)
- Kurt Steinmetzger, Speech, Hearing and Phonetic Sciences, University College London, Chandler House, 2 Wakefield Street, London WC1N 1PF, United Kingdom
- Stuart Rosen, Speech, Hearing and Phonetic Sciences, University College London, Chandler House, 2 Wakefield Street, London WC1N 1PF, United Kingdom