1
Lie S, Zekveld AA, Smits C, Kramer SE, Versfeld NJ. Learning effects in speech-in-noise tasks: Effect of masker modulation and masking release. J Acoust Soc Am 2024; 156:341-349. PMID: 38990038. DOI: 10.1121/10.0026519.
Abstract
Previous research has shown that learning effects are present for speech intelligibility in temporally modulated (TM) noise, but not in stationary noise. The present study aimed to gain more insight into the factors that might affect the time course (the number of trials required to reach stable performance) and the size [the improvement in the speech reception threshold (SRT)] of the learning effect. Two hypotheses were addressed: (1) learning effects are present in both TM and spectrally modulated (SM) noise, and (2) the time course and size of the learning effect depend on the amount of masking release caused by either TM or SM noise. Eighteen normal-hearing adults (23-62 years) participated in SRT measurements, in which they listened to sentences in six masker conditions, including stationary, TM, and SM noise conditions. The results showed learning effects in all TM and SM noise conditions, but not in the stationary noise condition. The learning effect was related to the size of masking release: a larger masking release was accompanied by both a longer time course and a larger learning effect. The results also indicate that speech is processed differently in SM noise than in TM noise.
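The two quantities at issue here, the size of the learning effect (the SRT improvement across repeated trials) and masking release (the SRT benefit of a modulated over a stationary masker), can be sketched as follows. The SRT values and the choice of averaging the last three trials as the "stable" level are hypothetical, not taken from the study:

```python
import numpy as np

def masking_release(srt_stationary_db, srt_modulated_db):
    """Masking release: how much lower (better) the SRT is in the
    modulated masker than in stationary noise, in dB."""
    return srt_stationary_db - srt_modulated_db

def learning_effect(srt_per_trial_db, n_stable=3):
    """Size of the learning effect: improvement from the first trial to
    the mean of the last n_stable trials (positive = SRT improved)."""
    srts = np.asarray(srt_per_trial_db, dtype=float)
    return float(srts[0] - srts[-n_stable:].mean())

# Hypothetical SRTs (dB SNR) over repeated trials in a modulated masker
trials = [-8.0, -10.0, -11.5, -12.0, -12.5, -12.5]
print(round(learning_effect(trials), 2))  # 4.33 dB improvement over trials
print(masking_release(-4.0, -12.5))       # 8.5 dB benefit vs. stationary noise
```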
Affiliation(s)
- Sisi Lie
- Amsterdam UMC, Vrije Universiteit Amsterdam, Otolaryngology-Head and Neck Surgery, Ear and Hearing, De Boelelaan, Amsterdam Public Health research institute, Amsterdam, The Netherlands
- Adriana A Zekveld
- Amsterdam UMC, Vrije Universiteit Amsterdam, Otolaryngology-Head and Neck Surgery, Ear and Hearing, De Boelelaan, Amsterdam Public Health research institute, Amsterdam, The Netherlands
- Cas Smits
- Amsterdam UMC, University of Amsterdam, Otolaryngology-Head and Neck Surgery, Ear and Hearing, Meibergdreef, Amsterdam Public Health research institute, Amsterdam, The Netherlands
- Sophia E Kramer
- Amsterdam UMC, Vrije Universiteit Amsterdam, Otolaryngology-Head and Neck Surgery, Ear and Hearing, De Boelelaan, Amsterdam Public Health research institute, Amsterdam, The Netherlands
- Niek J Versfeld
- Amsterdam UMC, Vrije Universiteit Amsterdam, Otolaryngology-Head and Neck Surgery, Ear and Hearing, De Boelelaan, Amsterdam Public Health research institute, Amsterdam, The Netherlands
2
Johnson EM, Healy EW. The Optimal Speech-to-Background Ratio for Balancing Speech Recognition With Environmental Sound Recognition. Ear Hear 2024:00003446-990000000-00287. PMID: 38816900. DOI: 10.1097/aud.0000000000001532.
Abstract
OBJECTIVES This study aimed to determine the speech-to-background ratios (SBRs) at which normal-hearing (NH) and hearing-impaired (HI) listeners can recognize both speech and environmental sounds when the two types of signals are mixed. Also examined were the effect of individual sounds on speech recognition and environmental sound recognition (ESR), and the impact of divided versus selective attention on these tasks.
DESIGN In Experiment 1 (divided attention), 11 NH and 10 HI listeners heard sentences mixed with environmental sounds at various SBRs and performed speech recognition and ESR tasks concurrently in each trial. In Experiment 2 (selective attention), 20 NH listeners performed these tasks in separate trials. Psychometric functions were generated for each task, listener group, and environmental sound. The range over which speech recognition and ESR were both high was determined, as was the optimal SBR for balancing speech recognition with ESR, defined as the point of intersection between each pair of normalized psychometric functions.
RESULTS The NH listeners achieved greater than 95% accuracy on concurrent speech recognition and ESR over an SBR range of approximately 20 dB or greater. The optimal SBR for maximizing both speech recognition and ESR for NH listeners was approximately +12 dB. For the HI listeners, the range over which 95% performance was observed on both tasks was far smaller (span of 1 dB), with an optimal value of +5 dB. Acoustic analyses indicated that the speech and environmental sound stimuli were similarly audible, regardless of the hearing status of the listener, but that the speech fluctuated more than the environmental sounds. Divided versus selective attention conditions produced differences in performance that were statistically significant yet only modest in magnitude. In all conditions and for both listener groups, recognition was higher for environmental sounds than for speech when presented at equal intensities (i.e., 0 dB SBR), indicating that the environmental sounds were more effective maskers of speech than the converse. Each of the 25 environmental sounds used in this study (with one exception) had a span of SBRs over which speech recognition and ESR were both higher than 95%. These ranges tended to overlap substantially.
CONCLUSIONS A range of SBRs exists over which speech and environmental sounds can be simultaneously recognized with high accuracy by NH and HI listeners, but this range is larger for NH listeners. The single optimal SBR for jointly maximizing speech recognition and ESR also differs between NH and HI listeners. The greater masking effectiveness of the environmental sounds relative to the speech may be related to the lower degree of fluctuation present in the environmental sounds, as well as possibly to task differences between speech recognition and ESR (open versus closed set). The observed differences between the NH and HI results may be related to the HI listeners' smaller fluctuating masker benefit. As noise-reduction systems become increasingly effective, the current results could guide the design of future systems that provide listeners with highly intelligible speech without depriving them of access to important environmental sounds.
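The "point of intersection between each pair of normalized psychometric functions" can be located numerically. A minimal sketch, assuming logistic psychometric functions and made-up midpoint/slope parameters; with the symmetric parameters below, the crossing lands midway between the two midpoints, near +5 dB:

```python
import numpy as np

def logistic(x, midpoint, slope):
    """Normalized logistic psychometric function, ranging 0..1."""
    return 1.0 / (1.0 + np.exp(-slope * (x - midpoint)))

def optimal_sbr(speech_mid, speech_slope, esr_mid, esr_slope):
    """SBR at the intersection of the two normalized psychometric
    functions: speech recognition rises with SBR while ESR falls."""
    grid = np.linspace(-30.0, 30.0, 6001)       # 0.01-dB grid
    speech = logistic(grid, speech_mid, speech_slope)
    esr = logistic(grid, esr_mid, -esr_slope)   # negated slope: ESR declines
    return float(grid[np.argmin(np.abs(speech - esr))])

print(optimal_sbr(0.0, 0.5, 10.0, 0.5))  # symmetric case: crossing near +5 dB
```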
Affiliation(s)
- Eric M Johnson
- Department of Speech and Hearing Science and Center for Cognitive and Brain Sciences, The Ohio State University, Columbus, Ohio, USA
3
Ueda K, Hashimoto M, Takeichi H, Wakamiya K. Interrupted mosaic speech revisited: Gain and loss in intelligibility by stretching. J Acoust Soc Am 2024; 155:1767-1779. PMID: 38441439. DOI: 10.1121/10.0025132.
Abstract
Our previous investigation of the effect of stretching spectrotemporally degraded and temporally interrupted speech stimuli showed remarkable intelligibility gains [Ueda, Takeichi, and Wakamiya (2022). J. Acoust. Soc. Am. 152(2), 970-980]. In that study, however, gap durations and temporal resolution were confounded. In the current investigation, we therefore measured the intelligibility of so-called mosaic speech while dissociating the effects of interruption and temporal resolution. The intelligibility of mosaic speech (20 frequency bands and 20 ms segment duration) declined from 95% to 78% and 33% when interrupted with 20 and 80 ms gaps, respectively. Intelligibility improved, however, to 92% and 54% (14% and 21% gains for the 20 and 80 ms gaps, respectively) by stretching the mosaic segments to fill the silent gaps (n = 21). By contrast, intelligibility fell to a minimum of 9% (7% loss) when stretching stimuli interrupted with 160 ms gaps. Explanations based on auditory grouping, modulation unmasking, or phonemic restoration may account for the intelligibility improvement by stretching, but not for the loss. The probability summation model accounted for the "U"-shaped intelligibility curves and the gain and loss of intelligibility, suggesting that perceptual unit length and speech rate may affect the intelligibility of spectrotemporally degraded speech stimuli.
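The probability summation model invoked here combines independent perceptual units; in its simplest form, the probability that at least one unit succeeds is P = 1 - prod(1 - p_i). A minimal sketch with made-up unit probabilities (the study's actual model of sub-unit and supra-unit processes is more elaborate):

```python
def probability_summation(unit_probs):
    """Probability that at least one of several independent perceptual
    units succeeds: P = 1 - prod(1 - p_i)."""
    p_fail = 1.0
    for p in unit_probs:
        p_fail *= (1.0 - p)
    return 1.0 - p_fail

print(probability_summation([0.5, 0.5]))           # 0.75
print(round(probability_summation([0.2] * 4), 4))  # 0.5904
```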
Affiliation(s)
- Kazuo Ueda
- Department of Acoustic Design, Faculty of Design/Research Center for Applied Perceptual Science/Research and Development Center for Five-Sense Devices, Kyushu University, 4-9-1 Shiobaru, Minami-ku, Fukuoka 815-8540, Japan
- Masashi Hashimoto
- Department of Acoustic Design, Faculty of Design, Kyushu University, 4-9-1 Shiobaru, Minami-ku, Fukuoka 815-8540, Japan
- Hiroshige Takeichi
- Open Systems Information Science Team, Advanced Data Science Project (ADSP), RIKEN Information R&D and Strategy Headquarters (R-IH), RIKEN, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Kohei Wakamiya
- Department of Acoustic Design, Faculty of Design, Kyushu University, 4-9-1 Shiobaru, Minami-ku, Fukuoka 815-8540, Japan
4
Ueda K, Doan LLD, Takeichi H. Checkerboard and interrupted speech: Intelligibility contrasts related to factor-analysis-based frequency bands. J Acoust Soc Am 2023; 154:2010-2020. PMID: 37782122. DOI: 10.1121/10.0021165.
Abstract
It has been shown that the intelligibility of checkerboard speech stimuli, in which speech signals are periodically interrupted in both time and frequency, varies drastically with the combination of the number of frequency bands (2-20) and segment duration (20-320 ms). However, the effects on intelligibility of the number of frequency bands between 4 and 20, and of the frequency division parameters, have been largely unknown. Here, we show that speech intelligibility was lowest for four-band checkerboard speech stimuli, except at the 320-ms segment duration; temporally interrupted speech stimuli and eight-band checkerboard speech stimuli followed, in that order (N = 19 and 20). At the same time, U-shaped intelligibility curves were observed for four-band and possibly eight-band checkerboard speech stimuli. Furthermore, different frequency division parameters produced small but significant intelligibility differences at the 160- and 320-ms segment durations for four-band checkerboard speech stimuli. These results suggest that the four factor-analysis-based frequency bands, representing groups of critical bands whose speech power fluctuations correlate with each other, act as speech cue channels essential for speech perception. Moreover, a probability summation model for perceptual units, consisting of a sub-unit process and a supra-unit process that receives the outputs of the speech cue channels, may account for the U-shaped intelligibility curves.
Affiliation(s)
- Kazuo Ueda
- Department of Acoustic Design, Faculty of Design/Research Center for Applied Perceptual Science/Research and Development Center for Five-Sense Devices, Kyushu University, 4-9-1 Shiobaru, Minami-ku, Fukuoka 815-8540, Japan
- Linh Le Dieu Doan
- Human Science Course, Graduate School of Design, Kyushu University, 4-9-1 Shiobaru, Minami-ku, Fukuoka 815-8540, Japan
- Hiroshige Takeichi
- Open Systems Information Science Team, Advanced Data Science Project (ADSP), RIKEN Information R&D and Strategy Headquarters (R-IH), RIKEN, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
5
Edraki A, Chan WY, Jensen J, Fogerty D. Spectro-temporal modulation glimpsing for speech intelligibility prediction. Hear Res 2022; 426:108620. PMID: 36175300. PMCID: PMC10125146. DOI: 10.1016/j.heares.2022.108620.
Abstract
We compare two alternative speech intelligibility prediction algorithms: time-frequency glimpse proportion (GP) and spectro-temporal glimpsing index (STGI). Both algorithms hypothesize that listeners understand speech in challenging acoustic environments by "glimpsing" partially available information from degraded speech. GP defines glimpses as those time-frequency regions whose local signal-to-noise ratio is above a certain threshold and estimates intelligibility as the proportion of the time-frequency regions glimpsed. STGI, on the other hand, applies glimpsing to the spectro-temporal modulation (STM) domain and uses a similarity measure based on the normalized cross-correlation between the STM envelopes of the clean and degraded speech signals to estimate intelligibility as the proportion of the STM channels glimpsed. Our experimental results demonstrate that STGI extends the notion of glimpsing proportion to a wider range of distortions, including non-linear signal processing, and outperforms GP for the additive uncorrelated noise datasets we tested. Furthermore, the results show that spectro-temporal modulation analysis enables STGI to account for the effects of masker type on speech intelligibility, leading to superior performance over GP in modulated noise datasets.
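The GP computation described here, thresholding the local SNR over time-frequency regions, can be sketched as follows. The toy spectrogram values and the local criterion of -5 dB are illustrative assumptions, not the paper's parameters:

```python
import numpy as np

def glimpse_proportion(speech_tf_db, noise_tf_db, lc_db=-5.0):
    """Time-frequency glimpse proportion (GP): the fraction of T-F
    regions whose local SNR (speech level minus noise level, in dB)
    exceeds the local criterion lc_db."""
    local_snr = np.asarray(speech_tf_db, float) - np.asarray(noise_tf_db, float)
    return float(np.mean(local_snr > lc_db))

# Toy 2x4 level "spectrograms" (dB): half the regions clear the criterion
speech = np.array([[60, 40, 60, 40], [60, 40, 60, 40]], float)
noise = np.full((2, 4), 50.0)
print(glimpse_proportion(speech, noise))  # 0.5
```

STGI replaces this per-region SNR criterion with a normalized cross-correlation between clean and degraded spectro-temporal modulation envelopes, which is why it extends to distortions where no additive noise floor exists.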
Affiliation(s)
- Amin Edraki
- Department of Electrical and Computer Engineering, Queen's University, Kingston, ON K7L 3N6, Canada
- Wai-Yip Chan
- Department of Electrical and Computer Engineering, Queen's University, Kingston, ON K7L 3N6, Canada
- Jesper Jensen
- Department of Electronic Systems, Aalborg University, Aalborg 9220, Denmark; Demant A/S, Smørum 2765, Denmark
- Daniel Fogerty
- Department of Speech and Hearing Science, University of Illinois Urbana-Champaign, Champaign, IL 61820, USA
6
Fogerty D, Madorskiy R, Vickery B, Shafiro V. Recognition of Interrupted Speech, Text, and Text-Supplemented Speech by Older Adults: Effect of Interruption Rate. J Speech Lang Hear Res 2022; 65:4404-4416. PMID: 36251884. PMCID: PMC9940893. DOI: 10.1044/2022_jslhr-22-00247.
Abstract
PURPOSE Studies of speech and text interruption indicate that the interruption rate influences the perceptual information available, from whole words at slow rates to subphonemic cues at faster interruption rates. In young adults, the benefit obtained from text supplementation of speech may depend on the type of perceptual information available in either modality. Aging commonly slows temporal aspects of information processing, which may influence the benefit older adults obtain from text-supplemented speech across interruption rates.
METHOD Older adults were tested unimodally and multimodally with spoken and printed sentences that were interrupted by silence or white space at various rates.
RESULTS Results demonstrate U-shaped performance-rate functions for all modality conditions, with minimal performance around interruption rates of 2-4 Hz. Comparison to previous studies with younger adults indicates overall poorer recognition of interrupted materials by the older adults. However, as a group, older adults can integrate information between the two modalities to a similar degree as younger adults. Individual differences in multimodal integration were noted.
CONCLUSION Overall, these results indicate that older adults, while demonstrating poorer overall performance than younger adults, successfully combine distributed partial information across the speech and text modalities to facilitate sentence recognition.
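Interrupting a signal by silence at a given rate, as done for the spoken stimuli here, amounts to square-wave gating of the waveform. A minimal sketch; the sample rate, interruption rate, and 50% duty cycle are arbitrary illustrative choices:

```python
import numpy as np

def interrupt(signal, fs, rate_hz, duty=0.5):
    """Periodically interrupt a waveform with silence: keep the first
    `duty` fraction of each interruption cycle, zero out the rest."""
    t = np.arange(len(signal)) / fs
    phase = (t * rate_hz) % 1.0           # position within each cycle, 0..1
    gate = (phase < duty).astype(float)   # 1 = signal on, 0 = silent gap
    return signal * gate

fs = 1000
y = interrupt(np.ones(fs), fs, rate_hz=2.0)  # 2 Hz interruption, 50% duty
print(y.sum() / len(y))  # fraction of samples retained: 0.5
```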
Affiliation(s)
- Daniel Fogerty
- Department of Speech and Hearing Science, University of Illinois at Urbana-Champaign, Champaign
- Rachel Madorskiy
- Department of Speech, Language, Hearing, and Occupational Sciences, University of Montana, Missoula
- Blythe Vickery
- Department of Communication Sciences and Disorders, University of South Carolina, Columbia
- Valeriy Shafiro
- Department of Communication Disorders and Sciences, Rush University Medical Center, Chicago, IL
7
Ueda K, Takeichi H, Wakamiya K. Auditory grouping is necessary to understand interrupted mosaic speech stimuli. J Acoust Soc Am 2022; 152:970. PMID: 36050149. PMCID: PMC9553289. DOI: 10.1121/10.0013425.
Abstract
The intelligibility of interrupted speech stimuli is known to be almost perfect when the segment duration is shorter than 80 ms, which means that the interrupted segments are perceptually organized into a coherent stream under this condition. However, why listeners can successfully group the interrupted segments into a coherent stream has been largely unknown. Here, we show that the intelligibility of mosaic speech, in which the original speech was segmented in frequency and time and noise-vocoded with the average power in each unit, was largely reduced by periodic interruption. At the same time, intelligibility could be recovered by promoting auditory grouping of the interrupted segments, that is, by stretching the segments up to 40 ms and reducing the gaps, provided that the number of frequency bands was sufficient (≥4) and the original segment duration was 40 ms or less. The interruption was devastating for mosaic speech stimuli, very likely because the deprivation of periodicity and temporal fine structure through mosaicking prevented successful auditory grouping of the interrupted segments.
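The mosaicking step, segmenting a spectrogram in frequency and time and replacing each cell with its average power, can be sketched on a toy power matrix. Actual mosaic speech additionally noise-vocodes each cell at that power, which is omitted in this sketch:

```python
import numpy as np

def mosaic_power(power_tf, band_size, seg_size):
    """Replace each (band_size x seg_size) time-frequency cell with its
    average power, discarding detail within the cell (the 'mosaicking'
    step; the subsequent noise-vocoding of each cell is omitted)."""
    p = np.asarray(power_tf, dtype=float)
    out = np.empty_like(p)
    for f0 in range(0, p.shape[0], band_size):
        for t0 in range(0, p.shape[1], seg_size):
            cell = p[f0:f0 + band_size, t0:t0 + seg_size]
            out[f0:f0 + band_size, t0:t0 + seg_size] = cell.mean()
    return out

p = np.arange(16.0).reshape(4, 4)  # toy 4x4 power "spectrogram"
m = mosaic_power(p, band_size=2, seg_size=2)
print(m[0, 0], m[1, 1])  # both 2.5: the mean of the top-left 2x2 cell
```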
Affiliation(s)
- Kazuo Ueda
- Department of Human Science, Faculty of Design/Research Center for Applied Perceptual Science/Research and Development Center for Five-Sense Devices, Kyushu University, 4-9-1 Shiobaru, Minami-ku, Fukuoka 815-8540, Japan
- Hiroshige Takeichi
- Open Systems Information Science Team, Advanced Data Science Project (ADSP), RIKEN Information Research and Development and Strategy Headquarters (R-IH), RIKEN, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Kohei Wakamiya
- Department of Communication Design Science, Faculty of Design, Kyushu University, 4-9-1 Shiobaru, Minami-ku, Fukuoka 815-8540, Japan
8
Har-shai Yahav P, Zion Golumbic E. Linguistic processing of task-irrelevant speech at a cocktail party. eLife 2021; 10:e65096. PMID: 33942722. PMCID: PMC8163500. DOI: 10.7554/elife.65096.
Abstract
Paying attention to one speaker in a noisy place can be extremely difficult, because to-be-attended and task-irrelevant speech compete for processing resources. We tested whether this competition is restricted to acoustic-phonetic interference or extends to competition for linguistic processing as well. Neural activity was recorded using magnetoencephalography (MEG) as human participants were instructed to attend to natural speech presented to one ear while task-irrelevant stimuli were presented to the other. Task-irrelevant stimuli consisted of either random sequences of syllables or syllables structured to form coherent sentences, using hierarchical frequency-tagging. We find that the phrasal structure of structured task-irrelevant stimuli was represented in the neural response in left inferior frontal and posterior parietal regions, indicating that selective attention does not fully eliminate linguistic processing of task-irrelevant speech. Additionally, neural tracking of to-be-attended speech in left inferior frontal regions was enhanced when competing with structured task-irrelevant stimuli, suggesting inherent competition between them for linguistic processing.
Affiliation(s)
- Paz Har-shai Yahav
- The Gonda Center for Multidisciplinary Brain Research, Bar-Ilan University, Ramat Gan, Israel
- Elana Zion Golumbic
- The Gonda Center for Multidisciplinary Brain Research, Bar-Ilan University, Ramat Gan, Israel
9
Ross B, Dobri S, Schumann A. Psychometric function for speech-in-noise tests accounts for word-recognition deficits in older listeners. J Acoust Soc Am 2021; 149:2337. PMID: 33940923. DOI: 10.1121/10.0003956.
Abstract
Speech-in-noise (SIN) understanding in older age is affected by hearing loss, impaired central auditory processing, and cognitive deficits. SIN tests measure the compound effect of these factors via a speech reception threshold, defined as the signal-to-noise ratio required for 50% word understanding (SNR50). This study compared two standard SIN tests: QuickSIN (n = 354) in young and older adults, and BKB-SIN (n = 139) in older adults (>60 years). The effects of hearing loss and age on SIN understanding were analyzed to identify auditory and nonauditory contributions to SIN loss. Word recognition in noise was modeled with individual psychometric functions using a logistic fit with three parameters: the midpoint (SNRα), slope (β), and asymptotic word-recognition deficit at high SNR (λ). The parameters SNRα and λ formally separate SIN loss into two components. SNRα characterizes the steep slope of the psychometric function, at which a slight SNR increase provides a considerable improvement in SIN understanding. SNRα was discussed as being predominantly affected by audibility and low-level central auditory processing. The parameter λ describes a shallow segment of the psychometric function, at which a further increase in the SNR provides only modest improvement in SIN understanding. Cognitive factors in aging may contribute to the SIN loss indicated by λ.
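The three-parameter logistic fit described here can be written directly: performance saturates at 1 - λ rather than at 1, so the asymptotic deficit is separated from the midpoint shift. The parameter values below are illustrative, not fitted:

```python
import numpy as np

def psychometric(snr_db, alpha, beta, lam):
    """Three-parameter logistic psychometric function for word
    recognition: midpoint alpha (dB SNR), slope beta, and asymptotic
    word-recognition deficit lam (performance saturates at 1 - lam)."""
    return (1.0 - lam) / (1.0 + np.exp(-beta * (snr_db - alpha)))

# Illustrative parameters: at the midpoint, performance is half the
# asymptote; at high SNR it approaches 1 - lam, not 1.
print(psychometric(0.0, alpha=0.0, beta=1.0, lam=0.1))                    # 0.45
print(round(float(psychometric(40.0, alpha=0.0, beta=1.0, lam=0.1)), 4))  # 0.9
```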
Affiliation(s)
- Bernhard Ross
- Rotman Research Institute, Baycrest Centre for Geriatric Care, Toronto, Ontario, Canada
- Simon Dobri
- Rotman Research Institute, Baycrest Centre for Geriatric Care, Toronto, Ontario, Canada
- Annette Schumann
- Rotman Research Institute, Baycrest Centre for Geriatric Care, Toronto, Ontario, Canada
10
Fogerty D, Sevich VA, Healy EW. Spectro-temporal glimpsing of speech in noise: Regularity and coherence of masking patterns reduces uncertainty and increases intelligibility. J Acoust Soc Am 2020; 148:1552. PMID: 33003879. PMCID: PMC7500957. DOI: 10.1121/10.0001971.
Abstract
Adverse listening conditions involve glimpses of spectro-temporal speech information. This study investigated whether the acoustic organization of the spectro-temporal masking pattern affects speech glimpsing in "checkerboard" noise. The regularity and coherence of the masking pattern were varied. Regularity was reduced by randomizing the spectral or temporal gating of the masking noise. Coherence involved the spectral alignment of frequency bands across time or the temporal alignment of gated onsets/offsets across frequency bands. Experiment 1 investigated the effect of spectral or temporal coherence. Experiment 2 investigated independent and combined factors of regularity and coherence. Performance was best in spectro-temporally modulated noise having larger glimpses. Generally, performance also improved as the regularity and coherence of masker fluctuations increased, with regularity having a stronger effect than coherence. An acoustic glimpsing model suggested that the effect of regularity (but not coherence) could be partially attributed to the availability of glimpses retained after energetic masking. Performance tended to be better with maskers that were spectrally coherent rather than temporally coherent. Overall, performance was best when the spectro-temporal masking pattern imposed even spectral sampling and minimal temporal uncertainty, indicating that listeners use reliable masking patterns to aid spectro-temporal speech glimpsing.
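A fully regular, fully coherent checkerboard masking pattern, the reference case against which the randomized patterns are compared, can be generated as a simple alternating gate over frequency bands and time segments; the band and segment counts below are arbitrary:

```python
import numpy as np

def checkerboard_mask(n_bands, n_segments):
    """Checkerboard gating pattern over frequency bands x time segments:
    adjacent bands and adjacent segments are gated in antiphase
    (1 = masker on, 0 = masker off)."""
    f = np.arange(n_bands)[:, None]
    t = np.arange(n_segments)[None, :]
    return (f + t) % 2

m = checkerboard_mask(4, 6)
print(m[0, :4])         # alternating gate along time
print(float(m.mean()))  # half of all cells are "on": 0.5
```

Randomizing the gating along one dimension (e.g. shuffling each band's on/off segments independently) reduces regularity while keeping the overall masker duty cycle at one half.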
Affiliation(s)
- Daniel Fogerty
- Department of Communication Sciences and Disorders, University of South Carolina, 1705 College Street, Columbia, South Carolina 29208, USA
- Victoria A Sevich
- Department of Speech and Hearing Science, The Ohio State University, 1070 Carmack Road, Columbus, Ohio 43210, USA
- Eric W Healy
- Department of Speech and Hearing Science, The Ohio State University, 1070 Carmack Road, Columbus, Ohio 43210, USA
11
Vicente T, Lavandier M. Further validation of a binaural model predicting speech intelligibility against envelope-modulated noises. Hear Res 2020; 390:107937. PMID: 32192940. DOI: 10.1016/j.heares.2020.107937.
Abstract
Collin and Lavandier [J. Acoust. Soc. Am. 134, 1146-1159 (2013)] proposed a binaural model predicting speech intelligibility against envelope-modulated noises, evaluated in 24 acoustic conditions involving similar masker types. The aim of the present study was to test the robustness of the model by modeling 80 additional conditions and to evaluate the influence of its parameters using an approach inspired by variance-based sensitivity analysis. First, the data from four experiments from the literature and one designed specifically for the present study were used to evaluate the prediction performance of the model, investigate potential interactions between its parameters, and define the parameter values leading to the best predictions. A revision of the model allowed it to account for binaural sluggishness. Finally, the optimized model was tested on an additional dataset not used to define its parameters. Overall, one hundred conditions split into six experiments were modeled. Correlations between data and predictions ranged from 0.85 to 0.96 across experiments, and mean absolute prediction errors were between 0.5 and 1.4 dB.
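The two reported evaluation metrics, the correlation between data and predictions and the mean absolute prediction error, can be computed as follows; the SRT values are invented for illustration:

```python
import numpy as np

def evaluate_predictions(measured_db, predicted_db):
    """Per-experiment prediction performance: Pearson correlation and
    mean absolute prediction error (dB) between measured and predicted
    speech reception thresholds."""
    m = np.asarray(measured_db, float)
    p = np.asarray(predicted_db, float)
    r = float(np.corrcoef(m, p)[0, 1])
    mae = float(np.mean(np.abs(m - p)))
    return r, mae

# Invented SRTs (dB) for a handful of conditions
measured = [-6.0, -4.5, -3.0, -1.0, 0.5]
predicted = [-5.5, -5.0, -2.5, -1.5, 1.0]
r, mae = evaluate_predictions(measured, predicted)
print(round(r, 3), mae)  # correlation ~0.98, MAE 0.5 dB
```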
Affiliation(s)
- Thibault Vicente
- Univ Lyon, ENTPE, Laboratoire Génie Civil et Bâtiment, Rue Maurice Audin, 69518, Vaulx-en-Velin Cedex, France
- Mathieu Lavandier
- Univ Lyon, ENTPE, Laboratoire Génie Civil et Bâtiment, Rue Maurice Audin, 69518, Vaulx-en-Velin Cedex, France