1
Hermon AM, Ganapathy K, Palaniswamy HP, Muthu ANP. Development and Validation of the Spatial Separation Sentence Test in Kannada. J Audiol Otol 2024;28:228-235. [PMID: 38685832; PMCID: PMC11273185; DOI: 10.7874/jao.2023.00325]
Abstract
BACKGROUND AND OBJECTIVES This study aimed to develop and validate a modified version of the Speech in Noise Sentence Test in Kannada, which would be appropriate for testing the speech comprehension ability of children aged 8-12 years. SUBJECTS AND METHODS A total of 120 sentences were chosen from 200 familiar sentences and split into four lists. Continuous discourse was used as a competition or distractor. Using MATLAB, the target stimulus was presented at 0-degree azimuth while the distractor's location varied (+90° and -90° azimuth). The test was programmed to dynamically adjust the signal-to-noise ratio (SNR) based on participants' responses. After initial validation, a pilot study was conducted with 60 typically hearing children aged 8 to 12 years. RESULTS The SNR50 scores significantly improved when the distractor and target sentences were spatially separated across all groups. Age had a significant influence on the spatial separation scores. The test-retest reliability was excellent. CONCLUSIONS The developed stimuli effectively measured spatial separation, and the normative and psychometric analyses demonstrated reliable outcomes.
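The adaptive SNR tracking described above is typically realized as a staircase procedure. A minimal sketch in Python (not the authors' MATLAB implementation; the 1-down/1-up rule, step size, and simulated listener are illustrative assumptions):

```python
import math
import random

def staircase_srt(respond, start_snr=10.0, step=2.0, n_reversals=8):
    """1-down/1-up adaptive track that converges on the 50%-correct SNR (SRT50).
    `respond(snr)` returns True when the (simulated) listener repeats the
    sentence correctly at that SNR."""
    snr, direction, reversals = start_snr, 0, []
    while len(reversals) < n_reversals:
        new_direction = -1 if respond(snr) else +1   # harder after a hit, easier after a miss
        if direction and new_direction != direction:
            reversals.append(snr)                    # track changed direction: record a reversal
        direction = new_direction
        snr += new_direction * step
    return sum(reversals) / len(reversals)           # SRT50 estimate: mean SNR at reversals

# Simulated listener with a logistic psychometric function centred at -4 dB SNR
def listener(snr, srt=-4.0, slope=1.0):
    p_correct = 1.0 / (1.0 + math.exp(-slope * (snr - srt)))
    return random.random() < p_correct

random.seed(1)
print(round(staircase_srt(listener), 1))  # estimate close to the true SRT of -4 dB
```

Averaging the SNR at the last reversals is one common convention; clinical implementations differ in step sizes and stopping rules.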
Affiliation(s)
- Asish Mervin Hermon
- Department of Speech and Hearing, Manipal College of Health Professions, Manipal Academy of Higher Education, Manipal, Karnataka, India
- Kanaka Ganapathy
- Department of Speech and Hearing, Manipal College of Health Professions, Manipal Academy of Higher Education, Manipal, Karnataka, India
- Hari Prakash Palaniswamy
- Department of Speech and Hearing, Manipal College of Health Professions, Manipal Academy of Higher Education, Manipal, Karnataka, India
- Arivudai Nambi Pitchai Muthu
- Department of Audiology and Speech Language Pathology, KMC Mangalore, Manipal Academy of Higher Education, Karnataka, India
2
Eurich B, Dietz M. Fast binaural processing but sluggish masker representation reconfiguration. J Acoust Soc Am 2023;154:1862-1870. [PMID: 37747145; DOI: 10.1121/10.0021072]
Abstract
Perceptual organization of complex acoustic scenes requires fast binaural processing for accurate localization or lateralization based on short single-source-dominated glimpses. This sensitivity also manifests in the ability to detect rapidly oscillating interaural time and phase differences as well as interaural correlation. However, binaural processing has also been termed "sluggish" based on experiments that require binaural detection in a masker with an additional binaural cue change in temporal proximity. The present study shows that the temporal integration windows obtained from data on binaural sluggishness cannot account for the detection of rapid binaural oscillations. A model with fast interaural-phase-difference (IPD) encoding but a slower process of updating the internal representation of the masker IPD statistics accounted for experiments in both the "fast" and the "sluggish" categories.
Affiliation(s)
- Bernhard Eurich
- Department für Medizinische Physik und Akustik, Universität Oldenburg, 26129 Oldenburg, Germany
- Mathias Dietz
- Department für Medizinische Physik und Akustik, Universität Oldenburg, 26129 Oldenburg, Germany
3
Prud'homme L, Lavandier M, Best V. A dynamic binaural harmonic-cancellation model to predict speech intelligibility against a harmonic masker varying in intonation, temporal envelope, and location. Hear Res 2022;426:108535. [PMID: 35654633; PMCID: PMC9684346; DOI: 10.1016/j.heares.2022.108535]
Abstract
The aim of this study was to extend the harmonic-cancellation model proposed by Prud'homme et al. [J. Acoust. Soc. Am. 148 (2020) 3246-3254] to predict speech intelligibility against a harmonic masker, so that it takes into account binaural hearing, amplitude modulations in the masker and variations in masker fundamental frequency (F0) over time. This was done by segmenting the masker signal into time frames and combining the previous long-term harmonic-cancellation model with the binaural model proposed by Vicente and Lavandier [Hear. Res. 390 (2020) 107937]. The new model was tested on the data from two experiments involving harmonic complex maskers that varied in spatial location, temporal envelope and F0 contour. The interactions between the associated effects were accounted for in the model by varying the time frame duration and excluding the binaural unmasking computation when harmonic cancellation is active. Across both experiments, the correlation between data and model predictions was over 0.96, and the mean and largest absolute prediction errors were lower than 0.6 and 1.5 dB, respectively.
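The core of harmonic cancellation can be illustrated with a delay-and-subtract comb filter tuned to the masker period; the published model is considerably more elaborate (auditory filterbank, time frames, binaural stages). A minimal time-domain sketch, with an assumed sample rate and masker F0:

```python
import math

fs = 16000            # assumed sample rate (Hz)
f0 = 100.0            # assumed masker fundamental frequency (Hz)
period = round(fs / f0)   # masker period in samples (160)

# Harmonic masker: sum of the first five harmonics of f0
n = 4000
masker = [sum(math.sin(2 * math.pi * f0 * h * t / fs) for h in range(1, 6))
          for t in range(n)]

def cancel(x, T):
    """Delay-and-subtract comb filter y[t] = x[t] - x[t - T]:
    cancels any signal that is periodic with period T samples."""
    return [x[t] - x[t - T] if t >= T else 0.0 for t in range(len(x))]

def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))

residual = cancel(masker, period)
# The harmonic masker is removed almost perfectly; an aperiodic target
# passed through the same filter would largely survive.
print(rms(masker[period:]), rms(residual[period:]))
```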
Affiliation(s)
- Luna Prud'homme
- ENTPE, Ecole Centrale de Lyon, CNRS, LTDS, UMR5513, University Lyon, Vaulx-en-Velin 69518, France
- Mathieu Lavandier
- ENTPE, Ecole Centrale de Lyon, CNRS, LTDS, UMR5513, University Lyon, Vaulx-en-Velin 69518, France
- Virginia Best
- Department of Speech, Language and Hearing Sciences, Boston University, 635 Commonwealth Ave, Boston, MA 02215, USA
4
Modelling speech reception thresholds and their improvements due to spatial noise reduction algorithms in bimodal cochlear implant users. Hear Res 2022;420:108507. [PMID: 35484022; PMCID: PMC9188268; DOI: 10.1016/j.heares.2022.108507]
Abstract
This paper compares two modelling approaches to predict the speech recognition ability of bimodal CI users and the benefit of using beamformers. The modelling approaches vary in computational complexity and fitting requirements. A complex cafeteria scenario with three localized single-noise-source configurations and a diffuse multi-talker babble noise is used. The automatic speech recognizer is more accurate across the different spatial scenarios and noise types and requires less fitting than the statistical modelling approach.
Spatial noise reduction algorithms ("beamformers") can considerably improve speech reception thresholds (SRTs) for bimodal cochlear implant (CI) users. The goal of this study was to model SRTs and the SRT benefit due to beamformers for bimodal CI users. Two existing model approaches, varying in computational complexity and binaural processing assumptions, were compared: (i) the framework of auditory discrimination experiments (FADE) and (ii) the binaural speech intelligibility model (BSIM), both with CI and aided hearing-impaired front-ends. The same acoustic scenarios and open-access beamformers as in the comparative clinical study of Zedan et al. (2021) were used to quantify goodness of prediction. FADE was capable of modeling SRTs ab initio, i.e., no calibration of the model was necessary to achieve high correlations and low root-mean-square errors (RMSE) with both measured SRTs (r = 0.85, RMSE = 2.8 dB) and measured SRT benefits (r = 0.96). BSIM achieved somewhat poorer predictions of both measured SRTs (r = 0.78, RMSE = 6.7 dB) and measured SRT benefits (r = 0.91), and needs to be calibrated to match the average SRT in one condition. The greatest deviations in BSIM's predictions were observed in diffuse multi-talker babble noise and were not found with FADE. The SRT-benefit predictions of both models were similar to the instrumental signal-to-noise ratio (iSNR) improvements due to the beamformers. This indicates that FADE is preferable for modeling absolute SRTs; however, for predicting the SRT benefit due to spatial noise reduction algorithms in bimodal CI users, the average iSNR is a much simpler approach with similar performance.
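The goodness-of-prediction measures quoted throughout these abstracts (Pearson correlation r and RMSE between measured and predicted SRTs) are straightforward to compute. A sketch with hypothetical SRT values, not the study's data:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def rmse(xs, ys):
    """Root-mean-square error between measured and predicted values."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))

# Hypothetical measured vs. predicted SRTs in dB SNR (illustrative only)
measured  = [-6.1, -3.2, -8.5, -1.0, -4.7]
predicted = [-5.0, -2.8, -7.9, -2.1, -4.0]
print(round(pearson_r(measured, predicted), 2),
      round(rmse(measured, predicted), 2))
```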
5
Vicente T, Buchholz JM, Lavandier M. Modelling binaural unmasking and the intelligibility of speech in noise and reverberation for normal-hearing and hearing-impaired listeners. J Acoust Soc Am 2021;150:3275. [PMID: 34852607; DOI: 10.1121/10.0006736]
Abstract
This study investigated the effect of hearing loss on binaural unmasking (BU) for the intelligibility of speech in noise. Speech reception thresholds (SRTs) were measured with normal-hearing (NH) listeners and older mildly hearing-impaired (HI) listeners while varying the presentation level of the stimuli, reverberation, modulation of the noise masker, and spatial separation of the speech and noise sources. On average across conditions, the NH listeners benefited more (by 0.6 dB) from BU than the HI listeners. The binaural intelligibility model developed by Vicente, Lavandier, and Buchholz [J. Acoust. Soc. Am. 148, 3305-3317 (2020)] was used to describe the data; accurate predictions were obtained for the conditions involving moderate noise levels [50 and 60 dB sound pressure level (SPL)]. The interaural jitters involved in the prediction of BU had to be revised to describe the data measured at a lower level (40 dB SPL). Across all tested conditions, the correlation between the measured and predicted SRTs was 0.92, and the mean prediction error was 0.9 dB.
Affiliation(s)
- Thibault Vicente
- Department of Linguistics-Audiology, Australian Hearing Hub, Macquarie University, New South Wales, 2109, Australia
- Jörg M Buchholz
- Department of Linguistics-Audiology, Australian Hearing Hub, Macquarie University, New South Wales, 2109, Australia
- Mathieu Lavandier
- Univ. Lyon, ENTPE, Laboratoire de Tribologie et Dynamique des Systèmes UMR 5513, Rue M. Audin, 69518 Vaulx-en-Velin Cedex, France
6
Baltzell LS, Best V. High-resolution temporal weighting of interaural time differences in speech. J Acoust Soc Am 2021;150:1311. [PMID: 34470281; PMCID: PMC8561715; DOI: 10.1121/10.0005934]
Abstract
Previous studies have shown that for high-rate click trains and low-frequency pure tones, interaural time differences (ITDs) at the onset of the stimulus contribute most strongly to the overall lateralization percept (receive the largest perceptual weight). Previous studies have also shown that when these stimuli are modulated, ITDs during the rising portion of the modulation cycle receive increased perceptual weight. Baltzell, Cho, Swaminathan, and Best [J. Acoust. Soc. Am. 147, 3883-3894 (2020)] measured perceptual weights for a pair of spoken words ("two" and "eight") and found that word-initial phonemes receive larger weight than word-final phonemes, suggesting a "word-onset dominance" for speech. The generalizability of this conclusion was limited by a coarse temporal resolution and a limited stimulus set. In the present study, temporal weighting functions (TWFs) were measured for four spoken words ("two," "eight," "six," and "nine"). Stimuli were partitioned into 30-ms bins, ITDs were applied independently to each bin, and lateralization judgements were obtained. TWFs were derived using a hierarchical regression model. Results suggest that word-initial onset dominance does not generalize across words and that TWFs depend in part on acoustic changes throughout the stimulus. Two model-based predictions were generated to account for the observed TWFs, but neither could fully account for the perceptual data.
Affiliation(s)
- Lucas S Baltzell
- Department of Speech, Language, and Hearing Sciences, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA
- Virginia Best
- Department of Speech, Language, and Hearing Sciences, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA
7
Goverts ST, Colburn HS. Binaural Recordings in Natural Acoustic Environments: Estimates of Speech-Likeness and Interaural Parameters. Trends Hear 2021;24:2331216520972858. [PMID: 33331242; PMCID: PMC7750905; DOI: 10.1177/2331216520972858]
Abstract
Binaural acoustic recordings were made in multiple natural environments, which were chosen to be similar to those reported to be difficult for listeners with impaired hearing. These environments include natural conversations that take place in the presence of other sound sources as found in restaurants, walking or biking in the city, and so on. Sounds from these environments were recorded binaurally with in-the-ear microphones and were analyzed with respect to speech-likeness measures and interaural difference measures. The speech-likeness measures were based on amplitude–modulation patterns within frequency bands and were estimated for 1-s time-slices. The interaural difference measures included interaural coherence, interaural time difference, and interaural level difference, which were estimated for time-slices of 20-ms duration. These binaural measures were documented for one-fourth-octave frequency bands centered at 500 Hz and for the envelopes of one-fourth-octave bands centered at 2000 Hz. For comparison purposes, the same speech-likeness and interaural difference measures were computed for a set of virtual recordings that mimic typical clinical test configurations. These virtual recordings were created by filtering anechoic waveforms with available head-related transfer functions and combining them to create multiple source combinations. Overall, the speech-likeness results show large variability within and between environments, and they demonstrate the importance of having information from both ears available. Furthermore, the interaural parameter results show that the natural recordings contain a relatively small proportion of time-slices with high coherence compared with the virtual recordings; however, when present, binaural cues might be used for selecting intervals with good speech intelligibility for individual sources.
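Interaural coherence and ITD per short time slice are commonly estimated from the peak of the normalized interaural cross-correlation. A simplified sketch on one 20-ms slice, assuming a 16 kHz sample rate and a pure-tone signal (the paper's band-filtered envelope analysis is more involved):

```python
import math

def interaural_xcorr(left, right, max_lag):
    """Peak of the normalized interaural cross-correlation.
    Returns (coherence, best_lag); best_lag is the ITD estimate in samples,
    positive when the right-ear signal lags the left."""
    def unit(x):
        m = sum(x) / len(x)
        d = [v - m for v in x]
        s = math.sqrt(sum(v * v for v in d)) or 1.0
        return [v / s for v in d]
    l, r = unit(left), unit(right)
    best = (-2.0, 0)
    for lag in range(-max_lag, max_lag + 1):
        c = sum(l[t] * r[t + lag] for t in range(max_lag, len(l) - max_lag))
        if c > best[0]:
            best = (c, lag)
    return best

# One 20-ms slice: a 500 Hz tone delayed by 8 samples (0.5 ms) at the right ear
fs, itd_samples = 16000, 8
n = int(0.020 * fs)
left = [math.sin(2 * math.pi * 500 * t / fs) for t in range(n)]
right = [math.sin(2 * math.pi * 500 * (t - itd_samples) / fs) for t in range(n)]
coh, lag = interaural_xcorr(left, right, max_lag=16)
print(coh, lag)  # high coherence, lag of 8 samples
```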
Affiliation(s)
- S Theo Goverts
- Otolaryngology-Head and Neck Surgery, Ear & Hearing, Amsterdam Public Health, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands
- H Steven Colburn
- Biomedical Engineering Department, Boston University, Boston, Massachusetts, United States
8
de Cheveigné A. Harmonic Cancellation-A Fundamental of Auditory Scene Analysis. Trends Hear 2021;25:23312165211041422. [PMID: 34698574; PMCID: PMC8552394; DOI: 10.1177/23312165211041422]
Abstract
This paper reviews the hypothesis of harmonic cancellation according to which an interfering sound is suppressed or canceled on the basis of its harmonicity (or periodicity in the time domain) for the purpose of Auditory Scene Analysis. It defines the concept, discusses theoretical arguments in its favor, and reviews experimental results that support it, or not. If correct, the hypothesis may draw on time-domain processing of temporally accurate neural representations within the brainstem, as required also by the classic equalization-cancellation model of binaural unmasking. The hypothesis predicts that a target sound corrupted by interference will be easier to hear if the interference is harmonic than inharmonic, all else being equal. This prediction is borne out in a number of behavioral studies, but not all. The paper reviews those results, with the aim to understand the inconsistencies and come up with a reliable conclusion for, or against, the hypothesis of harmonic cancellation within the auditory system.
Affiliation(s)
- Alain de Cheveigné
- Laboratoire des systèmes perceptifs, CNRS, Paris, France
- Département d'études cognitives, École normale supérieure, PSL University, Paris, France
- UCL Ear Institute, London, UK
9
Vicente T, Lavandier M, Buchholz JM. A binaural model implementing an internal noise to predict the effect of hearing impairment on speech intelligibility in non-stationary noises. J Acoust Soc Am 2020;148:3305. [PMID: 33261412; DOI: 10.1121/10.0002660]
Abstract
A binaural model predicting speech intelligibility in envelope-modulated noise for normal-hearing (NH) and hearing-impaired listeners is proposed. The study shows the importance of considering an internal noise with two components relying on the individual audiogram and the level of the external stimuli. The model was optimized and verified using speech reception thresholds previously measured in three experiments involving NH and hearing-impaired listeners and sharing common methods. The anechoic target, in front of the listener, was presented simultaneously through headphones with two anechoic noise-vocoded speech maskers (VSs) either co-located with the target or spatially separated using an infinite broadband interaural level difference without crosstalk between ears. In experiment 1, two stationary noise maskers were also tested. In experiment 2, the VSs were presented at different sensation levels to vary audibility. In experiment 3, the effects of realistic interaural time and level differences were also tested. The model was applied to two datasets involving NH listeners to verify its backward compatibility. It was optimized to predict the data, leading to a correlation and mean absolute error between data and predictions above 0.93 and below 1.1 dB, respectively. The different internal noise approaches proposed in the literature to describe hearing impairment are discussed.
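The idea of an internal noise with one component tied to the individual audiogram and one tied to the external stimulus level can be sketched as a power sum of two noise floors. The constants and the combination rule below are illustrative assumptions, not the published model's parameters:

```python
import math

def internal_noise_db(audiogram_threshold_db, stimulus_level_db, k=-27.0):
    """Combine an audiogram-dependent and a level-dependent internal-noise
    component on an intensity scale. `k` (dB) sets how far below the external
    stimulus the level-dependent floor sits; it is an assumed constant."""
    threshold_component = audiogram_threshold_db       # audibility floor
    level_component = stimulus_level_db + k            # level-dependent floor
    return 10 * math.log10(10 ** (threshold_component / 10)
                           + 10 ** (level_component / 10))

# A 20 dB HL threshold with a 60 dB SPL stimulus: the level-dependent
# component dominates the combined internal noise here
print(round(internal_noise_db(20.0, 60.0), 1))
```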
Affiliation(s)
- Thibault Vicente
- Université de Lyon, ENTPE, Laboratoire Génie Civil et Bâtiment, Rue Maurice Audin, 69518 Vaulx-en-Velin Cedex, France
- Mathieu Lavandier
- Université de Lyon, ENTPE, Laboratoire Génie Civil et Bâtiment, Rue Maurice Audin, 69518 Vaulx-en-Velin Cedex, France
- Jörg M Buchholz
- Department of Linguistics-Audiology, Australian Hearing Hub, Macquarie University, 2109 New South Wales, Australia
10
Wagner L, Geiling L, Hauth C, Hocke T, Plontke S, Rahne T. Improved binaural speech reception thresholds through small symmetrical separation of speech and noise. PLoS One 2020;15:e0236469. [PMID: 32756594; PMCID: PMC7406049; DOI: 10.1371/journal.pone.0236469]
Abstract
Speech perception in noise is challenging and is improved by binaural hearing. Since the signal processing of assistive hearing devices often modifies or masks the peripheral binaural head-shadow or better-ear effects, central binaural processing should be measured separately. This study aimed to assess the effect of small sound-source separations on binaural hearing and speech perception. In a prospective study, 10 listeners with normal hearing were tested with the German matrix sentence test in a set-up with two loudspeakers located at opposite angles in the horizontal plane relative to the S0N0 condition. The speech reception threshold (SRT) was measured as a function of the separation angle between speech and noise. The lowest (best) SRT was obtained for a separation of target and interfering source at about S±60°N∓60°. The derived normative curve was comparable to SRTs predicted by the binaural speech intelligibility model. The systematic separation of signal and noise yielded a significant improvement in speech intelligibility for normal-hearing listeners even at small separation angles, verifying the experimental setting.
Affiliation(s)
- Luise Wagner
- Department of Otorhinolaryngology, Head and Neck Surgery, Martin Luther University Halle-Wittenberg, University Medicine Halle, Halle, Germany
- Lukas Geiling
- Department of Otorhinolaryngology, Head and Neck Surgery, Martin Luther University Halle-Wittenberg, University Medicine Halle, Halle, Germany
- Christopher Hauth
- Department of Medical Physics and Cluster of Excellence Hearing4All, Carl von Ossietzky University, Oldenburg, Germany
- Thomas Hocke
- Cochlear Deutschland GmbH & Co. KG, Hannover, Germany
- Stefan Plontke
- Department of Otorhinolaryngology, Head and Neck Surgery, Martin Luther University Halle-Wittenberg, University Medicine Halle, Halle, Germany
- Torsten Rahne
- Department of Otorhinolaryngology, Head and Neck Surgery, Martin Luther University Halle-Wittenberg, University Medicine Halle, Halle, Germany
11
Baltzell LS, Swaminathan J, Cho AY, Lavandier M, Best V. Binaural sensitivity and release from speech-on-speech masking in listeners with and without hearing loss. J Acoust Soc Am 2020;147:1546. [PMID: 32237845; PMCID: PMC7060089; DOI: 10.1121/10.0000812]
Abstract
Listeners with sensorineural hearing loss routinely experience less spatial release from masking (SRM) in speech mixtures than listeners with normal hearing. Hearing-impaired listeners have also been shown to have degraded temporal fine structure (TFS) sensitivity, a consequence of which is degraded access to interaural time differences (ITDs) contained in the TFS. Since these "binaural TFS" cues are critical for spatial hearing, it has been hypothesized that degraded binaural TFS sensitivity accounts for the limited SRM experienced by hearing-impaired listeners. In this study, speech stimuli were noise-vocoded using carriers that were systematically decorrelated across the left and right ears, thus simulating degraded binaural TFS sensitivity. Both (1) ITD sensitivity in quiet and (2) SRM in speech mixtures spatialized using ITDs (or binaural release from masking; BRM) were measured as a function of TFS interaural decorrelation in young normal-hearing and hearing-impaired listeners. This allowed for the examination of the relationship between ITD sensitivity and BRM over a wide range of ITD thresholds. This paper found that, for a given ITD sensitivity, hearing-impaired listeners experienced less BRM than normal-hearing listeners, suggesting that binaural TFS sensitivity can account for only a modest portion of the BRM deficit in hearing-impaired listeners. However, substantial individual variability was observed.
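Carriers with a target interaural correlation, as used here to simulate degraded binaural TFS sensitivity, are conventionally generated by mixing a shared noise with an independent one; the mixing coefficient sets the correlation. A sketch assuming Gaussian noise carriers:

```python
import math
import random

def decorrelated_pair(n, rho, seed=0):
    """Generate left/right noise carriers with target interaural
    correlation `rho` by mixing shared and independent Gaussian noise:
    right = rho * shared + sqrt(1 - rho^2) * independent."""
    rng = random.Random(seed)
    shared = [rng.gauss(0, 1) for _ in range(n)]
    indep = [rng.gauss(0, 1) for _ in range(n)]
    left = shared
    right = [rho * s + math.sqrt(1 - rho * rho) * i
             for s, i in zip(shared, indep)]
    return left, right

def correlation(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)

left, right = decorrelated_pair(20000, rho=0.8)
print(round(correlation(left, right), 2))  # ≈ 0.80
```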
Affiliation(s)
- Lucas S Baltzell
- Department of Speech, Language, and Hearing Sciences, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA
- Jayaganesh Swaminathan
- Department of Speech, Language, and Hearing Sciences, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA
- Adrian Y Cho
- Department of Speech, Language, and Hearing Sciences, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA
- Mathieu Lavandier
- University of Lyon, ENTPE, Laboratoire Génie Civil et Bâtiment, Rue Maurice Audin, F-69518 Vaulx-en-Velin Cedex, France
- Virginia Best
- Department of Speech, Language, and Hearing Sciences, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA
12
Rennies J, Warzybok A, Brand T, Kollmeier B. Measurement and Prediction of Binaural-Temporal Integration of Speech Reflections. Trends Hear 2019;23:2331216519854267. [PMID: 31234732; PMCID: PMC6593929; DOI: 10.1177/2331216519854267]
Abstract
For speech intelligibility in rooms, the temporal integration of speech reflections is typically modeled by separating the room impulse response (RIR) into an early (assumed beneficial for speech intelligibility) and a late part (assumed detrimental). This concept was challenged in this study by employing binaural RIRs with systematically varied interaural phase differences (IPDs) and amplitude of the direct sound and a variable number of reflections delayed by up to 200 ms. Speech recognition thresholds in stationary noise were measured in normal-hearing listeners for 86 conditions. The data showed that direct sound and one or several early speech reflections could be perfectly integrated when they had the same IPD. Early reflections with the same IPD as the noise (but not as the direct sound) could not be perfectly integrated with the direct sound. All conditions in which the dominant speech information was within the early RIR components could be well predicted by a binaural speech intelligibility model using classic early/late separation. In contrast, when amplitude or IPD favored late RIR components, listeners appeared to be capable of focusing on these components rather than on the precedent direct sound. This could not be modeled by an early/late separation window but required a temporal integration window that can be flexibly shifted along the RIR.
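The classic early/late separation of an RIR can be sketched as a fixed-boundary energy split; the 50-ms boundary, the sample rate, and the toy exponentially decaying RIR below are illustrative assumptions:

```python
import math

def early_late_split(rir, fs, boundary_ms=50.0):
    """Split a room impulse response at a fixed boundary (classically ~50 ms
    for speech) and return the early-to-late energy ratio in dB."""
    k = int(boundary_ms * 1e-3 * fs)
    early = sum(v * v for v in rir[:k])
    late = sum(v * v for v in rir[k:]) or 1e-12   # guard against log(0)
    return 10 * math.log10(early / late)

# Toy exponentially decaying RIR: 1 s at an assumed 8 kHz sample rate
fs = 8000
rir = [0.9995 ** t for t in range(fs)]
print(round(early_late_split(rir, fs), 1))  # early-to-late ratio in dB
```

A model with a flexible integration window, as the study argues for, would instead slide or shift this boundary along the RIR rather than fix it at the direct sound.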
Affiliation(s)
- Jan Rennies
- Department of Speech, Language and Hearing Sciences, Boston University, Boston, MA, USA
- Project Group Hearing, Speech and Audio Technology, Fraunhofer Institute for Digital Media Technology IDMT, Cluster of Excellence Hearing4all, Oldenburg, Germany
- Anna Warzybok
- Medical Physics Group, Department of Medical Physics and Acoustics, Cluster of Excellence Hearing4all, University of Oldenburg, Germany
- Thomas Brand
- Medical Physics Group, Department of Medical Physics and Acoustics, Cluster of Excellence Hearing4all, University of Oldenburg, Germany
- Birger Kollmeier
- Project Group Hearing, Speech and Audio Technology, Fraunhofer Institute for Digital Media Technology IDMT, Cluster of Excellence Hearing4all, Oldenburg, Germany
- Medical Physics Group, Department of Medical Physics and Acoustics, Cluster of Excellence Hearing4all, University of Oldenburg, Germany
13
Spectrotemporal window of binaural integration in auditory object formation. Hear Res 2018;370:155-167. [PMID: 30388573; DOI: 10.1016/j.heares.2018.10.013]
Abstract
Binaural integration of interaural temporal information is essential for sound source localization and segregation. Current models of binaural interaction have shown that accurate sound localization in the horizontal plane depends on the resolution of phase-ambiguous information by across-frequency integration. However, as such models are mostly static, it is not clear how proximate in time binaural events in different frequency channels should occur to form an auditory object with a unique lateral position. The present study examined the spectrotemporal window required for effective integration of binaural cues across frequency to form the perception of a stationary position. In Experiment 1, listeners judged whether dichotic frequency-modulated (FM) sweeps with a constant large nominal interaural delay (1500 μs), whose perceived laterality was ambiguous depending on the sweep rate (1500, 3000, 6000, and 12,000 Hz/s), produced a percept of continuous motion or a stationary image. Motion detection performance, indexed by d-prime (d') values, showed a clear effect of sweep rate, with auditory motion effects most pronounced for low sweep rates and a punctate stationary image at high rates. Experiment 2 examined the effect of modulation rate (0.5, 3, 20, and 50 Hz) on lateralizing sinusoidally frequency-modulated (SFM) tones to confirm the effect of sweep rate on motion detection, independent of signal duration. Lateralization accuracy increased with increasing modulation rate up to 20 Hz and saturated at 50 Hz, with poorest performance occurring below 3 Hz depending on modulator phase. Using the transition point where percepts changed from motion to stationary images, we estimated a spectrotemporal integration window of approximately 150 ms per octave required for effective integration of interaural temporal cues across frequency channels. A Monte Carlo simulation based on a cross-correlation model of binaural interaction predicted 90% of the variance in perceptual motion detection performance as a function of FM sweep rate. Findings suggest that the rate of frequency-channel convergence of binaural cues is essential to binaural lateralization.
14
Rennies J, Kidd G. Benefit of binaural listening as revealed by speech intelligibility and listening effort. J Acoust Soc Am 2018;144:2147. [PMID: 30404476; PMCID: PMC6185866; DOI: 10.1121/1.5057114]
Abstract
In contrast to the well-known benefits for speech intelligibility, the advantage afforded by binaural stimulus presentation for reducing listening effort has not been thoroughly examined. This study investigated spatial release from listening effort and its relation to binaural speech intelligibility in listeners with normal hearing. Psychometric functions for speech intelligibility of a frontal target talker masked by a stationary speech-shaped noise were estimated for several different noise azimuths, different degrees of reverberation, and conditions maintaining only interaural level or time differences. For each of these conditions, listening effort was measured using a categorical scaling procedure. The results revealed that listening effort was significantly reduced when target and masker were spatially separated in anechoic conditions. This effect extended well into the range of signal-to-noise ratios (SNRs) in which speech intelligibility was at ceiling, and disappeared only at the highest SNRs. In reverberant conditions, spatial release from listening effort was observed for high, but not low, direct-to-reverberant ratios. The findings suggest that listening effort assessment can be a useful method for revealing the benefits of spatial separation of sources under realistic listening conditions comprising favorable SNRs and low reverberation, which typically are not apparent by other means.
Affiliation(s)
- Jan Rennies
- Department of Speech, Language and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
- Gerald Kidd
- Department of Speech, Language and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA