1
|
Mesiano PA, Zaar J, Bramslw L, Relaño-Iborra H, Dau T. The Role of Average Fundamental Frequency Difference on the Intelligibility of Real-Life Competing Sentences. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2023:1-14. [PMID: 37390502 DOI: 10.1044/2023_jslhr-22-00219] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 07/02/2023]
Abstract
PURPOSE The average fundamental frequency separation (∆fo) between two competing voices has been shown to provide an important cue for target-speech intelligibility. However, some of the previous investigations used speech materials with linguistic properties and fo characteristics that may not be typical of realistic acoustic scenarios. This study investigated to what extent the effect of ∆fo generalizes to more real-life speech. METHODS Real-life sentences and a well-controlled method for manipulating the acoustic stimuli were employed. Fifteen young normal-hearing native Danish listeners were tested in a two-competing-voices sentence recognition task at several target-to-masker ratios (TMRs) and ∆fos. RESULTS Compared to previous studies that addressed the same experimental scenario with less realistic speech materials, the present results showed only a moderate effect of ∆fo at negative TMRs and a negligible effect at positive TMRs. An analysis of the employed stimuli showed that a large ∆fo effect on the target speech intelligibility is only observed when the competing sentences have highly synchronous fo trajectories, which is typical of the artificial speech materials employed in previous studies. CONCLUSION Overall, the present results suggest a relatively small effect of ∆fo on the intelligibility of real-life speech, as compared to previously employed artificial speech, in two-competing-sentences conditions.
Collapse
Affiliation(s)
- Paolo A Mesiano
- Department of Health Technology, Technical University of Denmark, Kongens Lyngby
| | - Johannes Zaar
- Department of Health Technology, Technical University of Denmark, Kongens Lyngby
- Eriksholm Research Centre, Helsingør, Denmark
| | | | - Helia Relaño-Iborra
- Department of Health Technology, Technical University of Denmark, Kongens Lyngby
- Department of Applied Mathematics and Computer Science, Technical University of Denmark, Kongens Lyngby
| | - Torsten Dau
- Department of Health Technology, Technical University of Denmark, Kongens Lyngby
| |
Collapse
|
2
|
Steinmetzger K, Rosen S. No evidence for a benefit from masker harmonicity in the perception of speech in noise. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2023; 153:1064. [PMID: 36859153 DOI: 10.1121/10.0017065] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/19/2022] [Accepted: 01/10/2023] [Indexed: 06/18/2023]
Abstract
When assessing the intelligibility of speech embedded in background noise, maskers with a harmonic spectral structure have been found to be much less detrimental to performance than noise-based interferers. While spectral "glimpsing" in between the resolved masker harmonics and reduced envelope modulations of harmonic maskers have been shown to contribute, this effect has primarily been attributed to the proposed ability of the auditory system to cancel harmonic maskers from the signal mixture. Here, speech intelligibility in the presence of harmonic and inharmonic maskers with similar spectral glimpsing opportunities and envelope modulation spectra was assessed to test the theory of harmonic cancellation. Speech reception thresholds obtained from normal-hearing listeners revealed no effect of masker harmonicity, neither for maskers with static nor dynamic pitch contours. The results show that harmonicity, or time-domain periodicity, as such, does not aid the segregation of speech and masker. Contrary to what might be assumed, this also implies that the saliency of the masker pitch did not affect auditory grouping. Instead, the current data suggest that the reduced masking effectiveness of harmonic sounds is due to the regular spacing of their spectral components.
Collapse
Affiliation(s)
- Kurt Steinmetzger
- Section of Biomagnetism, Department of Neurology, Heidelberg University Hospital, Im Neuenheimer Feld 400, 69120 Heidelberg, Germany
| | - Stuart Rosen
- Speech, Hearing and Phonetic Sciences, University College London (UCL), Chandler House, 2 Wakefield Street, London, WC1N 1PF, United Kingdom
| |
Collapse
|
3
|
Prud'homme L, Lavandier M, Best V. Investigating the role of harmonic cancellation in speech-on-speech masking. Hear Res 2022; 426:108562. [PMID: 35768309 PMCID: PMC9722527 DOI: 10.1016/j.heares.2022.108562] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 12/10/2021] [Revised: 04/26/2022] [Accepted: 06/15/2022] [Indexed: 11/30/2022]
Abstract
This study investigated the role of harmonic cancellation in the intelligibility of speech in "cocktail party" situations. While there is evidence that harmonic cancellation plays a role in the segregation of simple harmonic sounds based on fundamental frequency (F0), its utility for mixtures of speech containing non-stationary F0s and unvoiced segments is unclear. Here we focused on the energetic masking of speech targets caused by competing speech maskers. Speech reception thresholds were measured using seven maskers: speech-shaped noise, monotonized and intonated harmonic complexes, monotonized speech, noise-vocoded speech, reversed speech and natural speech. These maskers enabled an estimate of how the masking potential of speech is influenced by harmonic structure, amplitude modulation and variations in F0 over time. Measured speech reception thresholds were compared to the predictions of two computational models, with and without a harmonic cancellation component. Overall, the results suggest a minor role of harmonic cancellation in reducing energetic masking in speech mixtures.
Collapse
Affiliation(s)
- Luna Prud'homme
- Univ Lyon, ENTPE, Ecole Centrale de Lyon, CNRS, LTDS, UMR5513, 69518 Vaulx-en-Velin, France
| | - Mathieu Lavandier
- Univ Lyon, ENTPE, Ecole Centrale de Lyon, CNRS, LTDS, UMR5513, 69518 Vaulx-en-Velin, France.
| | - Virginia Best
- Department of Speech, Language and Hearing Sciences, Boston University, 635 Commonwealth Ave, Boston, MA, 02215, USA
| |
Collapse
|
4
|
Prud'homme L, Lavandier M, Best V. A dynamic binaural harmonic-cancellation model to predict speech intelligibility against a harmonic masker varying in intonation, temporal envelope, and location. Hear Res 2022; 426:108535. [PMID: 35654633 PMCID: PMC9684346 DOI: 10.1016/j.heares.2022.108535] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 12/10/2021] [Revised: 04/26/2022] [Accepted: 05/23/2022] [Indexed: 11/28/2022]
Abstract
The aim of this study was to extend the harmonic-cancellation model proposed by Prud'homme et al. [J. Acoust. Soc. Am. 148 (2020) 3246--3254] to predict speech intelligibility against a harmonic masker, so that it takes into account binaural hearing, amplitude modulations in the masker and variations in masker fundamental frequency (F0) over time. This was done by segmenting the masker signal into time frames and combining the previous long-term harmonic-cancellation model with the binaural model proposed by Vicente and Lavandier [Hear. Res. 390 (2020) 107937]. The new model was tested on the data from two experiments involving harmonic complex maskers that varied in spatial location, temporal envelope and F0 contour. The interactions between the associated effects were accounted for in the model by varying the time frame duration and excluding the binaural unmasking computation when harmonic cancellation is active. Across both experiments, the correlation between data and model predictions was over 0.96, and the mean and largest absolute prediction errors were lower than 0.6 and 1.5 dB, respectively.
Collapse
Affiliation(s)
- Luna Prud'homme
- ENTPE, Ecole Centrale de Lyon, CNRS, LTDS, UMR5513, University Lyon, Vaulx-en-Velin 69518, France
| | - Mathieu Lavandier
- ENTPE, Ecole Centrale de Lyon, CNRS, LTDS, UMR5513, University Lyon, Vaulx-en-Velin 69518, France.
| | - Virginia Best
- Department of Speech, Language and Hearing Sciences, Boston University, 635 Commonwealth Ave, Boston, MA 02215, USA
| |
Collapse
|
5
|
Lavandier M, Mason CR, Baltzell LS, Best V. Individual differences in speech intelligibility at a cocktail party: A modeling perspective. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2021; 150:1076. [PMID: 34470293 PMCID: PMC8561716 DOI: 10.1121/10.0005851] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/29/2021] [Revised: 07/07/2021] [Accepted: 07/21/2021] [Indexed: 06/13/2023]
Abstract
This study aimed at predicting individual differences in speech reception thresholds (SRTs) in the presence of symmetrically placed competing talkers for young listeners with sensorineural hearing loss. An existing binaural model incorporating the individual audiogram was revised to handle severe hearing losses by (a) taking as input the target speech level at SRT in a given condition and (b) introducing a floor in the model to limit extreme negative better-ear signal-to-noise ratios. The floor value was first set using SRTs measured with stationary and modulated noises. The model was then used to account for individual variations in SRTs found in two previously published data sets that used speech maskers. The model accounted well for the variation in SRTs across listeners with hearing loss, based solely on differences in audibility. When considering listeners with normal hearing, the model could predict the best SRTs, but not the poorer SRTs, suggesting that other factors limit performance when audibility (as measured with the audiogram) is not compromised.
Collapse
Affiliation(s)
- Mathieu Lavandier
- Univ. Lyon, ENTPE, Laboratoire de Tribologie et Dynamique des Systèmes UMR 5513, Rue Maurice Audin, F-69518 Vaulx-en-Velin Cedex, France
| | - Christine R Mason
- Department of Speech, Language and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
| | - Lucas S Baltzell
- Department of Speech, Language and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
| | - Virginia Best
- Department of Speech, Language and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
| |
Collapse
|
6
|
de Cheveigné A. Harmonic Cancellation-A Fundamental of Auditory Scene Analysis. Trends Hear 2021; 25:23312165211041422. [PMID: 34698574 PMCID: PMC8552394 DOI: 10.1177/23312165211041422] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/24/2021] [Revised: 07/23/2021] [Accepted: 07/09/2021] [Indexed: 11/16/2022] Open
Abstract
This paper reviews the hypothesis of harmonic cancellation according to which an interfering sound is suppressed or canceled on the basis of its harmonicity (or periodicity in the time domain) for the purpose of Auditory Scene Analysis. It defines the concept, discusses theoretical arguments in its favor, and reviews experimental results that support it, or not. If correct, the hypothesis may draw on time-domain processing of temporally accurate neural representations within the brainstem, as required also by the classic equalization-cancellation model of binaural unmasking. The hypothesis predicts that a target sound corrupted by interference will be easier to hear if the interference is harmonic than inharmonic, all else being equal. This prediction is borne out in a number of behavioral studies, but not all. The paper reviews those results, with the aim to understand the inconsistencies and come up with a reliable conclusion for, or against, the hypothesis of harmonic cancellation within the auditory system.
Collapse
Affiliation(s)
- Alain de Cheveigné
- Laboratoire des systèmes perceptifs, CNRS, Paris, France
- Département d’études cognitives, École normale supérieure, PSL
University, Paris, France
- UCL Ear Institute, London, UK
| |
Collapse
|
7
|
Wasiuk PA, Lavandier M, Buss E, Oleson J, Calandruccio L. The effect of fundamental frequency contour similarity on multi-talker listening in older and younger adults. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2020; 148:3527. [PMID: 33379934 PMCID: PMC7863686 DOI: 10.1121/10.0002661] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/04/2023]
Abstract
Older adults with hearing loss have greater difficulty recognizing target speech in multi-talker environments than young adults with normal hearing, especially when target and masker speech streams are perceptually similar. A difference in fundamental frequency (f0) contour depth is an effective stream segregation cue for young adults with normal hearing. This study examined whether older adults with varying degrees of sensorineural hearing loss are able to utilize differences in target/masker f0 contour depth to improve speech recognition in multi-talker listening. Speech recognition thresholds (SRTs) were measured for speech mixtures composed of target/masker streams with flat, normal, and exaggerated speaking styles, in which f0 contour depth systematically varied. Computational modeling estimated differences in energetic masking across listening conditions. Young adults had lower SRTs than older adults; a result that was partially explained by differences in audibility predicted by the model. However, audibility differences did not explain why young adults experienced a benefit from mismatched target/masker f0 contour depth, while in most conditions, older adults did not. Reduced ability to use segregation cues (differences in target/masker f0 contour depth), and deficits grouping speech with variable f0 contours likely contribute to difficulties experienced by older adults in challenging acoustic environments.
Collapse
Affiliation(s)
- Peter A Wasiuk
- Department of Psychological Sciences, 11635 Euclid Avenue, Case Western Reserve University, Cleveland, Ohio 44106, USA
| | - Mathieu Lavandier
- Univ. Lyon, ENTPE, Laboratoire Génie Civil et Bâtiment, Rue M. Audin, Vaulx-en-Velin Cedex, 69518, France
| | - Emily Buss
- Department of Otolaryngology/Head and Neck Surgery, University of North Carolina, CB#7070, Chapel Hill, North Carolina 27599, USA
| | - Jacob Oleson
- Department of Biostatistics, N300 CPHB, University of Iowa, 145 North Riverside Drive, Iowa City, Iowa 52242-2007, USA
| | - Lauren Calandruccio
- Department of Psychological Sciences, 11635 Euclid Avenue, Case Western Reserve University, Cleveland, Ohio 44106, USA
| |
Collapse
|