1. Monson BB, Ananthanarayana RM, Trine A, Delaram V, Stecker GC, Buss E. Differential benefits of unmasking extended high-frequency content of target or background speech. J Acoust Soc Am 2023; 154:454-462. PMID: 37489913; PMCID: PMC10371353; DOI: 10.1121/10.0020175.
Abstract
Current evidence supports the contribution of extended high frequencies (EHFs; >8 kHz) to speech recognition, especially for speech-in-speech scenarios. However, it is unclear whether the benefit of EHFs is due to phonetic information in the EHF band, EHF cues to access phonetic information at lower frequencies, talker segregation cues, or some other mechanism. This study investigated the mechanisms of benefit derived from a mismatch in EHF content between target and masker talkers for speech-in-speech recognition. EHF mismatches were generated using full band (FB) speech and speech low-pass filtered at 8 kHz. Four filtering combinations with independently filtered target and masker speech were used to create two EHF-matched and two EHF-mismatched conditions for one- and two-talker maskers. Performance was best with the FB target and the low-pass masker in both one- and two-talker masker conditions, but the effect was larger for the two-talker masker. No benefit of an EHF mismatch was observed for the low-pass filtered target. A word-by-word analysis indicated higher recognition odds with increasing EHF energy level in the target word. These findings suggest that the audibility of target EHFs provides target phonetic information or target segregation and selective attention cues, but that the audibility of masker EHFs does not confer any segregation benefit.
Affiliation(s)
- Brian B Monson, Department of Speech and Hearing Science, University of Illinois Urbana-Champaign, Champaign, Illinois 61820, USA
- Rohit M Ananthanarayana, Department of Speech and Hearing Science, University of Illinois Urbana-Champaign, Champaign, Illinois 61820, USA
- Allison Trine, Department of Speech and Hearing Science, University of Illinois Urbana-Champaign, Champaign, Illinois 61820, USA
- Vahid Delaram, Department of Speech and Hearing Science, University of Illinois Urbana-Champaign, Champaign, Illinois 61820, USA
- G Christopher Stecker, Spatial Hearing Laboratory, Boys Town National Research Hospital, Omaha, Nebraska 68131, USA
- Emily Buss, Department of Otolaryngology/HNS, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA
2. Wasiuk PA, Buss E, Oleson JJ, Calandruccio L. Predicting speech-in-speech recognition: Short-term audibility, talker sex, and listener factors. J Acoust Soc Am 2022; 152:3010. PMID: 36456289; DOI: 10.1121/10.0015228.
Abstract
Speech-in-speech recognition can be challenging, and listeners vary considerably in their ability to accomplish this complex auditory-cognitive task. Variability in performance can be related to intrinsic listener factors as well as stimulus factors associated with energetic and informational masking. The current experiments characterized the effects of short-term audibility of the target, differences in target and masker talker sex, and intrinsic listener variables on sentence recognition in two-talker speech and speech-shaped noise. Participants were young adults with normal hearing. Each condition included the adaptive measurement of speech reception thresholds, followed by testing at a fixed signal-to-noise ratio (SNR). Short-term audibility for each keyword was quantified using a computational glimpsing model for target+masker mixtures. Scores on a psychophysical task of auditory stream segregation predicted speech recognition, with stronger effects for speech-in-speech than speech-in-noise. Both speech-in-speech and speech-in-noise recognition depended on the proportion of audible glimpses available in the target+masker mixture, even across stimuli presented at the same global SNR. Short-term audibility requirements varied systematically across stimuli, providing an estimate of the greater informational masking for speech-in-speech than speech-in-noise recognition and quantifying informational masking for matched and mismatched talker sex.
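The short-term audibility metric described above can be illustrated with a minimal sketch. This is not the authors' implementation: a plain FFT spectrogram stands in for a computational glimpsing model's auditory front end, and the 3 dB local-SNR criterion is a hypothetical choice. The idea is simply to count the proportion of time-frequency cells in the target+masker mixture where the target dominates.

```python
import numpy as np

def stft_power(x, frame=256, hop=128):
    # Power spectrogram from Hann-windowed frames (stand-in for an
    # auditory filterbank representation)
    win = np.hanning(frame)
    n = 1 + (len(x) - frame) // hop
    frames = np.stack([x[i * hop:i * hop + frame] * win for i in range(n)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

def glimpse_proportion(target, masker, criterion_db=3.0):
    """Fraction of time-frequency cells where the target exceeds the
    masker by at least `criterion_db` (local SNR 'glimpses')."""
    t_pow = stft_power(target)
    m_pow = stft_power(masker)
    snr_db = 10 * np.log10((t_pow + 1e-12) / (m_pow + 1e-12))
    return float(np.mean(snr_db >= criterion_db))
```

Raising the target level raises every cell's local SNR, so the glimpse proportion grows monotonically with global SNR, which is why stimuli at the same global SNR can still differ in the proportion of audible glimpses.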
Affiliation(s)
- Peter A Wasiuk, Department of Psychological Sciences, 11635 Euclid Avenue, Case Western Reserve University, Cleveland, Ohio 44106, USA
- Emily Buss, Department of Otolaryngology/Head and Neck Surgery, 170 Manning Drive, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA
- Jacob J Oleson, Department of Biostatistics, 145 North Riverside Drive, University of Iowa, Iowa City, Iowa 52242, USA
- Lauren Calandruccio, Department of Psychological Sciences, 11635 Euclid Avenue, Case Western Reserve University, Cleveland, Ohio 44106, USA
3
|
Lutfi RA, Pastore T, Rodriguez B, Yost WA, Lee J. Molecular analysis of individual differences in talker search at the cocktail-party. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2022; 152:1804. [PMID: 36182280 PMCID: PMC9507302 DOI: 10.1121/10.0014116] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/23/2022] [Revised: 08/22/2022] [Accepted: 08/29/2022] [Indexed: 06/16/2023]
Abstract
A molecular (trial-by-trial) analysis of data from a cocktail-party, target-talker search task was used to test two general classes of explanations accounting for individual differences in listener performance: cue weighting models, for which errors are tied to the speech features talkers have in common with the target, and internal noise models, for which errors are largely independent of these features. The speech of eight different talkers was played simultaneously over eight different loudspeakers surrounding the listener. The locations of the eight talkers varied at random from trial to trial. The listener's task was to identify the location of a target talker with which they had previously been familiarized. An analysis of the response counts to individual talkers showed predominant confusion with one talker sharing the same fundamental frequency and timbre as the target and, secondarily, other talkers sharing the same timbre. The confusions occurred for a roughly constant 31% of all trials for all listeners. The remaining errors were uniformly distributed across the remaining talkers and were responsible for the large individual differences in performance observed. The results are consistent with a model in which largely stimulus-independent factors (internal noise) are responsible for the wide variation in performance across listeners.
Affiliation(s)
- Robert A Lutfi, Auditory Behavioral Research Laboratory, Department of Communication Sciences and Disorders, University of South Florida, Tampa, Florida 33620, USA
- Torben Pastore, Spatial Hearing Laboratory, Department of Speech and Hearing, Arizona State University, Tempe, Arizona 85281, USA
- Briana Rodriguez, Auditory Behavioral Research Laboratory, Department of Communication Sciences and Disorders, University of South Florida, Tampa, Florida 33620, USA
- William A Yost, Spatial Hearing Laboratory, Department of Speech and Hearing, Arizona State University, Tempe, Arizona 85281, USA
- Jungmee Lee, Auditory Behavioral Research Laboratory, Department of Communication Sciences and Disorders, University of South Florida, Tampa, Florida 33620, USA
4. Mepham A, Bi Y, Mattys SL. The time-course of linguistic interference during native and non-native speech-in-speech listening. J Acoust Soc Am 2022; 152:954. PMID: 36050191; DOI: 10.1121/10.0013417.
Abstract
Recognizing speech in a noisy background is harder when the background is time-forward than for time-reversed speech, a masker direction effect, and harder when the masker is in a known rather than an unknown language, indicating linguistic interference. We examined the masker direction effect when the masker was a known vs unknown language and calculated performance over 50 trials to assess differential masker adaptation. In experiment 1, native English listeners transcribing English sentences showed a larger masker direction effect with English than Mandarin maskers. In experiment 2, Mandarin non-native speakers of English transcribing Mandarin sentences showed a mirror pattern. Both experiments thus support the target-masker linguistic similarity hypothesis, where interference is maximal when target and masker languages are the same. In experiment 3, Mandarin non-native speakers of English transcribing English sentences showed comparable results for English and Mandarin maskers. Non-native listening is therefore consistent with the known-language interference hypothesis, where interference is maximal when the masker language is known to the listener, whether or not it matches the target language. A trial-by-trial analysis showed that the masker direction effect increased over time during native listening but not during non-native listening. The results indicate different target-to-masker streaming strategies during native and non-native speech-in-speech listening.
Affiliation(s)
- Alex Mepham, Department of Psychology, University of York, Heslington, United Kingdom
- Yifei Bi, College of Foreign Languages, University of Shanghai for Science and Technology, Shanghai, China
- Sven L Mattys, Department of Psychology, University of York, Heslington, United Kingdom
5. Lutfi RA, Rodriguez B, Lee J. The Listener Effect in Multitalker Speech Segregation and Talker Identification. Trends Hear 2021; 25:23312165211051886. PMID: 34693853; PMCID: PMC8544763; DOI: 10.1177/23312165211051886.
Abstract
Over six decades ago, Cherry (1953) drew attention to what he called the "cocktail-party problem": the challenge of segregating the speech of one talker from others speaking at the same time. The problem has been actively researched ever since, but for all this time one observation has eluded explanation: the wide variation in performance across individual listeners. That variation was replicated here for four major experimental factors known to impact performance: differences in task (talker segregation vs. identification), differences in the voice features of talkers (pitch vs. location), differences in the voice similarity and uncertainty of talkers (informational masking), and the presence or absence of linguistic cues. The effect of these factors on the segregation of naturally spoken sentences and synthesized vowels was largely eliminated in psychometric functions relating the performance of individual listeners to that of an ideal observer, d′ideal. The effect of listeners remained as differences in the slopes of the functions (a fixed effect), with little within-listener variability in the estimates of slope (random effect). The results make a case for considering the listener a factor in multitalker segregation and identification equal in status to any major experimental variable.
Affiliation(s)
- Robert A. Lutfi, Auditory Behavioral Research Lab, Department of Communication Sciences and Disorders, University of South Florida, Tampa, Florida 33620
- Briana Rodriguez, Auditory Behavioral Research Lab, Department of Communication Sciences and Disorders, University of South Florida, Tampa, Florida
- Jungmee Lee, Auditory Behavioral Research Lab, Department of Communication Sciences and Disorders, University of South Florida, Tampa, Florida
6. Buss E, Bosen A. Band importance for speech-in-speech recognition. JASA Express Lett 2021; 1:084402. PMID: 34661194; PMCID: PMC8499852; DOI: 10.1121/10.0005762.
Abstract
Predicting masked speech perception typically relies on estimates of the spectral distribution of cues supporting recognition. Current methods for estimating band importance for speech-in-noise use filtered stimuli. These methods are not appropriate for speech-in-speech because filtering can modify stimulus features affecting auditory stream segregation. Here, band importance is estimated by quantifying the relationship between speech recognition accuracy for full-spectrum speech and the target-to-masker ratio by channel at the output of an auditory filterbank. Preliminary results provide support for this approach and indicate that frequencies below 2 kHz may contribute more to speech recognition in two-talker speech than in speech-shaped noise.
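The core quantity in the approach above, a target-to-masker ratio by channel computed from full-spectrum speech rather than from filtered stimuli, can be sketched roughly as follows. This is a simplification, not the authors' method: a plain FFT power spectrum with a few hypothetical band edges stands in for the output of an auditory filterbank.

```python
import numpy as np

def band_tmr_db(target, masker, fs, edges_hz):
    """Target-to-masker ratio (dB) per frequency band, computed from
    the full-spectrum signals (no filtering of the stimuli, so stream
    segregation cues in the waveforms are left intact)."""
    freqs = np.fft.rfftfreq(len(target), d=1 / fs)
    t_pow = np.abs(np.fft.rfft(target)) ** 2
    m_pow = np.abs(np.fft.rfft(masker)) ** 2
    tmr = []
    for lo, hi in zip(edges_hz[:-1], edges_hz[1:]):
        band = (freqs >= lo) & (freqs < hi)
        # Small epsilon guards against empty bands / log of zero
        tmr.append(10 * np.log10((t_pow[band].sum() + 1e-12)
                                 / (m_pow[band].sum() + 1e-12)))
    return np.array(tmr)
```

Band importance would then be estimated by relating trial-by-trial recognition accuracy to these per-band TMRs across many target+masker mixtures; bands whose TMR best predicts accuracy carry the most weight.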
Affiliation(s)
- Emily Buss, Department of Otolaryngology/Head and Neck Surgery, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA
- Adam Bosen, Center for Hearing Research, Boys Town National Research Hospital, Omaha, Nebraska 68131, USA
7. Liu JS, Liu YW, Yu YF, Galvin JJ, Fu QJ, Tao DD. Segregation of competing speech in adults and children with normal hearing and in children with cochlear implants. J Acoust Soc Am 2021; 150:339. PMID: 34340485; DOI: 10.1121/10.0005597.
Abstract
Children with normal hearing (CNH) have greater difficulty segregating competing speech than do adults with normal hearing (ANH), and children with cochlear implants (CCI) have greater difficulty than do CNH. In the present study, speech reception thresholds (SRTs) in competing speech were measured in Mandarin-speaking ANH, CNH, and CCI listeners. Target sentences were produced by a male Mandarin-speaking talker. Maskers were time-forward or time-reversed sentences produced by a native Mandarin-speaking male (different from the target), a native Mandarin-speaking female, or a non-native English-speaking male. The SRTs were lowest (best) for the ANH group, followed by the CNH and CCI groups. The masking release (MR) was comparable between the ANH and CNH groups, but much poorer in the CCI group. The temporal properties differed between the native and non-native maskers and between forward and reversed speech. The temporal properties of the maskers were significantly associated with the SRTs for the CCI and CNH groups but not for the ANH group. Whereas the temporal properties of the maskers were significantly associated with the MR for all three groups, the association was stronger for the CCI and CNH groups than for the ANH group.
Affiliation(s)
- Ji-Sheng Liu, Department of Ear, Nose, and Throat, The First Affiliated Hospital of Soochow University, Suzhou 215006, China
- Yang-Wenyi Liu, Department of Otology and Skull Base Surgery, Eye Ear Nose and Throat Hospital, Fudan University, Shanghai 200031, China
- Ya-Feng Yu, Department of Ear, Nose, and Throat, The First Affiliated Hospital of Soochow University, Suzhou 215006, China
- John J Galvin, House Ear Institute, Los Angeles, California 90057, USA
- Qian-Jie Fu, Department of Head and Neck Surgery, David Geffen School of Medicine, University of California Los Angeles (UCLA), Los Angeles, California 90095, USA
- Duo-Duo Tao, Department of Ear, Nose, and Throat, The First Affiliated Hospital of Soochow University, Suzhou 215006, China
8. Wang X, Xu L. Speech perception in noise: Masking and unmasking. J Otol 2021; 16:109-119. PMID: 33777124; PMCID: PMC7985001; DOI: 10.1016/j.joto.2020.12.001.
Abstract
Speech perception is essential for daily communication. However, background noise or concurrent talkers can make it challenging for listeners to track the target speech (the cocktail party problem). The present study reviews and compares existing findings on speech perception and unmasking in cocktail-party listening environments in English and Mandarin Chinese. The review begins with an introduction followed by related concepts of auditory masking. The next two sections review factors that release speech perception from masking in English and Mandarin Chinese, respectively. The final section presents an overall summary of the findings, with comparisons between the two languages, and discusses future research directions suggested by differences between the two bodies of literature.
Affiliation(s)
- Xianhui Wang, Communication Sciences and Disorders, Ohio University, Athens, OH 45701, USA
- Li Xu, Communication Sciences and Disorders, Ohio University, Athens, OH 45701, USA
9. Kaplan EC, Wagner AE, Toffanin P, Başkent D. Do Musicians and Non-musicians Differ in Speech-on-Speech Processing? Front Psychol 2021; 12:623787. PMID: 33679539; PMCID: PMC7931613; DOI: 10.3389/fpsyg.2021.623787.
Abstract
Earlier studies have shown that musically trained individuals may have a benefit in adverse listening situations when compared to non-musicians, especially in speech-on-speech perception. However, the literature provides mostly conflicting results. In the current study, by employing different measures of spoken language processing, we aimed to test whether we could capture potential differences between musicians and non-musicians in speech-on-speech processing. We used an offline measure of speech perception (a sentence recall task), which reveals a post-task response, and online measures of real-time spoken language processing: gaze-tracking and pupillometry. We used stimuli of comparable complexity across both paradigms and tested the same groups of participants. In the sentence recall task, musicians recalled more words correctly than non-musicians. In the eye-tracking experiment, both groups showed reduced fixations to the target and competitor words' images as the level of the speech maskers increased. The time course of gaze fixations to the competitor did not differ between groups in the speech-in-quiet condition, but the time-course dynamics did differ between groups once the two-talker masker was added to the target signal. As the level of the two-talker masker increased, musicians showed reduced lexical competition, as indicated by gaze fixations to the competitor. The pupil dilation data showed differences mainly at one target-to-masker ratio, which does not allow us to draw conclusions regarding potential differences in the use of cognitive resources between groups. Overall, the eye-tracking measure enabled us to observe that musicians may use a different strategy than non-musicians to attain spoken word recognition as the noise level increases. However, further investigation with more fine-grained alignment between the processes captured by online and offline measures is necessary to establish whether musicians differ due to better cognitive control or better sound processing.
Affiliation(s)
- Elif Canseza Kaplan, Department of Otorhinolaryngology/Head and Neck Surgery, University Medical Center Groningen, University of Groningen, Groningen, Netherlands; Research School of Behavioral and Cognitive Neurosciences, Graduate School of Medical Sciences, University of Groningen, Groningen, Netherlands
- Anita E Wagner, Department of Otorhinolaryngology/Head and Neck Surgery, University Medical Center Groningen, University of Groningen, Groningen, Netherlands
- Paolo Toffanin, Department of Otorhinolaryngology/Head and Neck Surgery, University Medical Center Groningen, University of Groningen, Groningen, Netherlands
- Deniz Başkent, Department of Otorhinolaryngology/Head and Neck Surgery, University Medical Center Groningen, University of Groningen, Groningen, Netherlands; Research School of Behavioral and Cognitive Neurosciences, Graduate School of Medical Sciences, University of Groningen, Groningen, Netherlands
10. Thomas M, Galvin JJ, Fu QJ. Interactions among talker sex, masker number, and masker intelligibility in speech-on-speech recognition. JASA Express Lett 2021; 1:015203. PMID: 33589889; PMCID: PMC7850016; DOI: 10.1121/10.0003051.
Abstract
In competing speech, recognition of target speech may be limited by the number and characteristics of maskers, which produce energetic, envelope, and/or informational masking. In this study, speech recognition thresholds (SRTs) were measured with one, two, or four maskers. The target and masker sex was the same or different, and SRTs were measured with time-forward or time-reversed maskers. SRTs were significantly affected by target-masker sex differences with time-forward maskers, but not with time-reversed maskers. The multi-masker penalty was much greater with time-reversed maskers than with time-forward maskers when there were more than two talkers.
Affiliation(s)
- Mathew Thomas, Department of Head and Neck Surgery, David Geffen School of Medicine, 10833 Le Conte Avenue, University of California, Los Angeles, Los Angeles, California 90095, USA
- John J Galvin, House Ear Institute, 2100 West 3rd Street, Suite 111, Los Angeles, California 90057, USA
- Qian-Jie Fu, Department of Head and Neck Surgery, David Geffen School of Medicine, 10833 Le Conte Avenue, University of California, Los Angeles, Los Angeles, California 90095, USA
11. Lutfi RA, Rodriguez B, Lee J, Pastore T. A test of model classes accounting for individual differences in the cocktail-party effect. J Acoust Soc Am 2020; 148:4014. PMID: 33379927; PMCID: PMC7775115; DOI: 10.1121/10.0002961.
Abstract
Listeners differ widely in the ability to follow the speech of a single talker in a noisy crowd, the so-called cocktail-party effect. Differences may arise from any one or a combination of factors associated with auditory sensitivity, selective attention, working memory, and the decision making required for effective listening. The present study attempts to narrow the possibilities by grouping explanations into model classes based on model predictions for the types of errors that distinguish better from poorer performing listeners in a vowel segregation and talker identification task. Two model classes are considered: those for which the errors are predictably tied to the voice variation of talkers (decision weight models) and those for which the errors occur largely independently of this variation (internal noise models). Regression analyses of trial-by-trial responses, for different tasks and task demands, show overwhelmingly that the latter type of error is responsible for the performance differences among listeners. The results are inconsistent with models that attribute the performance differences to differences in the reliance listeners place on relevant voice features in this decision. The results are consistent instead with models for which largely stimulus-independent, stochastic processes cause information loss at different stages of auditory processing.
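The logic separating the two model classes can be illustrated with a toy trial-by-trial simulation (hypothetical numbers throughout, and a plain correlation standing in for the paper's regression analyses): a decision-weight listener's errors track a stimulus feature such as the trial-by-trial target-masker voice difference, whereas an internal-noise listener's errors are statistically independent of it.

```python
import numpy as np

def cue_dependence(cue, errors):
    """Correlation between cue strength and error occurrence.
    Decision-weight models predict a sizable (negative) correlation;
    internal-noise models predict a correlation near zero."""
    return np.corrcoef(cue, errors)[0, 1]

rng = np.random.default_rng(1)
n_trials = 5000
# Hypothetical normalized voice-feature difference (e.g., f0) between
# target and masker talkers on each trial
delta_f0 = rng.standard_normal(n_trials)

# Decision-weight listener: errors occur when the voice cue is weak
errors_weight = (np.abs(delta_f0) < 0.5).astype(float)
# Internal-noise listener: same overall error rate, but errors occur
# at random, independently of the stimulus
errors_noise = (rng.random(n_trials) < errors_weight.mean()).astype(float)

r_weight = cue_dependence(np.abs(delta_f0), errors_weight)
r_noise = cue_dependence(np.abs(delta_f0), errors_noise)
```

Under this sketch, `r_weight` comes out strongly negative (errors concentrate on weak-cue trials) while `r_noise` hovers near zero, which is the signature the trial-by-trial regression looks for.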
Affiliation(s)
- Robert A Lutfi, Auditory Behavioral Research Lab, Department of Communication Sciences and Disorders, University of South Florida, Tampa, Florida 33620, USA
- Briana Rodriguez, Auditory Behavioral Research Lab, Department of Communication Sciences and Disorders, University of South Florida, Tampa, Florida 33620, USA
- Jungmee Lee, Auditory Behavioral Research Lab, Department of Communication Sciences and Disorders, University of South Florida, Tampa, Florida 33620, USA
- Torben Pastore, Spatial Hearing Lab, College of Health Solutions, Arizona State University, Tempe, Arizona 85281, USA
12. Chen B, Shi Y, Zhang L, Sun Z, Li Y, Gopen Q, Fu QJ. Masking Effects in the Perception of Multiple Simultaneous Talkers in Normal-Hearing and Cochlear Implant Listeners. Trends Hear 2020; 24:2331216520916106. PMID: 32324486; PMCID: PMC7180303; DOI: 10.1177/2331216520916106.
Abstract
For normal-hearing (NH) listeners, monaural factors, such as voice pitch cues, may play an important role in the segregation of speech signals in multitalker environments. However, cochlear implant (CI) users experience difficulties in segregating speech signals in multitalker environments, in part due to their coarse spectral resolution. The present study examined how the vocal characteristics of the target and masking talkers influence listeners' ability to extract information from a target phrase in a multitalker environment. Speech recognition thresholds (SRTs) were measured with one, two, or four masker talkers for different combinations of target-masker vocal characteristics in 10 adult Mandarin-speaking NH listeners and 12 adult Mandarin-speaking CI users. The results showed that CI users performed significantly more poorly than NH listeners in the presence of competing talkers. As the number of masker talkers increased, the mean SRTs significantly worsened from -22.0 dB to -5.2 dB for NH listeners but significantly improved from 5.9 dB to 2.8 dB for CI users. The results suggest that the flattened peaks and valleys with increased numbers of competing talkers may reduce NH listeners' ability to use dips in the spectral and temporal envelopes that allow for "glimpses" of the target speech. However, the flattened temporal envelope of the resultant masker signals may be less disruptive to the amplitude contour of the target speech, which is important for Mandarin-speaking CI users' lexical tone recognition. The amount of masking release was further estimated by comparing SRTs between same-sex and different-sex maskers. There was a large amount of masking release in NH adults (12 dB) and a small but significant amount in CI adults (2 dB). These results suggest that adult CI users may significantly benefit from voice pitch differences between target and masker speech.
Affiliation(s)
- Biao Chen, Department of Otolaryngology, Head and Neck Surgery, Beijing Tongren Hospital, Capital Medical University, Ministry of Education of China
- Ying Shi, Department of Otolaryngology, Head and Neck Surgery, Beijing Tongren Hospital, Capital Medical University, Ministry of Education of China
- Lifang Zhang, Department of Otolaryngology, Head and Neck Surgery, Beijing Tongren Hospital, Capital Medical University, Ministry of Education of China
- Zhiming Sun, Department of Otolaryngology, Head and Neck Surgery, Beijing Tongren Hospital, Capital Medical University, Ministry of Education of China
- Yongxin Li, Department of Otolaryngology, Head and Neck Surgery, Beijing Tongren Hospital, Capital Medical University, Ministry of Education of China
- Quinton Gopen, Department of Head and Neck Surgery, David Geffen School of Medicine, University of California
- Qian-Jie Fu, Department of Head and Neck Surgery, David Geffen School of Medicine, University of California
13. King G, Corbin NE, Leibold LJ, Buss E. Spatial Release from Masking Using Clinical Corpora: Sentence Recognition in a Colocated or Spatially Separated Speech Masker. J Am Acad Audiol 2019; 31:271-276. PMID: 31589139; DOI: 10.3766/jaaa.19018.
Abstract
BACKGROUND Speech recognition in complex multisource environments is challenging, particularly for listeners with hearing loss. One source of difficulty is the reduced ability of listeners with hearing loss to benefit from spatial separation of the target and masker, an effect called spatial release from masking (SRM). Despite the prevalence of complex multisource environments in everyday life, SRM is not routinely evaluated in the audiology clinic. PURPOSE The purpose of this study was to demonstrate the feasibility of assessing SRM in adults using widely available tests of speech-in-speech recognition that can be conducted using standard clinical equipment. RESEARCH DESIGN Participants were 22 young adults with normal hearing. The task was masked sentence recognition, using each of five clinically available corpora with speech maskers. The target always sounded like it originated from directly in front of the listener, and the masker either sounded like it originated from the front (colocated with the target) or from the side (separated from the target). In the real spatial manipulation conditions, source location was manipulated by routing the target and masker to either a single speaker or to two speakers: one directly in front of the participant, and one mounted in an adjacent corner, 90° to the right. In the perceived spatial separation conditions, the target and masker were presented from both speakers with delays that made them sound as if they were either colocated or separated. RESULTS With real spatial manipulations, the mean SRM ranged from 7.1 to 11.4 dB, depending on the speech corpus. With perceived spatial manipulations, the mean SRM ranged from 1.8 to 3.1 dB. Whereas real separation improves the signal-to-noise ratio in the ear contralateral to the masker, SRM in the perceived spatial separation conditions is based solely on interaural timing cues. 
CONCLUSIONS The finding of robust SRM with widely available speech corpora supports the feasibility of measuring this important aspect of hearing in the audiology clinic. The finding of a small but significant SRM in the perceived spatial separation conditions suggests that modified materials could be used to evaluate the use of interaural timing cues specifically.
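The SRM reported above is simply the difference between two speech reception thresholds (SRTs). A minimal sketch of that computation, with illustrative threshold values that are not taken from the study:

```python
# Spatial release from masking (SRM): the improvement in speech reception
# threshold (SRT) when target and masker are spatially separated, relative
# to the colocated condition. Thresholds are in dB SNR; lower is better.

def spatial_release_from_masking(srt_colocated_db: float,
                                 srt_separated_db: float) -> float:
    """SRM in dB: colocated SRT minus separated SRT (positive = benefit)."""
    return srt_colocated_db - srt_separated_db

# Hypothetical listener: -2 dB SNR threshold colocated, -10 dB SNR
# separated, giving 8 dB of release -- within the 7.1-11.4 dB range the
# study reports for real spatial manipulations.
srm = spatial_release_from_masking(-2.0, -10.0)
```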
Collapse
Affiliation(s)
- Grant King
- Department of Otolaryngology/Head and Neck Surgery, University of North Carolina at Chapel Hill, School of Medicine, Chapel Hill, NC
| | - Nicole E Corbin
- Division of Speech and Hearing Sciences, Department of Allied Health Sciences, University of North Carolina at Chapel Hill, School of Medicine, Chapel Hill, NC
| | - Lori J Leibold
- Center for Hearing Research, Boys Town National Research Hospital, Omaha, NE
| | - Emily Buss
- Department of Otolaryngology/Head and Neck Surgery, University of North Carolina at Chapel Hill, School of Medicine, Chapel Hill, NC
| |
Collapse
|
14
|
Calandruccio L, Wasiuk PA, Buss E, Leibold LJ, Kong J, Holmes A, Oleson J. The effect of target/masker fundamental frequency contour similarity on masked-speech recognition. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2019; 146:1065. [PMID: 31472562 PMCID: PMC6690832 DOI: 10.1121/1.5121314] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/18/2019] [Revised: 07/19/2019] [Accepted: 07/23/2019] [Indexed: 05/20/2023]
Abstract
Greater informational masking is observed when the target and masker speech are more perceptually similar. Fundamental frequency (f0) contour, or the dynamic movement of f0, is thought to provide cues for segregating target speech presented in a speech masker. Most of the data demonstrating this effect have been collected using digitally modified stimuli. Less work has been done exploring the role of f0 contour for speech-in-speech recognition when all of the stimuli have been produced naturally. The goal of this project was to explore the importance of target and masker f0 contour similarity by manipulating the speaking style of talkers producing the target and masker speech streams. Sentence recognition thresholds were evaluated for target and masker speech that was produced with either flat, normal, or exaggerated speaking styles; performance was also measured in speech spectrum shaped noise and for conditions in which the stimuli were processed through an ideal-binary mask. Results confirmed that similarities in f0 contour depth elevated speech-in-speech recognition thresholds; however, when the target and masker had similar contour depths, targets with normal f0 contours were more resistant to masking than targets with flat or exaggerated contours. Differences in energetic masking across stimuli cannot account for these results.
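One common way to quantify the f0 contour depth discussed above is the voiced f0 range expressed in semitones, where a flat speaking style yields a depth near zero and an exaggerated style a large depth. A small illustrative sketch (the f0 tracks below are hypothetical, not the study's stimuli):

```python
import math

def f0_depth_semitones(f0_track_hz):
    """Depth of an f0 contour: min-to-max f0 range, in semitones.

    Unvoiced frames (coded as f0 == 0) are excluded before taking the
    range. 12 semitones correspond to one octave (a doubling of f0).
    """
    voiced = [f for f in f0_track_hz if f > 0]
    return 12.0 * math.log2(max(voiced) / min(voiced))

flat = [118, 120, 0, 119, 121]        # nearly monotone track
exaggerated = [90, 140, 0, 220, 110]  # wide pitch excursions

# The flat track spans well under 1 semitone; the exaggerated track
# spans more than an octave.
```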
Collapse
Affiliation(s)
- Lauren Calandruccio
- Department of Psychological Sciences, Case Western Reserve University, Cleveland, Ohio 44106, USA
| | - Peter A Wasiuk
- Department of Psychological Sciences, Case Western Reserve University, Cleveland, Ohio 44106, USA
| | - Emily Buss
- Department of Otolaryngology/Head and Neck Surgery, University of North Carolina, Chapel Hill, North Carolina 27599, USA
| | - Lori J Leibold
- Boys Town National Research Hospital, Omaha, Nebraska 68131, USA
| | - Jessica Kong
- Department of Psychological Sciences, Case Western Reserve University, Cleveland, Ohio 44106, USA
| | - Ann Holmes
- Department of Psychological Sciences, Case Western Reserve University, Cleveland, Ohio 44106, USA
| | - Jacob Oleson
- Department of Biostatistics, University of Iowa, Iowa City, Iowa 52246, USA
| |
Collapse
|
15
|
Best V, Swaminathan J, Kopčo N, Roverud E, Shinn-Cunningham B. A "Buildup" of Speech Intelligibility in Listeners With Normal Hearing and Hearing Loss. Trends Hear 2019; 22:2331216518807519. [PMID: 30353783 PMCID: PMC6201174 DOI: 10.1177/2331216518807519] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022] Open
[Note: the DOI (10.1177/2331216518807519) and PMID indicate a 2018 publication in Trends in Hearing, volume 22, despite the year given in the citation line.]
Abstract
The perception of simple auditory mixtures is known to evolve over time. For instance, a common example of this is the “buildup” of stream segregation that is observed for sequences of tones alternating in pitch. Yet very little is known about how the perception of more complicated auditory scenes, such as multitalker mixtures, changes over time. Previous data are consistent with the idea that the ability to segregate a target talker from competing sounds improves rapidly when stable cues are available, which leads to improvements in speech intelligibility. This study examined the time course of this buildup in listeners with normal and impaired hearing. Five simultaneous sequences of digits, varying in length from three to six digits, were presented from five locations in the horizontal plane. A synchronized visual cue at one location indicated which sequence was the target on each trial. We observed a buildup in digit identification performance, driven primarily by reductions in confusions between the target and the maskers, that occurred over the course of three to four digits. Performance tended to be poorer in listeners with hearing loss; however, there was only weak evidence that the buildup was diminished or slowed in this group.
Collapse
Affiliation(s)
- Virginia Best
- Department of Speech, Language and Hearing Sciences, Boston University, MA, USA
| | | | - Norbert Kopčo
- Faculty of Science, Institute of Computer Science, P. J. Safarik University, Kosice, Slovakia
| | - Elin Roverud
- Department of Speech, Language and Hearing Sciences, Boston University, MA, USA
| | | |
Collapse
|
16
|
Kidd G, Mason CR, Best V, Roverud E, Swaminathan J, Jennings T, Clayton K, Steven Colburn H. Determining the energetic and informational components of speech-on-speech masking in listeners with sensorineural hearing loss. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2019; 145:440. [PMID: 30710924 PMCID: PMC6347574 DOI: 10.1121/1.5087555] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/20/2018] [Revised: 11/19/2018] [Accepted: 12/18/2018] [Indexed: 05/20/2023]
Abstract
The ability to identify the words spoken by one talker masked by two or four competing talkers was tested in young-adult listeners with sensorineural hearing loss (SNHL). In a reference/baseline condition, masking speech was colocated with target speech, target and masker talkers were female, and the masker was intelligible. Three comparison conditions included replacing female masker talkers with males, time-reversal of masker speech, and spatial separation of sources. All three variables produced significant release from masking. To emulate energetic masking (EM), stimuli were subjected to ideal time-frequency segregation retaining only the time-frequency units where target energy exceeded masker energy. Subjects were then tested with these resynthesized "glimpsed stimuli." For either two or four maskers, thresholds only varied about 3 dB across conditions suggesting that EM was roughly equal. Compared to normal-hearing listeners from an earlier study [Kidd, Mason, Swaminathan, Roverud, Clayton, and Best, J. Acoust. Soc. Am. 140, 132-144 (2016)], SNHL listeners demonstrated both greater energetic and informational masking as well as higher glimpsed thresholds. Individual differences were correlated across masking release conditions suggesting that listeners could be categorized according to their general ability to solve the task. Overall, both peripheral and central factors appear to contribute to the higher thresholds for SNHL listeners.
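The ideal time-frequency segregation used to create the "glimpsed stimuli" above can be sketched as an ideal binary mask over short-time spectra: a time-frequency unit is retained only where target energy exceeds masker energy. A minimal numpy sketch, assuming illustrative frame length, hop size, and stand-in signals (the study's exact analysis parameters are not reproduced here):

```python
import numpy as np

def stft_mag(x, frame=256, hop=128):
    """Magnitude spectrogram via framed FFT (no windowing, for brevity)."""
    n_frames = 1 + (len(x) - frame) // hop
    frames = np.stack([x[i * hop : i * hop + frame] for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def ideal_binary_mask(target, masker, frame=256, hop=128):
    """1 where target magnitude exceeds masker magnitude in a T-F unit."""
    t_mag = stft_mag(target, frame, hop)
    m_mag = stft_mag(masker, frame, hop)
    return (t_mag > m_mag).astype(float)

rng = np.random.default_rng(0)
fs = 16000
t = np.arange(fs) / fs
target = np.sin(2 * np.pi * 500 * t)    # stand-in for the target talker
masker = 0.5 * rng.standard_normal(fs)  # stand-in for masker speech

mask = ideal_binary_mask(target, masker)
# Applying `mask` to the mixture's spectrogram keeps only the "glimpses"
# where the target dominates, emulating energetic masking as described.
```

Resynthesis back to a waveform (via inverse FFT and overlap-add) is omitted for brevity; the mask itself is the ideal time-frequency segregation step.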
Collapse
Affiliation(s)
- Gerald Kidd
- Department of Speech, Language and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
| | - Christine R Mason
- Department of Speech, Language and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
| | - Virginia Best
- Department of Speech, Language and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
| | - Elin Roverud
- Department of Speech, Language and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
| | - Jayaganesh Swaminathan
- Department of Speech, Language and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
| | - Todd Jennings
- Department of Speech, Language and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
| | - Kameron Clayton
- Department of Speech, Language and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
| | - H Steven Colburn
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts 02215, USA
| |
Collapse
|
17
|
Calandruccio L, Buss E, Bencheck P, Jett B. Does the semantic content or syntactic regularity of masker speech affect speech-on-speech recognition? THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2018; 144:3289. [PMID: 30599661 PMCID: PMC6786886 DOI: 10.1121/1.5081679] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/25/2018] [Revised: 11/07/2018] [Accepted: 11/09/2018] [Indexed: 05/30/2023]
Abstract
Speech-on-speech recognition differs substantially across stimuli, but it is unclear what role linguistic features of the masker play in this variability. The linguistic similarity hypothesis suggests similarity between sentence-level semantic content of the target and masker speech increases masking. Sentence recognition in a two-talker masker was evaluated with respect to semantic content and syntactic structure of the masker (experiment 1) and linguistic similarity of the target and masker (experiment 2). Target and masker sentences were semantically meaningful or anomalous. Masker syntax was varied or the same across sentences. When other linguistic features of the masker were controlled, variability in syntactic structure across masker tokens was only relevant when the masker was played continuously (as opposed to gated); when played continuously, sentence-recognition thresholds were poorer with variable than consistent masker syntax, but this effect was small (0.5 dB). When the syntactic structure of the masker was held constant, semantic meaningfulness of the masker did not increase masking, and at times performance was better for the meaningful than the anomalous masker. These data indicate that sentence-level semantic content of the masker speech does not influence speech-on-speech masking. Further, no evidence was found that similarity between target and masker sentence-level semantic content increases masking.
Collapse
Affiliation(s)
- Lauren Calandruccio
- Department of Psychological Sciences, Case Western Reserve University, Cleveland, Ohio 44106, USA
| | - Emily Buss
- Department of Head/Neck Surgery and Otolaryngology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA
| | - Penelope Bencheck
- Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, Ohio 44106, USA
| | - Brandi Jett
- Department of Psychological Sciences, Case Western Reserve University, Cleveland, Ohio 44106, USA
| |
Collapse
|