1. Oh Y, Friggle P, Kinder J, Tilbrook G, Bridges SE. Effects of presentation level on speech-on-speech masking by voice-gender difference and spatial separation between talkers. Front Neurosci 2023;17:1282764. PMID: 38192513; PMCID: PMC10773857; DOI: 10.3389/fnins.2023.1282764.
Abstract
Many previous studies have reported that speech segregation performance in multi-talker environments can be enhanced by two major acoustic cues: (1) voice-characteristic differences between talkers and (2) spatial separation between talkers. The improvement each cue provides for speech segregation is referred to as "release from masking." The goal of this study was to investigate how masking release from these two cues is affected by the target presentation level. Sixteen normal-hearing listeners participated in a speech-recognition-in-noise experiment. Speech-on-speech masking performance was measured as the threshold target-to-masker ratio (TMR) needed to understand a target talker in the presence of either same- or different-gender masker talkers, manipulating the voice-gender difference cue. These target-masker gender combinations were tested in five spatial configurations (maskers co-located with, or symmetrically separated by 15°, 30°, 45°, or 60° from, the target) to manipulate the spatial separation cue. All conditions were repeated at three target presentation levels (30, 40, and 50 dB sensation level). Results revealed that the amount of masking release from either the voice-gender difference or the spatial separation cue was significantly affected by the target level, especially at the smallest target-masker spatial separation (±15°). Further, the intersection points between the two masking release types (the points of equal perceptual weighting) varied with target level. These findings suggest that the perceptual weighting of masking release from the two cues is non-linearly related to target level, and that the target presentation level may be one major factor underlying masking release performance in normal-hearing listeners.
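Masking release here is just the drop in threshold TMR relative to the baseline (same-gender, co-located) condition, so the computation reduces to simple differences. A minimal sketch with invented threshold values, not data from the study:

```python
# Hypothetical threshold TMRs (dB) for one listener; values are
# illustrative only, not results from the study.
tmr = {
    ("same_gender", 0): 2.5,     # baseline: maskers co-located
    ("same_gender", 15): -1.0,   # spatial cue only (+/-15 deg separation)
    ("diff_gender", 0): -4.0,    # voice-gender cue only
}

baseline = tmr[("same_gender", 0)]
gender_release = baseline - tmr[("diff_gender", 0)]    # release from voice gender
spatial_release = baseline - tmr[("same_gender", 15)]  # release from separation

print(f"Voice-gender masking release: {gender_release:.1f} dB")
print(f"Spatial masking release (+/-15 deg): {spatial_release:.1f} dB")
```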
Affiliation(s)
- Yonghee Oh
- Department of Otolaryngology-Head and Neck Surgery and Communicative Disorders, University of Louisville, Louisville, KY, United States
- Department of Speech, Language, and Hearing Sciences, University of Florida, Gainesville, FL, United States
- Phillip Friggle
- Department of Otolaryngology-Head and Neck Surgery and Communicative Disorders, University of Louisville, Louisville, KY, United States
- Josephine Kinder
- Department of Otolaryngology-Head and Neck Surgery and Communicative Disorders, University of Louisville, Louisville, KY, United States
- Grace Tilbrook
- Department of Speech, Language, and Hearing Sciences, University of Florida, Gainesville, FL, United States
- Sarah E. Bridges
- Department of Speech, Language, and Hearing Sciences, University of Florida, Gainesville, FL, United States
2. Roverud E, Villard S, Kidd G. Strength of target source segregation cues affects the outcome of speech-on-speech masking experiments. J Acoust Soc Am 2023;153:2780. PMID: 37140176; PMCID: PMC10319449; DOI: 10.1121/10.0019307.
Abstract
In speech-on-speech listening experiments, some means of designating which talker is the "target" must be provided for the listener to perform better than chance. However, the relative strength of the segregation variables designating the target can affect the results of the experiment. Here, we examine the interaction of two source segregation variables, spatial separation and talker gender differences, and demonstrate that the relative strengths of these cues may affect the interpretation of the results. Participants listened to sentence pairs spoken by different-gender target and masker talkers, presented naturally or vocoded (degrading gender cues), either co-located or spatially separated. Target and masker words were temporally interleaved to eliminate energetic masking, in either an every-other-word or randomized order of presentation. Results showed that the order of interleaving had no effect on recall performance. For natural speech with strong talker gender cues, spatial separation of the sources yielded no improvement in performance. For vocoded speech with degraded talker gender cues, performance improved significantly with spatial separation of the sources. These findings reveal that listeners may shift among target source segregation cues contingent on cue viability. Finally, performance was poor when the target was designated after stimulus presentation, indicating strong reliance on these cues.
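The interleaving manipulation is easy to picture in code. A toy sketch of the two presentation orders, with invented word lists rather than the study's sentence materials:

```python
import random

def interleave_words(target_words, masker_words, order="every_other", seed=0):
    """Arrange target and masker words so no two words overlap in time,
    eliminating energetic masking. `order` selects the every-other-word
    or randomized presentation order described above."""
    tagged_t = [(w, "target") for w in target_words]
    tagged_m = [(w, "masker") for w in masker_words]
    sequence = [item for pair in zip(tagged_t, tagged_m) for item in pair]
    if order == "randomized":
        random.Random(seed).shuffle(sequence)
    return sequence

print(interleave_words(["ready", "go", "to", "red", "five"],
                       ["check", "look", "at", "blue", "two"]))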
Affiliation(s)
- Elin Roverud
- Department of Speech, Language and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
- Sarah Villard
- Department of Speech, Language and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
- Gerald Kidd
- Department of Speech, Language and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
3. Oh Y, Srinivasan NK, Hartling CL, Gallun FJ, Reiss LAJ. Differential effects of binaural pitch fusion range on the benefits of voice gender differences in a "cocktail party" environment for bimodal and bilateral cochlear implant users. Ear Hear 2023;44:318-329. PMID: 36395512; PMCID: PMC9957805; DOI: 10.1097/aud.0000000000001283.
Abstract
OBJECTIVES: Some cochlear implant (CI) users are fitted with a CI in each ear ("bilateral"), while others have a CI in one ear and a hearing aid (HA) in the other ("bimodal"). Presently, evaluation of the benefits of bilateral or bimodal CI fitting does not take into account the integration of frequency information across the ears. This study tests the hypothesis that CI listeners, especially bimodal CI users, with more precise integration of frequency information across the ears ("sharp binaural pitch fusion") will derive greater benefit from voice gender differences in a multi-talker listening environment. DESIGN: Twelve bimodal CI users and twelve bilateral CI users participated. First, binaural pitch fusion ranges were measured using simultaneous, dichotic presentation of reference and comparison stimuli (electric pulse trains for CI ears and acoustic tones for HA ears) in opposite ears, with the reference stimulus fixed and the comparison stimulus varied in frequency or electrode to find the range perceived as a single sound. Direct electrical stimulation was delivered to implanted ears through a research interface, which allowed selective stimulation of one electrode at a time, and acoustic stimulation was delivered to non-implanted ears through headphones. Second, speech-on-speech masking performance was measured to estimate the masking release provided by a voice gender difference between target and maskers (VGRM), calculated as the difference in speech recognition thresholds for targets in the presence of same-gender versus different-gender maskers. RESULTS: Voice gender differences between target and masker talkers improved speech recognition performance for the bimodal CI group, but not for the bilateral CI group. The bimodal CI users who benefited most from voice gender differences were those with the narrowest range of acoustic frequencies that fused into a single sound with stimulation from a single electrode in the opposite ear. There was no comparable benefit of narrow binaural fusion range for the bilateral CI users. CONCLUSIONS: The findings suggest that broad binaural fusion reduces the acoustic information available for differentiating individual talkers in bimodal CI users, but not in bilateral CI users. In addition, for bimodal CI users with narrow binaural fusion who benefit from voice gender differences, bilateral implantation could lead to a loss of that benefit and impair their ability to selectively attend to one talker in the presence of multiple competing talkers. The results suggest that binaural pitch fusion, along with residual hearing and other factors, could be important to assess in bimodal and bilateral CI users.
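A toy summary of how a fusion range could be read off a set of "one sound" responses. The frequencies and responses below are invented, and the study's actual procedure adaptively varied the comparison stimulus rather than sampling a fixed grid:

```python
import math

# Hypothetical responses: comparison-tone frequency (Hz, HA ear) -> True if
# the listener fused it with the fixed CI-ear reference into a single sound.
responses = {500: False, 707: True, 1000: True, 1414: True,
             2000: True, 2828: False}

fused = sorted(f for f, one_sound in responses.items() if one_sound)
breadth_octaves = math.log2(fused[-1] / fused[0])  # breadth of the fusion range
print(f"Fusion range: {fused[0]}-{fused[-1]} Hz ({breadth_octaves:.1f} octaves)")
```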
Affiliation(s)
- Yonghee Oh
- Department of Otolaryngology - Head and Neck Surgery and Communicative Disorders, University of Louisville, Louisville, Kentucky 40202, USA
- Nirmal Kumar Srinivasan
- Department of Speech-Language Pathology & Audiology, Towson University, Towson, Maryland 21252, USA
- Curtis L. Hartling
- Department of Otolaryngology, Oregon Health and Science University, Portland, Oregon 97239, USA
- Frederick J. Gallun
- National Center for Rehabilitative Auditory Research, VA Portland Health Care System, Portland, Oregon 97239, USA
- Lina A. J. Reiss
- Department of Otolaryngology, Oregon Health and Science University, Portland, Oregon 97239, USA
4. Oh Y, Kalpin N, Hunter J, Schwalm M. The impact of temporally coherent visual and vibrotactile cues on speech recognition in noise. JASA Express Lett 2023;3:025203. PMID: 36858994; DOI: 10.1121/10.0017326.
Abstract
Inputs delivered to different sensory organs provide complementary speech information about the environment. The goal of this study was to establish which multisensory characteristics can facilitate speech recognition in noise. The major finding is that tracking the temporal cues of visual or tactile speech synchronized with auditory speech can play a key role in speech-in-noise performance. This suggests that multisensory interactions are fundamentally important for speech recognition in noisy environments, and that they require salient temporal cues. The amplitude envelope, serving as a reliable temporal cue source, can be conveyed through different sensory modalities when speech recognition is compromised.
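The abstract does not spell out how the amplitude envelope is extracted; a common approach, sketched here as an assumption rather than the authors' exact method, is Hilbert-transform magnitude followed by low-pass smoothing:

```python
import numpy as np
from scipy.signal import hilbert, butter, filtfilt

def amplitude_envelope(speech, fs, cutoff_hz=30.0):
    """Extract a smoothed amplitude envelope from a speech waveform. The
    envelope can then drive a visual or vibrotactile stimulus temporally
    synchronized with the talker."""
    env = np.abs(hilbert(speech))            # instantaneous amplitude
    b, a = butter(4, cutoff_hz / (fs / 2))   # low-pass keeps slow fluctuations
    return filtfilt(b, a, env)

# Example: a 1-s noise carrier modulated at a speech-like 4 Hz rate.
fs = 16000
t = np.arange(fs) / fs
speech = (1 + np.sin(2 * np.pi * 4 * t)) * np.random.randn(fs)
env = amplitude_envelope(speech, fs)         # slow 4-Hz envelope recovered
```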
Affiliation(s)
- Yonghee Oh
- Department of Otolaryngology-Head and Neck Surgery and Communicative Disorders, University of Louisville, Louisville, Kentucky 40202, USA
- Nicole Kalpin
- Department of Speech, Language, and Hearing Sciences, University of Florida, Gainesville, Florida 32610, USA
- Jessica Hunter
- Department of Speech, Language, and Hearing Sciences, University of Florida, Gainesville, Florida 32610, USA
- Meg Schwalm
- Department of Speech, Language, and Hearing Sciences, University of Florida, Gainesville, Florida 32610, USA
5. Lelo de Larrea-Mancera ES, Solís-Vivanco R, Sánchez-Jimenez Y, Coco L, Gallun FJ, Seitz AR. Development and validation of a Spanish-language spatial release from masking task in a Mexican population. J Acoust Soc Am 2023;153:316. PMID: 36732214; PMCID: PMC10162838; DOI: 10.1121/10.0016850.
Abstract
This study validates a new Spanish-language version of the Coordinate Response Measure (CRM) corpus using a well-established measure of spatial release from masking (SRM). Participants were 96 Spanish-speaking young adults without hearing complaints in Mexico City. To present the Spanish-language SRM test, we created new recordings of the CRM with Spanish-language translations and updated the freely available app (PART; https://ucrbraingamecenter.github.io/PART_Utilities/) to present materials in Spanish. In addition to SRM, we collected baseline data on a battery of non-speech auditory assessments, including detection of frequency modulations, temporal gaps, and modulated broadband noise in the temporal, spectral, and spectrotemporal domains. The data demonstrate that the newly developed speech and non-speech tasks show reliability similar to an earlier report on English-speaking populations. This study demonstrates an approach by which auditory assessment for clinical and basic research can be extended to Spanish-speaking populations, for whom testing platforms are not currently available.
Affiliation(s)
- Rodolfo Solís-Vivanco
- Laboratory of Cognitive and Clinical Neurophysiology, Instituto Nacional de Neurología y Neurocirugía Manuel Velasco Suárez (INNNMVS), Avenue Insurgentes Sur 3877, La Fama, Tlalpan, Mexico City, CDMX 14269, Mexico
- Laura Coco
- Department of Otolaryngology, Oregon Health & Science University, Portland, Oregon 97239, USA
- Frederick J Gallun
- Department of Otolaryngology, Oregon Health & Science University, Portland, Oregon 97239, USA
- Aaron R Seitz
- Department of Psychology, University of California, 900 University Avenue, Riverside, California 92507, USA
6. Ozmeral EJ, Higgins NC. Defining functional spatial boundaries using a spatial release from masking task. JASA Express Lett 2022;2:124402. PMID: 36586966; PMCID: PMC9720634; DOI: 10.1121/10.0015356.
Abstract
The classic spatial release from masking (SRM) task measures speech recognition thresholds at discrete separation angles between a target and a masker. Alternatively, this study used a modified SRM task that adaptively measured the spatial separation angle needed between a continuous male target stream (speech with digits) and two female masker streams to achieve a specific amount of SRM. On average, 20 young normal-hearing listeners needed less spatial separation for a 6 dB release than for a 9 dB release, and the presence of background babble reduced across-listener variability on the paradigm. Future work is needed to better understand the psychometric properties of this adaptive procedure.
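A generic adaptive track over separation angle, sketched under the assumption of a simple 1-up/1-down rule; the study's actual progressive-tracking procedure may differ:

```python
import random

def adaptive_angle_track(respond, start_deg=45.0, step_deg=10.0,
                         min_step=2.0, n_reversals=8):
    """Sketch of a 1-up/1-down staircase on target-masker separation angle.
    `respond(angle)` returns True when the listener meets the fixed
    criterion (e.g., intelligibility at the TMR corresponding to 6 dB of
    release); the track converges on the angle needed for that criterion."""
    angle, direction, reversals = start_deg, None, []
    while len(reversals) < n_reversals:
        correct = respond(angle)
        new_dir = "down" if correct else "up"   # shrink separation after success
        if direction is not None and new_dir != direction:
            reversals.append(angle)
            step_deg = max(min_step, step_deg / 2)
        direction = new_dir
        angle += -step_deg if correct else step_deg
        angle = min(max(angle, 0.0), 90.0)      # stay within the loudspeaker arc
    return sum(reversals[-4:]) / 4              # mean of the last reversals

# Toy listener whose true boundary is near 20 degrees:
estimate = adaptive_angle_track(lambda a: a > 20 or random.random() < 0.1)
print(f"Estimated functional spatial boundary: {estimate:.1f} deg")
```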
Affiliation(s)
- Erol J Ozmeral
- Department of Communication Sciences and Disorders, University of South Florida, Tampa, Florida 33620, USA
- Nathan C Higgins
- Department of Communication Sciences and Disorders, University of South Florida, Tampa, Florida 33620, USA
7. Oh Y, Schwalm M, Kalpin N. Multisensory benefits for speech recognition in noisy environments. Front Neurosci 2022;16:1031424. PMID: 36340778; PMCID: PMC9630463; DOI: 10.3389/fnins.2022.1031424.
Abstract
A series of our previous studies explored the use of an abstract visual representation of the amplitude envelope cues from target sentences to benefit speech perception in complex listening environments. The purpose of this study was to extend this auditory-visual speech perception work to the tactile domain. Twenty adults participated in speech recognition measurements in four sensory modality conditions (AO, auditory-only; AV, auditory-visual; AT, auditory-tactile; AVT, auditory-visual-tactile). The target sentences were fixed at 65 dB sound pressure level and embedded within a simultaneous speech-shaped noise masker at varying signal-to-noise ratios (-7, -5, -3, -1, and 1 dB SNR). The amplitudes of both the abstract visual and the vibrotactile stimuli were temporally synchronized with the target speech envelope. On average, adding temporally synchronized multimodal cues to the auditory signal provided significant improvements in word recognition performance across all three multimodal conditions (AV, AT, and AVT), especially at the lower SNRs of -7, -5, and -3 dB, for both male (8-20% improvement) and female (5-25% improvement) talkers. The greatest improvement in word recognition performance (15-19% for male and 14-25% for female talkers) was observed when visual and tactile cues were combined (AVT). Another notable finding is that the influences of temporally synchronized abstract visual and vibrotactile stimuli on speech recognition performance stacked additively. Our findings suggest that multisensory integration in speech perception requires salient temporal cues to enhance speech recognition in noisy environments.
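With the target level fixed, setting the SNR amounts to rescaling the masker. A minimal sketch of that mixing step, with noise stand-ins for the actual recordings:

```python
import numpy as np

def mix_at_snr(target, masker, snr_db):
    """Scale the masker so the target sits at the requested SNR while the
    target level stays fixed (as in the study, where targets were held at
    65 dB SPL and the masker level was varied)."""
    rms = lambda x: np.sqrt(np.mean(x ** 2))
    gain = rms(target) / (rms(masker) * 10 ** (snr_db / 20))
    return target + gain * masker

fs = 16000
target = np.random.randn(fs)      # stand-in for a recorded target sentence
masker = np.random.randn(fs)      # stand-in for speech-shaped noise
mixes = {snr: mix_at_snr(target, masker, snr) for snr in (-7, -5, -3, -1, 1)}
```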
Affiliation(s)
- Yonghee Oh
- Department of Otolaryngology-Head and Neck Surgery and Communicative Disorders, University of Louisville, Louisville, KY, United States
- Department of Speech, Language, and Hearing Sciences, University of Florida, Gainesville, FL, United States
- Meg Schwalm
- Department of Speech, Language, and Hearing Sciences, University of Florida, Gainesville, FL, United States
- Nicole Kalpin
- Department of Speech, Language, and Hearing Sciences, University of Florida, Gainesville, FL, United States
8. Thomas M, Willis S, Galvin JJ, Fu QJ. Effects of tonotopic matching and spatial cues on segregation of competing speech in simulations of bilateral cochlear implants. PLoS One 2022;17:e0270759. PMID: 35788202; PMCID: PMC9255761; DOI: 10.1371/journal.pone.0270759.
Abstract
In the clinical fitting of cochlear implants (CIs), the lowest input acoustic frequency is typically much lower than the characteristic frequency associated with the most apical electrode position, due to the limited electrode insertion depth. For bilateral CI users, electrode positions may differ across ears. However, the same acoustic-to-electrode frequency allocation table (FAT) is typically assigned to both ears. As such, bilateral CI users may experience both intra-aural frequency mismatch within each ear and inter-aural mismatch across ears. This inter-aural mismatch may limit the ability of bilateral CI users to take advantage of spatial cues when attempting to segregate competing speech. Adjusting the FAT to tonotopically match the electrode position in each ear (i.e., increasing the low acoustic input frequency) is theorized to reduce this inter-aural mismatch. Unfortunately, this approach may also introduce the loss of acoustic information below the modified input acoustic frequency. The present study explored the trade-off between reduced inter-aural frequency mismatch and low-frequency information loss for segregation of competing speech. Normal-hearing participants were tested while listening to acoustic simulations of bilateral CIs. Speech reception thresholds (SRTs) were measured for target sentences produced by a male talker in the presence of two different male talkers. Masker speech was either co-located with or spatially separated from the target speech. The bilateral CI simulations were produced by 16-channel sinewave vocoders; the simulated insertion depth was fixed in one ear and varied in the other ear, resulting in an inter-aural mismatch of 0, 2, or 6 mm in terms of cochlear place. Two FAT conditions were compared: 1) clinical (200-8000 Hz in both ears), or 2) matched to the simulated insertion depth in each ear. Results showed that SRTs were significantly lower with the matched than with the clinical FAT, regardless of the insertion depth or spatial configuration of the masker speech. The largest improvement in SRTs with the matched FAT was observed when the inter-aural mismatch was largest (6 mm). These results suggest that minimizing inter-aural mismatch with tonotopically matched FATs may benefit bilateral CI users' ability to segregate competing speech despite substantial low-frequency information loss in ears with shallow insertion depths.
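Converting insertion depth to characteristic frequency is conventionally done with Greenwood's (1990) place-frequency map; a sketch under that assumption, with hypothetical electrode places rather than the study's vocoder parameters:

```python
def greenwood_hz(place_mm_from_apex, cochlea_mm=35.0):
    """Greenwood (1990) place-to-frequency map for the human cochlea:
    F = 165.4 * (10**(2.1 * x) - 0.88), x = fractional distance from apex."""
    x = place_mm_from_apex / cochlea_mm
    return 165.4 * (10 ** (2.1 * x) - 0.88)

# Illustrative use: a 2-mm shallower insertion in one ear shifts the most
# apical electrode basally; a "matched" FAT would raise that ear's lowest
# input frequency to the characteristic frequency of the new place.
deep_place, shallow_place = 10.0, 12.0     # mm from apex (hypothetical)
print(f"Deep-ear apical CF:    {greenwood_hz(deep_place):6.0f} Hz")
print(f"Shallow-ear apical CF: {greenwood_hz(shallow_place):6.0f} Hz")
```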
Affiliation(s)
- Mathew Thomas
- Department of Head and Neck Surgery, David Geffen School of Medicine, UCLA, Los Angeles, CA, United States of America
- Shelby Willis
- Department of Head and Neck Surgery, David Geffen School of Medicine, UCLA, Los Angeles, CA, United States of America
- John J. Galvin
- House Institute Foundation, Los Angeles, California, United States of America
- Qian-Jie Fu
- Department of Head and Neck Surgery, David Geffen School of Medicine, UCLA, Los Angeles, CA, United States of America
9. Oh Y, Hartling CL, Srinivasan NK, Diedesch AC, Gallun FJ, Reiss LAJ. Factors underlying masking release by voice-gender differences and spatial separation cues in multi-talker listening environments in listeners with and without hearing loss. Front Neurosci 2022;16:1059639. PMID: 36507363; PMCID: PMC9726925; DOI: 10.3389/fnins.2022.1059639.
Abstract
Voice-gender differences and spatial separation are important cues for auditory object segregation. The goal of this study was to investigate the relationship of the voice-gender difference benefit to the breadth of binaural pitch fusion (the perceptual integration of dichotic stimuli that evoke different pitches across the ears) and the relationship of the spatial separation benefit to localization acuity (the ability to identify the direction of a sound source). Twelve bilateral hearing aid (HA) users (ages 30 to 75 years) and eleven normal-hearing (NH) listeners (ages 36 to 67 years) were tested in three experiments. First, speech-on-speech masking performance was measured as the threshold target-to-masker ratio (TMR) needed to understand a target talker in the presence of either same- or different-gender masker talkers. These target-masker gender combinations were tested in two spatial configurations (maskers co-located with, or 60° symmetrically separated from, the target) in both monaural and binaural listening conditions. Second, binaural pitch fusion ranges were measured using harmonic tone complexes around a 200-Hz fundamental frequency. Third, absolute localization acuity was measured using broadband (125-8000 Hz) noise and one-third-octave noise bands centered at 500 and 3000 Hz. Voice-gender differences between target and maskers improved TMR thresholds for both listener groups in the binaural condition as well as in both monaural (left-ear and right-ear) conditions, with greater benefit in the co-located than in the spatially separated configuration. The voice-gender difference benefit was correlated with the breadth of binaural pitch fusion in the binaural condition, but not in the monaural conditions, ruling out a role of monaural abilities in the relationship between binaural fusion and the voice-gender difference benefit. The spatial separation benefit was not significantly correlated with absolute localization acuity. In addition, greater spatial separation benefit was observed in NH listeners than in bilateral HA users, indicating a decreased ability of HA users to benefit from spatial release from masking (SRM). These findings suggest that sharp binaural pitch fusion may be important for maximal speech perception in multi-talker environments for both NH listeners and bilateral HA users.
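The key relationships tested here are simple across-listener correlations; a sketch with invented per-listener numbers (not the study's data) shows the shape of the analysis:

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical per-listener values, for illustration only: binaural pitch
# fusion breadth (octaves) and voice-gender release from masking (dB).
fusion_breadth = np.array([0.2, 0.5, 0.8, 1.1, 1.6, 2.0, 2.4, 3.0])
vgrm_db        = np.array([6.1, 5.4, 4.8, 4.0, 3.1, 2.6, 1.9, 1.2])

r, p = pearsonr(fusion_breadth, vgrm_db)
print(f"r = {r:.2f}, p = {p:.3g}")  # a negative r would mean broader
                                    # fusion predicts less gender benefit
```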
Affiliation(s)
- Yonghee Oh
- Department of Otolaryngology and Communicative Disorders, University of Louisville, Louisville, KY, United States
- National Center for Rehabilitative Auditory Research, VA Portland Health Care System, Portland, OR, United States
- Curtis L. Hartling
- Department of Otolaryngology, Oregon Health & Science University, Portland, OR, United States
- Nirmal Kumar Srinivasan
- Department of Speech-Language Pathology & Audiology, Towson University, Towson, MD, United States
- Anna C. Diedesch
- Department of Communication Sciences and Disorders, Western Washington University, Bellingham, WA, United States
- Frederick J. Gallun
- National Center for Rehabilitative Auditory Research, VA Portland Health Care System, Portland, OR, United States
- Department of Otolaryngology, Oregon Health & Science University, Portland, OR, United States
- Lina A. J. Reiss
- National Center for Rehabilitative Auditory Research, VA Portland Health Care System, Portland, OR, United States
- Department of Otolaryngology, Oregon Health & Science University, Portland, OR, United States