1. Nagels L, Gaudrain E, Vickers D, Hendriks P, Başkent D. Prelingually Deaf Children With Cochlear Implants Show Better Perception of Voice Cues and Speech in Competing Speech Than Postlingually Deaf Adults With Cochlear Implants. Ear Hear 2024;45:952-968. PMID: 38616318; PMCID: PMC11175806; DOI: 10.1097/aud.0000000000001489.
Abstract
OBJECTIVES Postlingually deaf adults with cochlear implants (CIs) have difficulties with perceiving differences in speakers' voice characteristics and benefit little from voice differences for the perception of speech in competing speech. However, not much is known yet about the perception and use of voice characteristics in prelingually deaf implanted children with CIs. Unlike CI adults, most CI children became deaf during the acquisition of language. Extensive neuroplastic changes during childhood could make CI children better at using the available acoustic cues than CI adults, or the lack of exposure to a normal acoustic speech signal could make it more difficult for them to learn which acoustic cues they should attend to. This study aimed to examine to what degree CI children can perceive voice cues and benefit from voice differences for perceiving speech in competing speech, comparing their abilities to those of normal-hearing (NH) children and CI adults. DESIGN CI children's voice cue discrimination (experiment 1), voice gender categorization (experiment 2), and benefit from target-masker voice differences for perceiving speech in competing speech (experiment 3) were examined in three experiments. The main focus was on the perception of mean fundamental frequency (F0) and vocal-tract length (VTL), the primary acoustic cues related to speakers' anatomy and perceived voice characteristics, such as voice gender. RESULTS CI children's F0 and VTL discrimination thresholds indicated lower sensitivity to differences compared with their NH-age-equivalent peers, but their mean discrimination thresholds of 5.92 semitones (st) for F0 and 4.10 st for VTL indicated higher sensitivity than postlingually deaf CI adults with mean thresholds of 9.19 st for F0 and 7.19 st for VTL. Furthermore, CI children's perceptual weighting of F0 and VTL cues for voice gender categorization closely resembled that of their NH-age-equivalent peers, in contrast with CI adults. Finally, CI children had more difficulties in perceiving speech in competing speech than their NH-age-equivalent peers, but they performed better than CI adults. Unlike CI adults, CI children showed a benefit from target-masker voice differences in F0 and VTL, similar to NH children. CONCLUSION Although CI children's F0 and VTL voice discrimination scores were overall lower than those of NH children, their weighting of F0 and VTL cues for voice gender categorization and their benefit from target-masker differences in F0 and VTL resembled that of NH children. Together, these results suggest that prelingually deaf implanted CI children can effectively utilize spectrotemporally degraded F0 and VTL cues for voice and speech perception, generally outperforming postlingually deaf CI adults in comparable tasks. These findings underscore the presence of F0 and VTL cues in the CI signal to a certain degree and suggest other factors contributing to the perception challenges faced by CI adults.
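Both the F0 and VTL differences above are expressed in semitones, a logarithmic measure of the ratio between two values. As background only (not code from the study; the function name and the 120 Hz reference voice are illustrative assumptions), a minimal sketch of the conversion in Python:

    import math

    def semitones(value, reference):
        """Difference between two values (e.g., two F0s in Hz, or two
        vocal-tract lengths) expressed in semitones: 12 * log2(value / reference)."""
        return 12.0 * math.log2(value / reference)

    # A threshold of ~5.92 st for F0 corresponds to a frequency ratio of
    # 2 ** (5.92 / 12), i.e., roughly a 41% change relative to the reference.
    print(semitones(120.0 * 2 ** (5.92 / 12), 120.0))  # ~5.92
    print(2 ** (5.92 / 12))                            # ~1.41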
Affiliation(s)
- Leanne Nagels
- Center for Language and Cognition Groningen (CLCG), University of Groningen, Groningen, The Netherlands
- Department of Otorhinolaryngology/Head and Neck Surgery, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Research School of Behavioural and Cognitive Neurosciences, University of Groningen, Groningen, The Netherlands
- Etienne Gaudrain
- Department of Otorhinolaryngology/Head and Neck Surgery, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Research School of Behavioural and Cognitive Neurosciences, University of Groningen, Groningen, The Netherlands
- CNRS UMR 5292, Lyon Neuroscience Research Center, Auditory Cognition and Psychoacoustics, Inserm UMRS 1028, Université Claude Bernard Lyon 1, Université de Lyon, Lyon, France
- Deborah Vickers
- Cambridge Hearing Group, Sound Lab, Clinical Neurosciences Department, University of Cambridge, Cambridge, United Kingdom
- Petra Hendriks
- Center for Language and Cognition Groningen (CLCG), University of Groningen, Groningen, The Netherlands
- Research School of Behavioural and Cognitive Neurosciences, University of Groningen, Groningen, The Netherlands
- Deniz Başkent
- Department of Otorhinolaryngology/Head and Neck Surgery, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Research School of Behavioural and Cognitive Neurosciences, University of Groningen, Groningen, The Netherlands
- W.J. Kolff Institute for Biomedical Engineering and Materials Science, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
2. Lelic D, Nielsen LLA, Pedersen AK, Neher T. Focusing on Positive Listening Experiences Improves Speech Intelligibility in Experienced Hearing Aid Users. Trends Hear 2024;28:23312165241246616. PMID: 38656770; PMCID: PMC11044800; DOI: 10.1177/23312165241246616.
Abstract
Negativity bias is a cognitive bias that results in negative events being perceptually more salient than positive ones. For hearing care, this means that hearing aid benefits can potentially be overshadowed by adverse experiences. Research has shown that sustaining focus on positive experiences has the potential to mitigate negativity bias. The purpose of the current study was to investigate whether a positive focus (PF) intervention can improve speech-in-noise abilities for experienced hearing aid users. Thirty participants were randomly allocated to a control or PF group (N = 2 × 15). Prior to hearing aid fitting, all participants filled out the short form of the Speech, Spatial and Qualities of Hearing scale (SSQ12) based on their own hearing aids. At the first visit, they were fitted with study hearing aids, and speech-in-noise testing was performed. Both groups then wore the study hearing aids for two weeks and sent daily text messages reporting hours of hearing aid use to an experimenter. In addition, the PF group was instructed to focus on positive listening experiences and to also report them in the daily text messages. After the 2-week trial, all participants filled out the SSQ12 questionnaire based on the study hearing aids and completed the speech-in-noise testing again. Speech-in-noise performance and SSQ12 Qualities score were improved for the PF group but not for the control group. This finding indicates that the PF intervention can improve subjective and objective hearing aid benefits.
Affiliation(s)
- Tobias Neher
- Department of Clinical Research, University of Southern Denmark, Odense, Denmark
- Research Unit for ORL – Head & Neck Surgery and Audiology, Odense University Hospital & University of Southern Denmark, Odense, Denmark
3. Cohn M, Barreda S, Zellou G. Differences in a Musician's Advantage for Speech-in-Speech Perception Based on Age and Task. J Speech Lang Hear Res 2023;66:545-564. PMID: 36729698; DOI: 10.1044/2022_jslhr-22-00259.
Abstract
PURPOSE This study investigates the debated claim that musicians, through years of targeted auditory training, have an advantage in speech-in-noise perception. We also consider the effect of age on any such advantage, comparing musicians and nonmusicians (age range: 18-66 years), all of whom had normal hearing. We manipulate the degree of fundamental frequency (fo) separation between the competing talkers, as well as use different tasks, to probe attentional differences that might shape a musician's advantage across ages. METHOD Participants included 29 musicians and 26 nonmusicians. They completed two tasks varying in attentional demands: (a) a selective attention task where listeners identify the target sentence presented with a one-talker interferer (Experiment 1), and (b) a divided attention task where listeners hear two vowels played simultaneously and identify both competing vowels (Experiment 2). In both paradigms, fo separation between the two voices was manipulated (Δfo = 0, 0.156, 0.306, 1, 2, 3 semitones). RESULTS Results show that increasing fo separation leads to higher accuracy on both tasks. Additionally, we find evidence for a musician's advantage across the two studies. In the sentence identification task, younger adult musicians show higher accuracy overall, as well as a stronger reliance on fo separation, yet this advantage declines with musicians' age. In the double vowel identification task, musicians of all ages show an across-the-board advantage in detecting two vowels, and use fo separation more to aid in stream separation, but show no consistent difference in double vowel identification. CONCLUSIONS Overall, we find support for a hybrid auditory encoding-attention account of music-to-speech transfer. The musician's advantage includes fo, but the benefit also depends on the attentional demands of the task and listeners' age. Taken together, this study suggests a complex relationship between age, musical experience, and speech-in-speech paradigm in shaping a musician's advantage. SUPPLEMENTAL MATERIAL https://doi.org/10.23641/asha.21956777.
Affiliation(s)
- Michelle Cohn
- Phonetics Lab, Department of Linguistics, University of California, Davis
- Santiago Barreda
- Phonetics Lab, Department of Linguistics, University of California, Davis
- Georgia Zellou
- Phonetics Lab, Department of Linguistics, University of California, Davis
4. Huet MP, Micheyl C, Gaudrain E, Parizet E. Vocal and semantic cues for the segregation of long concurrent speech stimuli in diotic and dichotic listening: The Long-SWoRD test. J Acoust Soc Am 2022;151:1557. PMID: 35364949; DOI: 10.1121/10.0007225.
Abstract
It is not always easy to follow a conversation in a noisy environment. To distinguish between two speakers, a listener must mobilize many perceptual and cognitive processes to maintain attention on a target voice and avoid shifting attention to the background noise. The development of an intelligibility task with long stimuli, the Long-SWoRD test, is introduced. This protocol allows participants to fully benefit from cognitive resources, such as semantic knowledge, to separate two talkers in a realistic listening environment. Moreover, the task also provides experimenters with a means to infer fluctuations in auditory selective attention. Two experiments document the performance of normal-hearing listeners in situations where the perceptual separability of the competing voices ranges from easy to hard, using a combination of voice and binaural cues. The results show a strong effect of voice differences when the voices are presented diotically. In addition, analyzing the influence of the semantic context on the pattern of responses indicates that semantic information induces a response bias in situations where the competing voices are distinguishable and indistinguishable from one another.
Affiliation(s)
- Moïra-Phoebé Huet
- Laboratory of Vibration and Acoustics, National Institute of Applied Sciences, University of Lyon, 20 Avenue Albert Einstein, Villeurbanne, 69100, France
- Etienne Gaudrain
- Lyon Neuroscience Research Center, Auditory Cognition and Psychoacoustics, Centre National de la Recherche Scientifique UMR5292, Institut National de la Santé et de la Recherche Médicale U1028, Université Claude Bernard Lyon 1, Université de Lyon, Centre Hospitalier Le Vinatier, Neurocampus, 95 boulevard Pinel, Bron Cedex, 69675, France
- Etienne Parizet
- Laboratory of Vibration and Acoustics, National Institute of Applied Sciences, University of Lyon, 20 Avenue Albert Einstein, Villeurbanne, 69100, France
5.
Abstract
OBJECTIVES Individuals with cochlear implants (CIs) show reduced word and auditory emotion recognition abilities relative to their peers with normal hearing. Modern CI processing strategies are designed to preserve acoustic cues requisite for word recognition rather than those cues required for accessing other signal information (e.g., talker gender or emotional state). While word recognition is undoubtedly important for communication, the inaccessibility of this additional signal information in speech may lead to negative social experiences and outcomes for individuals with hearing loss. This study aimed to evaluate whether the emphasis on word recognition preservation in CI processing has unintended consequences on the perception of other talker information, such as emotional state. DESIGN Twenty-four young adult listeners with normal hearing listened to sentences and either reported a target word in each sentence (word recognition task) or selected the emotion of the talker (emotion recognition task) from a list of options (Angry, Calm, Happy, and Sad). Sentences were blocked by task type (emotion recognition versus word recognition) and processing condition (unprocessed versus 8-channel noise vocoder) and presented randomly within the block at three signal-to-noise ratios (SNRs) in a background of speech-shaped noise. Confusion matrices showed the number of errors in emotion recognition by listeners. RESULTS Listeners demonstrated better emotion recognition performance than word recognition performance at the same SNR. Unprocessed speech resulted in higher recognition rates than vocoded stimuli. Recognition performance (for both words and emotions) decreased with worsening SNR. Vocoding speech resulted in a greater negative impact on emotion recognition than it did for word recognition. CONCLUSIONS These data confirm prior work that suggests that in background noise, emotional prosodic information in speech is easier to recognize than word information, even after simulated CI processing. However, emotion recognition may be more negatively impacted by background noise and CI processing than word recognition. Future work could explore CI processing strategies that better encode prosodic information and investigate this effect in individuals with CIs as opposed to vocoded simulation. This study emphasized the need for clinicians to consider not only word recognition but also other aspects of speech that are critical to successful social communication.
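The confusion matrices mentioned here simply tally responses against presented emotions over the four response options (Angry, Calm, Happy, Sad). A generic sketch of that tally (illustrative only; the variable names and example trial data are assumptions, not the study's data):

    import numpy as np

    EMOTIONS = ["Angry", "Calm", "Happy", "Sad"]

    def confusion_matrix(presented, responded):
        """Rows index the presented emotion, columns the listener's response."""
        idx = {e: i for i, e in enumerate(EMOTIONS)}
        m = np.zeros((len(EMOTIONS), len(EMOTIONS)), dtype=int)
        for p, r in zip(presented, responded):
            m[idx[p], idx[r]] += 1
        return m

    # Hypothetical trials: "Calm" misheard once as "Sad".
    presented = ["Angry", "Calm", "Happy", "Sad", "Calm"]
    responded = ["Angry", "Sad", "Happy", "Sad", "Calm"]
    print(confusion_matrix(presented, responded))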
6. Morgan SD. Comparing Emotion Recognition and Word Recognition in Background Noise. J Speech Lang Hear Res 2021;64:1758-1772. PMID: 33830784; DOI: 10.1044/2021_jslhr-20-00153.
Abstract
Purpose Word recognition in quiet and in background noise has been thoroughly investigated in previous research to establish segmental speech recognition performance as a function of stimulus characteristics (e.g., audibility). Similar methods to investigate recognition performance for suprasegmental information (e.g., acoustic cues used to make judgments of talker age, sex, or emotional state) have not been performed. In this work, we directly compared emotion and word recognition performance in different levels of background noise to identify psychoacoustic properties of emotion recognition (globally and for specific emotion categories) relative to word recognition. Method Twenty young adult listeners with normal hearing listened to sentences and either reported a target word in each sentence or selected the emotion of the talker from a list of options (angry, calm, happy, and sad) at four signal-to-noise ratios in a background of white noise. Psychometric functions were fit to the recognition data and used to estimate thresholds (midway points on the function) and slopes for word and emotion recognition. Results Thresholds for emotion recognition were approximately 10 dB better than word recognition thresholds, and slopes for emotion recognition were half of those measured for word recognition. Low-arousal emotions had poorer thresholds and shallower slopes than high-arousal emotions, suggesting greater confusion when distinguishing low-arousal emotional speech content. Conclusions Communication of a talker's emotional state continues to be perceptible to listeners in competitive listening environments, even after words are rendered inaudible. The arousal of emotional speech affects listeners' ability to discriminate between emotion categories.
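The thresholds (midway points) and slopes reported here are parameters of psychometric functions fit to recognition scores as a function of SNR. A minimal sketch of one common way to obtain such estimates, using a logistic function and scipy (the function form, starting values, and data points are illustrative assumptions, not the study's exact fitting procedure):

    import numpy as np
    from scipy.optimize import curve_fit

    def logistic(snr, threshold, slope):
        """Proportion correct vs. SNR (dB); 'threshold' is the midway point
        of the function and 'slope' controls its steepness."""
        return 1.0 / (1.0 + np.exp(-slope * (snr - threshold)))

    # Hypothetical proportion-correct scores at four SNRs:
    snr = np.array([-12.0, -8.0, -4.0, 0.0])
    p_correct = np.array([0.10, 0.35, 0.75, 0.95])

    (threshold, slope), _ = curve_fit(logistic, snr, p_correct, p0=[-5.0, 0.5])
    print(f"threshold = {threshold:.1f} dB SNR, slope = {slope:.2f} per dB")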
Affiliation(s)
- Shae D Morgan
- Department of Otolaryngology - Head and Neck Surgery and Communicative Disorders, University of Louisville, KY
7. Nagels L, Gaudrain E, Vickers D, Hendriks P, Başkent D. School-age children benefit from voice gender cue differences for the perception of speech in competing speech. J Acoust Soc Am 2021;149:3328. PMID: 34241121; DOI: 10.1121/10.0004791.
Abstract
Differences in speakers' voice characteristics, such as mean fundamental frequency (F0) and vocal-tract length (VTL), which primarily define speakers' so-called perceived voice gender, facilitate the perception of speech in competing speech. Perceiving speech in competing speech is particularly challenging for children, which may relate to their lower sensitivity to differences in voice characteristics compared with adults. This study investigated the development of the benefit from F0 and VTL differences in school-age children (4-12 years) for separating two competing speakers while comprehending one of them, as well as the relationship between this benefit and the children's corresponding voice discrimination thresholds. Children benefited from differences in F0, VTL, or both cues at all ages tested. This benefit remained proportionally the same across age, although overall accuracy continued to differ from that of adults. Additionally, children's benefit from F0 and VTL differences and their overall accuracy were not related to their discrimination thresholds. Hence, although children's voice discrimination thresholds and their perception of speech in competing speech develop throughout the school-age years, children already show a benefit from voice gender cue differences early on. Factors other than children's discrimination thresholds seem to relate more closely to their developing ability to perceive speech in competing speech.
Affiliation(s)
- Leanne Nagels
- Center for Language and Cognition Groningen (CLCG), University of Groningen, Groningen 9712EK, Netherlands
- Etienne Gaudrain
- CNRS UMR 5292, Lyon Neuroscience Research Center, Auditory Cognition and Psychoacoustics, Inserm UMRS 1028, Université Claude Bernard Lyon 1, Université de Lyon, Lyon, France
- Deborah Vickers
- Sound Lab, Cambridge Hearing Group, Clinical Neurosciences Department, University of Cambridge, Cambridge CB2 0SZ, United Kingdom
- Petra Hendriks
- Center for Language and Cognition Groningen (CLCG), University of Groningen, Groningen 9712EK, Netherlands
- Deniz Başkent
- Department of Otorhinolaryngology/Head and Neck Surgery, University Medical Center Groningen, University of Groningen, Groningen 9713GZ, Netherlands
8. Warnecke M, Peng ZE, Litovsky RY. The impact of temporal fine structure and signal envelope on auditory motion perception. PLoS One 2020;15:e0238125. PMID: 32822439; PMCID: PMC7446836; DOI: 10.1371/journal.pone.0238125.
Abstract
The majority of psychoacoustic research investigating sound localization has utilized stationary sources, yet most naturally occurring sounds are in motion, either because the sound source itself moves, or the listener does. For normal-hearing (NH) listeners, previous research has shown the extent to which sound duration and velocity impact the ability to detect sound movement. By contrast, little is known about how listeners with hearing impairments perceive moving sounds; the only study to date comparing the performance of NH and bilateral cochlear implant (BiCI) listeners demonstrated significantly poorer performance on motion detection tasks in BiCI listeners. Cochlear implants, auditory prostheses offered to profoundly deaf individuals for access to spoken language, retain the signal envelope (ENV) while discarding the temporal fine structure (TFS) of the original acoustic input. As a result, BiCI users do not have access to low-frequency TFS cues, which have previously been shown to be crucial for sound localization in NH listeners. Instead, BiCI listeners seem to rely on ENV cues for sound localization, especially level cues. Given that NH and BiCI listeners differentially utilize ENV and TFS information, the present study aimed to investigate the usefulness of these cues for auditory motion perception. We created acoustic chimaera stimuli, which allowed us to test the relative contributions of ENV and TFS to auditory motion perception. Stimuli were either moving or stationary and were presented to NH listeners in free field. The task was to track the perceived sound location. We found that removing low-frequency TFS reduced sensitivity to sound motion and that fluctuating speech envelopes strongly biased listeners to judge sounds as stationary. Our findings yield a possible explanation as to why BiCI users struggle to identify sound motion, and provide a first account of cues important to the functional aspect of auditory motion perception.
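Envelope (ENV) and temporal fine structure (TFS) are commonly separated with the Hilbert transform, and chimaera stimuli recombine the ENV of one sound with the TFS of another, normally within each band of a filterbank. A single-band sketch of the idea (illustrative only, not the stimulus-generation code used in the study; the tone and modulation parameters are arbitrary):

    import numpy as np
    from scipy.signal import hilbert

    def env_tfs(x):
        """Split a band-limited signal into its Hilbert envelope and its
        temporal fine structure (cosine of the instantaneous phase)."""
        analytic = hilbert(x)
        return np.abs(analytic), np.cos(np.angle(analytic))

    fs = 16000
    t = np.arange(0, 0.5, 1.0 / fs)
    a = np.sin(2 * np.pi * 300 * t) * (0.5 + 0.5 * np.sin(2 * np.pi * 4 * t))  # modulated tone
    b = np.sin(2 * np.pi * 500 * t)                                            # steady tone

    env_a, _ = env_tfs(a)
    _, tfs_b = env_tfs(b)
    chimaera = env_a * tfs_b  # envelope of a imposed on the fine structure of b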
Affiliation(s)
- Michaela Warnecke
- University of Wisconsin-Madison, Waisman Center, Madison, WI, United States of America
- Z. Ellen Peng
- University of Wisconsin-Madison, Waisman Center, Madison, WI, United States of America
- Ruth Y. Litovsky
- University of Wisconsin-Madison, Waisman Center, Madison, WI, United States of America
9. Morgan SD. Categorical and Dimensional Ratings of Emotional Speech: Behavioral Findings From the Morgan Emotional Speech Set. J Speech Lang Hear Res 2019;62:4015-4029. PMID: 31652413; PMCID: PMC7203525; DOI: 10.1044/2019_jslhr-s-19-0144.
Abstract
Purpose Emotion classification for auditory stimuli typically employs 1 of 2 approaches (discrete categories or emotional dimensions). This work presents a new emotional speech set, compares these 2 classification methods for emotional speech stimuli, and emphasizes the need to consider the entire communication model (i.e., the talker, message, and listener) when studying auditory emotion portrayal and perception. Method Emotional speech from male and female talkers was evaluated using both categorical and dimensional rating methods. Ten young adult listeners (ages 19-28 years) evaluated stimuli recorded in 4 emotional speaking styles (Angry, Calm, Happy, and Sad). Talker and listener factors were examined for potential influences on emotional ratings using categorical and dimensional rating methods. Listeners rated stimuli by selecting an emotion category, rating the activation and pleasantness, and indicating goodness of category fit. Results Discrete ratings were generally consistent with dimensional ratings for speech, with accuracy for emotion recognition well above chance. As stimuli approached dimensional extremes of activation and pleasantness, listeners were more confident in their category selection, indicative of a hybrid approach to emotion classification. Female talkers were rated as more activated than male talkers, and female listeners gave higher ratings of activation compared to male listeners, confirming gender differences in emotion perception. Conclusion A hybrid model for auditory emotion classification is supported by the data. Talker and listener factors, such as gender, were found to impact the ratings of emotional speech and must be considered alongside stimulus factors in the design of future studies of emotion.
Affiliation(s)
- Shae D. Morgan
- Program in Audiology, Department of Otolaryngology Head and Neck Surgery and Communicative Disorders, School of Medicine, University of Louisville, KY
- Department of Communication Sciences and Disorders, University of Utah, Salt Lake City
10. Bidelman GM, Sigley L, Lewis GA. Acoustic noise and vision differentially warp the auditory categorization of speech. J Acoust Soc Am 2019;146:60. PMID: 31370660; PMCID: PMC6786888; DOI: 10.1121/1.5114822.
Abstract
Speech perception requires grouping acoustic information into meaningful linguistic-phonetic units via categorical perception (CP). Beyond shrinking observers' perceptual space, CP might aid degraded speech perception if categories are more resistant to noise than surface acoustic features. Combining audiovisual (AV) cues also enhances speech recognition, particularly in noisy environments. This study investigated the degree to which visual cues from a talker (i.e., mouth movements) aid speech categorization amidst noise interference by measuring participants' identification of clear and noisy speech (0 dB signal-to-noise ratio) presented in auditory-only or combined AV modalities (i.e., A, A+noise, AV, AV+noise conditions). Auditory noise expectedly weakened (i.e., shallower identification slopes) and slowed speech categorization. Interestingly, additional viseme cues largely counteracted noise-related decrements in performance and stabilized classification speeds in both clear and noise conditions suggesting more precise acoustic-phonetic representations with multisensory information. Results are parsimoniously described under a signal detection theory framework and by a reduction (visual cues) and increase (noise) in the precision of perceptual object representation, which were not due to lapses of attention or guessing. Collectively, findings show that (i) mapping sounds to categories aids speech perception in "cocktail party" environments; (ii) visual cues help lattice formation of auditory-phonetic categories to enhance and refine speech identification.
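For reference, the signal detection theory framework mentioned in the conclusions separates sensitivity from response bias; the standard sensitivity index is computed from hit and false-alarm rates as below (a generic textbook formula, not the authors' analysis code; the example rates are made up):

    from scipy.stats import norm

    def d_prime(hit_rate, false_alarm_rate):
        """Sensitivity index: d' = z(hit rate) - z(false-alarm rate)."""
        return norm.ppf(hit_rate) - norm.ppf(false_alarm_rate)

    print(d_prime(0.85, 0.20))  # ~1.88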
Affiliation(s)
- Gavin M Bidelman
- School of Communication Sciences & Disorders, University of Memphis, 4055 North Park Loop, Memphis, Tennessee 38152, USA
- Lauren Sigley
- School of Communication Sciences & Disorders, University of Memphis, 4055 North Park Loop, Memphis, Tennessee 38152, USA
- Gwyneth A Lewis
- School of Communication Sciences & Disorders, University of Memphis, 4055 North Park Loop, Memphis, Tennessee 38152, USA
11. Deroche MLD, Gracco VL. Segregation of voices with single or double fundamental frequencies. J Acoust Soc Am 2019;145:847. PMID: 30823786; DOI: 10.1121/1.5090107.
Abstract
In cocktail-party situations, listeners can use the fundamental frequency (F0) of a voice to segregate it from competitors, but other cues in speech could help, such as co-modulation of envelopes across frequency or more complex cues related to the semantic/syntactic content of the utterances. For simplicity, this (non-pitch) form of grouping is referred to as "articulatory." By creating a new type of speech with two steady F0s, the study examined how these two forms of segregation compete: articulatory grouping would bind the partials of a double-F0 source together, whereas harmonic segregation would tend to split them into two subsets. In experiment 1, maskers were two same-male sentences. Speech reception thresholds were high in this task (in the vicinity of 0 dB), and harmonic segregation behaved as though double-F0 stimuli were two independent sources. This was not the case in experiment 2, where maskers were speech-shaped complexes (buzzes). First, double-F0 targets were immune to the masking of a single-F0 buzz matching one of the two target F0s. Second, double-F0 buzzes were particularly effective at masking a single-F0 target matching one of the two buzz F0s. In conclusion, the strength of F0 segregation appears to depend on whether or not the masker is speech.
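The "buzzes" described are harmonic complexes, and a double-F0 buzz superimposes two harmonic series; the speech-shaped spectral envelope would be an additional filtering step not shown here. A bare-bones sketch (illustrative assumptions throughout: the F0 values, duration, and 3-semitone separation are not taken from the study):

    import numpy as np

    def harmonic_complex(f0, dur=1.0, fs=16000, fmax=5000):
        """Sum of equal-amplitude harmonics of f0 up to fmax."""
        t = np.arange(int(dur * fs)) / fs
        n_harmonics = int(fmax // f0)
        return sum(np.sin(2 * np.pi * f0 * k * t) for k in range(1, n_harmonics + 1))

    single_f0_buzz = harmonic_complex(110.0)
    double_f0_buzz = harmonic_complex(110.0) + harmonic_complex(110.0 * 2 ** (3 / 12))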
Affiliation(s)
- Mickael L D Deroche
- Centre for Research on Brain, Language and Music, McGill University, 3640 rue de la Montagne, Montreal, H3G 2A8, Canada
- Vincent L Gracco
- Haskins Laboratories, 300 George Street, New Haven, Connecticut 06511, USA
12. Wu M. Effect of F0 contour on perception of Mandarin Chinese speech against masking. PLoS One 2019;14:e0209976. PMID: 30605452; PMCID: PMC6317796; DOI: 10.1371/journal.pone.0209976.
Abstract
Intonation has many perceptually significant functions in language that contribute to speech recognition. This study aimed to investigate whether intonation cues affect the unmasking of Mandarin Chinese speech in the presence of interfering sounds. Specifically, the intelligibility of multi-tone Mandarin Chinese sentences, presented with maskers consisting of either two-talker speech or steady-state noise, was measured in three intonation conditions (flattened, typical, and exaggerated). Unlike most previous studies, the present study manipulated only the intonation information while preserving the tone information. The results showed that recognition of the final keywords in multi-tone Mandarin Chinese sentences was much better under the typical (original) F0 contour condition than under the flattened or exaggerated F0 contour conditions for both the noise and speech maskers, and that an exaggerated F0 contour reduced the intelligibility of Mandarin Chinese more under the speech masker than under the noise masker. These results suggest that speech in a tone language (Mandarin Chinese) is harder to understand when the intonation is unnatural, even if the tone information is preserved, and that an unnatural intonation contour reduces the release of Mandarin Chinese speech from masking, especially in a multi-talker environment.
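Flattened and exaggerated intonation conditions of this kind are typically produced by shrinking or expanding the F0 contour's excursions around its mean (often on a log-frequency scale) before resynthesis, e.g., with PSOLA. A schematic sketch of that manipulation on an F0 track (an assumption about the general procedure, not the study's actual resynthesis code; the example contour is made up):

    import numpy as np

    def scale_f0_contour(f0_hz, factor):
        """Scale F0 excursions around the utterance mean on a log scale:
        factor = 0 flattens the contour, 1 leaves it unchanged, and
        values > 1 exaggerate it. Unvoiced frames (F0 = 0) are untouched."""
        f0 = np.asarray(f0_hz, dtype=float)
        voiced = f0 > 0
        log_f0 = np.log(f0[voiced])
        out = f0.copy()
        out[voiced] = np.exp(log_f0.mean() + factor * (log_f0 - log_f0.mean()))
        return out

    f0_track = [0, 180, 210, 240, 200, 170, 0]   # Hz; 0 marks unvoiced frames
    print(scale_f0_contour(f0_track, 0.0))       # flattened
    print(scale_f0_contour(f0_track, 2.0))       # exaggerated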
Affiliation(s)
- Meihong Wu
- School of Information Science and Engineering, Xiamen University, Fujian, China
13. Favre-Félix A, Graversen C, Hietkamp RK, Dau T, Lunner T. Improving Speech Intelligibility by Hearing Aid Eye-Gaze Steering: Conditions With Head Fixated in a Multitalker Environment. Trends Hear 2018. PMCID: PMC6291882; DOI: 10.1177/2331216518814388.
Abstract
The behavior of a person during a conversation typically involves both auditory and visual attention. Visual attention implies that the person directs his or her eye gaze toward the sound target of interest, and hence, detection of the gaze may provide a steering signal for future hearing aids. The steering could utilize a beamformer or the selection of a specific audio stream from a set of remote microphones. Previous studies have shown that eye gaze can be measured through electrooculography (EOG). To explore the precision and real-time feasibility of the methodology, seven hearing-impaired persons were tested, seated with their head fixed in front of three targets positioned at −30°, 0°, and +30° azimuth. Each target presented speech from the Danish DAT material, which was available for direct input to the hearing aid using head-related transfer functions. Speech intelligibility was measured in three conditions: a reference condition without any steering, a condition where eye gaze was estimated from EOG measures to select the desired audio stream, and an ideal condition with steering based on an eye-tracking camera. The “EOG-steering” improved the sentence correct score compared with the “no-steering” condition, although the performance was still significantly lower than the ideal condition with the eye-tracking camera. In conclusion, eye-gaze steering increases speech intelligibility, although real-time EOG-steering still requires improvements of the signal processing before it is feasible for implementation in a hearing aid.
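The steering principle itself is simple: estimate where the listener is looking and route the audio stream whose talker position is closest to that angle. A toy sketch of the selection step only (the target azimuths match the experiment; the function and its use of a precomputed gaze angle are illustrative assumptions, and the EOG-to-angle estimation, which is the hard part, is not shown):

    TARGET_AZIMUTHS = [-30.0, 0.0, 30.0]  # degrees, as in the experiment

    def select_stream(gaze_angle_deg):
        """Index of the talker whose azimuth is closest to the estimated gaze angle."""
        return min(range(len(TARGET_AZIMUTHS)),
                   key=lambda i: abs(TARGET_AZIMUTHS[i] - gaze_angle_deg))

    print(select_stream(-21.0))  # -> 0, i.e., the talker at -30 degrees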
Affiliation(s)
- Antoine Favre-Félix
- Eriksholm Research Centre, Snekkersten, Denmark
- Hearing Systems Group, Department of Electrical Engineering, Danish Technical University, Lyngby, Denmark
- Torsten Dau
- Hearing Systems Group, Department of Electrical Engineering, Danish Technical University, Lyngby, Denmark
- Thomas Lunner
- Eriksholm Research Centre, Snekkersten, Denmark
- Hearing Systems Group, Department of Electrical Engineering, Danish Technical University, Lyngby, Denmark
- Linnaeus Centre HEAD, Swedish Institute for Disability Research, Linköping University, Sweden
14. Feng T, Chen Q, Xiao Z. Age-Related Differences in the Effects of Masker Cuing on Releasing Chinese Speech From Informational Masking. Front Psychol 2018;9:1922. PMID: 30356784; PMCID: PMC6189421; DOI: 10.3389/fpsyg.2018.01922.
Abstract
The aims of the present study were to examine whether familiarity with a masker improves word recognition in speech masking situations and whether there are age-related differences in the effects of masker cuing. Thirty-two older listeners (range = 59-74 years; mean age = 66.41 years) with high-frequency hearing loss and 32 younger normal-hearing listeners (range = 21-28 years; mean age = 23.73 years) participated in this study, all of whom spoke Chinese as their first language. Two experiments were conducted, with 16 younger and 16 older listeners participating in each. The masking speech, which differed in content from the target speech and was syntactically correct but semantically meaningless, was a continuous recording of meaningless Chinese sentences spoken by two talkers. The masker level was adjusted to produce signal-to-masker ratios of -12, -8, -4, and 0 dB for the younger participants and -8, -4, 0, and 4 dB for the older participants. Under masker-priming conditions, a priming sentence, spoken by the masker talkers, was presented in quiet three times before a target sentence was presented together with a masker sentence 4 s later. In Experiment 1, using same-sentence masker-priming (priming identical to the masker sentence), the masker-priming improved the identification of the target sentence for both age groups compared to when no priming was provided. However, the amount of masking release was smaller in the older adults than in the younger adults. In Experiment 2, two kinds of primes were considered: same-sentence masker-priming and different-sentence masker-priming (differing from the masker sentence in content for each keyword). The results of Experiment 2 showed that both kinds of primes improved the identification of the targets for both age groups. However, the release from speech masking in both priming conditions was smaller in the older adults than in the younger adults, and the release from speech masking in both age groups was greater with same-sentence masker-priming than with different-sentence masker-priming. These results suggest that both the voice and content cues of a masker can be used to release target speech from masking in noisy listening conditions. Furthermore, there was an age-related decline in masker-priming-induced release from speech masking.
Affiliation(s)
- Tianquan Feng
- College of Teacher Education, Nanjing Normal University, Nanjing, China
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
- Qingrong Chen
- School of Psychology, Nanjing Normal University, Nanjing, China
- Zhongdang Xiao
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
15. Bologna WJ, Vaden KI, Ahlstrom JB, Dubno JR. Age effects on perceptual organization of speech: Contributions of glimpsing, phonemic restoration, and speech segregation. J Acoust Soc Am 2018;144:267. PMID: 30075693; PMCID: PMC6047943; DOI: 10.1121/1.5044397.
Abstract
In realistic listening environments, speech perception requires grouping together audible fragments of speech, filling in missing information, and segregating the glimpsed target from the background. The purpose of this study was to determine the extent to which age-related difficulties with these tasks can be explained by declines in glimpsing, phonemic restoration, and/or speech segregation. Younger and older adults with normal hearing listened to sentences interrupted with silence or envelope-modulated noise, presented either in quiet or with a competing talker. Older adults were poorer than younger adults at recognizing keywords based on short glimpses but benefited more when envelope-modulated noise filled silent intervals. Recognition declined with a competing talker but this effect did not interact with age. Results of cognitive tasks indicated that faster processing speed and better visual-linguistic closure were predictive of better speech understanding. Taken together, these results suggest that age-related declines in speech recognition may be partially explained by difficulty grouping short glimpses of speech into a coherent message.
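The interruption paradigm gates the sentences on and off periodically and either leaves the gaps silent or fills them with envelope-modulated noise. A schematic sketch of the gating step (the interruption rate and the fill signal are illustrative assumptions, not the study's parameters):

    import numpy as np

    def interrupt(signal, fs, rate_hz=2.5, fill=None):
        """Alternate equal-duration on/off segments at rate_hz; gaps are
        silent unless a 'fill' signal of the same length is supplied."""
        t = np.arange(len(signal)) / fs
        keep = (np.floor(2 * rate_hz * t) % 2 == 0)
        gap = np.zeros_like(signal) if fill is None else np.asarray(fill)
        return np.where(keep, signal, gap)

    fs = 16000
    speech = np.random.randn(fs)              # placeholder for a sentence
    noise = 0.5 * np.random.randn(fs)
    silent_gaps = interrupt(speech, fs)
    noise_filled = interrupt(speech, fs, fill=noise)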
Affiliation(s)
- William J Bologna
- Department of Otolaryngology-Head and Neck Surgery, Medical University of South Carolina, 135 Rutledge Avenue, MSC 550, Charleston, South Carolina 29425, USA
- Kenneth I Vaden
- Department of Otolaryngology-Head and Neck Surgery, Medical University of South Carolina, 135 Rutledge Avenue, MSC 550, Charleston, South Carolina 29425, USA
- Jayne B Ahlstrom
- Department of Otolaryngology-Head and Neck Surgery, Medical University of South Carolina, 135 Rutledge Avenue, MSC 550, Charleston, South Carolina 29425, USA
- Judy R Dubno
- Department of Otolaryngology-Head and Neck Surgery, Medical University of South Carolina, 135 Rutledge Avenue, MSC 550, Charleston, South Carolina 29425, USA
16. Tamati TN, Pisoni DB. Non-native listeners' recognition of high-variability speech using PRESTO. J Am Acad Audiol 2018;25:869-892. PMID: 25405842; DOI: 10.3766/jaaa.25.9.9.
Abstract
BACKGROUND Natural variability in speech is a significant challenge to robust successful spoken word recognition. In everyday listening environments, listeners must quickly adapt and adjust to multiple sources of variability in both the signal and listening environments. High-variability speech may be particularly difficult to understand for non-native listeners, who have less experience with the second language (L2) phonological system and less detailed knowledge of sociolinguistic variation of the L2. PURPOSE The purpose of this study was to investigate the effects of high-variability sentences on non-native speech recognition and to explore the underlying sources of individual differences in speech recognition abilities of non-native listeners. RESEARCH DESIGN Participants completed two sentence recognition tasks involving high-variability and low-variability sentences. They also completed a battery of behavioral tasks and self-report questionnaires designed to assess their indexical processing skills, vocabulary knowledge, and several core neurocognitive abilities. STUDY SAMPLE Native speakers of Mandarin (n = 25) living in the United States recruited from the Indiana University community participated in the current study. A native comparison group consisted of scores obtained from native speakers of English (n = 21) in the Indiana University community taken from an earlier study. DATA COLLECTION AND ANALYSIS Speech recognition in high-variability listening conditions was assessed with a sentence recognition task using sentences from PRESTO (Perceptually Robust English Sentence Test Open-Set) mixed in 6-talker multitalker babble. Speech recognition in low-variability listening conditions was assessed using sentences from HINT (Hearing In Noise Test) mixed in 6-talker multitalker babble. Indexical processing skills were measured using a talker discrimination task, a gender discrimination task, and a forced-choice regional dialect categorization task. Vocabulary knowledge was assessed with the WordFam word familiarity test, and executive functioning was assessed with the BRIEF-A (Behavioral Rating Inventory of Executive Function - Adult Version) self-report questionnaire. Scores from the non-native listeners on behavioral tasks and self-report questionnaires were compared with scores obtained from native listeners tested in a previous study and were examined for individual differences. RESULTS Non-native keyword recognition scores were significantly lower on PRESTO sentences than on HINT sentences. Non-native listeners' keyword recognition scores were also lower than native listeners' scores on both sentence recognition tasks. Differences in performance on the sentence recognition tasks between non-native and native listeners were larger on PRESTO than on HINT, although group differences varied by signal-to-noise ratio. The non-native and native groups also differed in the ability to categorize talkers by region of origin and in vocabulary knowledge. Individual non-native word recognition accuracy on PRESTO sentences in multitalker babble at more favorable signal-to-noise ratios was found to be related to several BRIEF-A subscales and composite scores. However, non-native performance on PRESTO was not related to regional dialect categorization, talker and gender discrimination, or vocabulary knowledge. CONCLUSIONS High-variability sentences in multitalker babble were particularly challenging for non-native listeners. 
Difficulty under high-variability testing conditions was related to lack of experience with the L2, especially L2 sociolinguistic information, compared with native listeners. Individual differences among the non-native listeners were related to weaknesses in core neurocognitive abilities affecting behavioral control in everyday life.
Affiliation(s)
- Terrin N Tamati
- Department of Linguistics, Indiana University, Bloomington, IN
- David B Pisoni
- Department of Linguistics, Indiana University, Bloomington, IN
- Department of Psychological and Brain Sciences, Indiana University, Bloomington, IN
- Department of Otolaryngology - Head and Neck Surgery, Indiana University School of Medicine, Indianapolis, IN
17. Best V, Ahlstrom JB, Mason CR, Roverud E, Perrachione TK, Kidd G, Dubno JR. Talker identification: Effects of masking, hearing loss, and age. J Acoust Soc Am 2018;143:1085. PMID: 29495693; PMCID: PMC5820061; DOI: 10.1121/1.5024333.
Abstract
The ability to identify who is talking is an important aspect of communication in social situations and, while empirical data are limited, it is possible that a disruption to this ability contributes to the difficulties experienced by listeners with hearing loss. In this study, talker identification was examined under both quiet and masked conditions. Subjects were grouped by hearing status (normal hearing/sensorineural hearing loss) and age (younger/older adults). Listeners first learned to identify the voices of four same-sex talkers in quiet, and then talker identification was assessed (1) in quiet, (2) in speech-shaped, steady-state noise, and (3) in the presence of a single, unfamiliar same-sex talker. Both younger and older adults with hearing loss, as well as older adults with normal hearing, generally performed more poorly than younger adults with normal hearing, although large individual differences were observed in all conditions. Regression analyses indicated that both age and hearing loss were predictors of performance in quiet, and there was some evidence for an additional contribution of hearing loss in the presence of masking. These findings suggest that both hearing loss and age may affect the ability to identify talkers in "cocktail party" situations.
Affiliation(s)
- Virginia Best
- Department of Speech, Language and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
- Jayne B Ahlstrom
- Department of Otolaryngology-Head and Neck Surgery, Medical University of South Carolina, Charleston, South Carolina 29425, USA
- Christine R Mason
- Department of Speech, Language and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
- Elin Roverud
- Department of Speech, Language and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
- Tyler K Perrachione
- Department of Speech, Language and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
- Gerald Kidd
- Department of Speech, Language and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
- Judy R Dubno
- Department of Otolaryngology-Head and Neck Surgery, Medical University of South Carolina, Charleston, South Carolina 29425, USA
18. Helfer KS, Merchant GR, Wasiuk PA. Age-Related Changes in Objective and Subjective Speech Perception in Complex Listening Environments. J Speech Lang Hear Res 2017;60:3009-3018. PMID: 29049601; PMCID: PMC5945070; DOI: 10.1044/2017_jslhr-h-17-0030.
Abstract
PURPOSE A frequent complaint by older adults is difficulty communicating in challenging acoustic environments. The purpose of this work was to review and summarize information about how speech perception in complex listening situations changes across the adult age range. METHOD This article provides a review of age-related changes in speech understanding in complex listening environments and summarizes results from several studies conducted in our laboratory. RESULTS Both degree of high-frequency hearing loss and cognitive test performance limit individuals' ability to understand speech in difficult listening situations as they age. The performance of middle-aged adults is similar to that of younger adults in the presence of noise maskers, but they experience substantially more difficulty when the masker is 1 or 2 competing speech messages. For the most part, middle-aged participants in studies conducted in our laboratory reported as many self-perceived hearing problems as did older adult participants. CONCLUSIONS Research supports the multifactorial nature of listening in real-world environments. Current audiologic assessment practices are often insufficient to identify the true speech understanding struggles that individuals experience in these situations. This points to the importance of giving weight to patients' self-reported difficulties. PRESENTATION VIDEO http://cred.pubs.asha.org/article.aspx?articleid=2601619.
Affiliation(s)
- Karen S. Helfer
- Department of Communication Disorders, University of Massachusetts Amherst
- Peter A. Wasiuk
- Department of Communication Disorders, University of Massachusetts Amherst
19. Gordon-Salant S, Yeni-Komshian GH, Fitzgibbons PJ, Willison HM, Freund MS. Recognition of asynchronous auditory-visual speech by younger and older listeners: A preliminary study. J Acoust Soc Am 2017;142:151. PMID: 28764460; PMCID: PMC5507703; DOI: 10.1121/1.4992026.
Abstract
This study examined the effects of age and hearing loss on recognition of speech presented when the auditory and visual speech information was misaligned in time (i.e., asynchronous). Prior research suggests that older listeners are less sensitive than younger listeners in detecting the presence of asynchronous speech for auditory-lead conditions, but recognition of speech in auditory-lead conditions has not yet been examined. Recognition performance was assessed for sentences and words presented in the auditory-visual modalities with varying degrees of auditory lead and lag. Detection of auditory-visual asynchrony for sentences was assessed to verify that listeners detected these asynchronies. The listeners were younger and older normal-hearing adults and older hearing-impaired adults. Older listeners (regardless of hearing status) exhibited a significant decline in performance in auditory-lead conditions relative to visual lead, unlike younger listeners whose recognition performance was relatively stable across asynchronies. Recognition performance was not correlated with asynchrony detection. However, one of the two cognitive measures assessed, processing speed, was identified in multiple regression analyses as contributing significantly to the variance in auditory-visual speech recognition scores. The findings indicate that, particularly in auditory-lead conditions, listener age has an impact on the ability to recognize asynchronous auditory-visual speech signals.
Affiliation(s)
- Sandra Gordon-Salant
- Department of Hearing and Speech Sciences, University of Maryland, College Park, Maryland 20742, USA
- Grace H Yeni-Komshian
- Department of Hearing and Speech Sciences, University of Maryland, College Park, Maryland 20742, USA
- Peter J Fitzgibbons
- Department of Hearing and Speech Sciences, University of Maryland, College Park, Maryland 20742, USA
- Hannah M Willison
- Department of Hearing and Speech Sciences, University of Maryland, College Park, Maryland 20742, USA
- Maya S Freund
- Department of Hearing and Speech Sciences, University of Maryland, College Park, Maryland 20742, USA
20. Cohen JI, Gordon-Salant S. The effect of visual distraction on auditory-visual speech perception by younger and older listeners. J Acoust Soc Am 2017;141:EL470. PMID: 28599569; PMCID: PMC5724720; DOI: 10.1121/1.4983399.
Abstract
Visual distractions are present in real-world listening environments, such as conversing in a crowded restaurant. This study examined the impact of visual distractors on younger and older adults' ability to understand auditory-visual (AV) speech in noise. AV speech stimuli were presented with one competing talker and with three different types of visual distractors. SNR50 thresholds for both listener groups were affected by visual distraction; the poorest performance for both groups occurred in the AV + Video condition, and differences across groups were noted for some conditions. These findings suggest that older adults may be more susceptible to irrelevant auditory and visual competition in a real-world environment.
Affiliation(s)
- Julie I Cohen, Sandra Gordon-Salant: Department of Hearing and Speech Sciences, University of Maryland, College Park, Maryland 20742, USA
|
22
|
Wu C, Zheng Y, Li J, Wu H, She S, Liu S, Ning Y, Li L. Brain substrates underlying auditory speech priming in healthy listeners and listeners with schizophrenia. Psychol Med 2017; 47:837-852. [PMID: 27894376 DOI: 10.1017/s0033291716002816] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 12/19/2022]
Abstract
BACKGROUND Under 'cocktail party' listening conditions, healthy listeners and listeners with schizophrenia can use temporally pre-presented auditory speech-priming (ASP) stimuli to improve target-speech recognition, even though listeners with schizophrenia are more vulnerable to informational speech masking. METHOD Using functional magnetic resonance imaging, this study searched for both the brain substrates underlying the unmasking effect of ASP in 16 healthy controls and 22 patients with schizophrenia and the brain substrates underlying schizophrenia-related speech-recognition deficits under speech-masking conditions. RESULTS In both controls and patients, introducing the ASP condition (against the auditory non-speech-priming condition) not only activated the left superior temporal gyrus (STG) and left posterior middle temporal gyrus (pMTG), but also enhanced functional connectivity of the left STG/pMTG with the left caudate. It also enhanced functional connectivity of the left STG/pMTG with the left pars triangularis of the inferior frontal gyrus (TriIFG) in controls and with the left Rolandic operculum in patients. The strength of functional connectivity between the left STG and left TriIFG was correlated with target-speech recognition under the speech-masking condition in both controls and patients, but was reduced in patients. CONCLUSIONS The left STG/pMTG and their ASP-related functional connectivity with both the left caudate and some frontal regions (the left TriIFG in healthy listeners and the left Rolandic operculum in listeners with schizophrenia) are involved in the unmasking effect of ASP, possibly through facilitating the following processes: masker-signal inhibition, target-speech encoding, and speech production. The schizophrenia-related reduction of functional connectivity between the left STG and left TriIFG increases the vulnerability of speech recognition to speech masking.
Affiliation(s)
- C Wu, L Li: School of Psychological and Cognitive Sciences, and Beijing Key Laboratory of Behavior and Mental Health, Key Laboratory on Machine Perception (Ministry of Education), Peking University, Beijing, People's Republic of China
- Y Zheng, J Li, H Wu, S She, S Liu, Y Ning: The Affiliated Brain Hospital of Guangzhou Medical University, Guangzhou, People's Republic of China
|
23
|
Wu C, Zheng Y, Li J, Zhang B, Li R, Wu H, She S, Liu S, Peng H, Ning Y, Li L. Activation and Functional Connectivity of the Left Inferior Temporal Gyrus during Visual Speech Priming in Healthy Listeners and Listeners with Schizophrenia. Front Neurosci 2017; 11:107. [PMID: 28360829 PMCID: PMC5350153 DOI: 10.3389/fnins.2017.00107] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/25/2016] [Accepted: 02/20/2017] [Indexed: 11/13/2022] Open
Abstract
Under a "cocktail-party" listening condition with multiple-people talking, compared to healthy people, people with schizophrenia benefit less from the use of visual-speech (lipreading) priming (VSP) cues to improve speech recognition. The neural mechanisms underlying the unmasking effect of VSP remain unknown. This study investigated the brain substrates underlying the unmasking effect of VSP in healthy listeners and the schizophrenia-induced changes in the brain substrates. Using functional magnetic resonance imaging, brain activation and functional connectivity for the contrasts of the VSP listening condition vs. the visual non-speech priming (VNSP) condition were examined in 16 healthy listeners (27.4 ± 8.6 years old, 9 females and 7 males) and 22 listeners with schizophrenia (29.0 ± 8.1 years old, 8 females and 14 males). The results showed that in healthy listeners, but not listeners with schizophrenia, the VSP-induced activation (against the VNSP condition) of the left posterior inferior temporal gyrus (pITG) was significantly correlated with the VSP-induced improvement in target-speech recognition against speech masking. Compared to healthy listeners, listeners with schizophrenia showed significantly lower VSP-induced activation of the left pITG and reduced functional connectivity of the left pITG with the bilateral Rolandic operculum, bilateral STG, and left insular. Thus, the left pITG and its functional connectivity may be the brain substrates related to the unmasking effect of VSP, assumedly through enhancing both the processing of target visual-speech signals and the inhibition of masking-speech signals. In people with schizophrenia, the reduced unmasking effect of VSP on speech recognition may be associated with a schizophrenia-related reduction of VSP-induced activation and functional connectivity of the left pITG.
Affiliation(s)
- Chao Wu: Beijing Key Laboratory of Behavior and Mental Health, Key Laboratory on Machine Perception, Ministry of Education, School of Psychological and Cognitive Sciences, Peking University, Beijing, China; School of Life Sciences, Peking University, Beijing, China; School of Psychology, Beijing Normal University, Beijing, China
- Yingjun Zheng, Juanhua Li, Bei Zhang, Ruikeng Li, Haibo Wu, Shenglin She, Sha Liu, Hongjun Peng, Yuping Ning: The Affiliated Brain Hospital of Guangzhou Medical University (Guangzhou Huiai Hospital), Guangzhou, China
- Liang Li: Beijing Key Laboratory of Behavior and Mental Health, Key Laboratory on Machine Perception, Ministry of Education, School of Psychological and Cognitive Sciences, Peking University, Beijing, China; The Affiliated Brain Hospital of Guangzhou Medical University (Guangzhou Huiai Hospital), Guangzhou, China; Beijing Institute for Brain Disorder, Capital Medical University, Beijing, China
|
24
|
Helfer KS, Freyman RL. Age equivalence in the benefit of repetition for speech understanding. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2016; 140:EL371. [PMID: 27908048 PMCID: PMC5392078 DOI: 10.1121/1.4966586] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/19/2016] [Revised: 10/06/2016] [Accepted: 10/17/2016] [Indexed: 05/29/2023]
Abstract
Although repetition is the most commonly used conversational repair strategy, little is known about its relative effectiveness among listeners spanning the adult age range. The purpose of this study was to identify differences in how younger, middle-aged, and older adults were able to use immediate repetition to improve speech recognition in the presence of different kinds of maskers. Results suggest that all groups received approximately the same amount of benefit from repetition. Repetition benefit was largest when the masker was fluctuating noise and smallest when it was competing speech.
Affiliation(s)
- Karen S Helfer, Richard L Freyman: Department of Communication Disorders, University of Massachusetts Amherst, 358 North Pleasant Street, Amherst, Massachusetts 01003, USA
|
25
|
The cocktail-party problem revisited: early processing and selection of multi-talker speech. Atten Percept Psychophys 2015; 77:1465-87. [PMID: 25828463 PMCID: PMC4469089 DOI: 10.3758/s13414-015-0882-9] [Citation(s) in RCA: 212] [Impact Index Per Article: 23.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2022]
Abstract
How do we recognize what one person is saying when others are speaking at the same time? This review summarizes widespread research in psychoacoustics, auditory scene analysis, and attention, all dealing with early processing and selection of speech, which has been stimulated by this question. Important effects occurring at the peripheral and brainstem levels are mutual masking of sounds and “unmasking” resulting from binaural listening. Psychoacoustic models have been developed that can predict these effects accurately, albeit using computational approaches rather than approximations of neural processing. Grouping—the segregation and streaming of sounds—represents a subsequent processing stage that interacts closely with attention. Sounds can be easily grouped—and subsequently selected—using primitive features such as spatial location and fundamental frequency. More complex processing is required when lexical, syntactic, or semantic information is used. Whereas it is now clear that such processing can take place preattentively, there also is evidence that the processing depth depends on the task-relevancy of the sound. This is consistent with the presence of a feedback loop in attentional control, triggering enhancement of to-be-selected input. Despite recent progress, there are still many unresolved issues: there is a need for integrative models that are neurophysiologically plausible, for research into grouping based on other than spatial or voice-related cues, for studies explicitly addressing endogenous and exogenous attention, for an explanation of the remarkable sluggishness of attention focused on dynamically changing sounds, and for research elucidating the distinction between binaural speech perception and sound localization.
|
26
|
Helfer KS, Jesse A. Lexical influences on competing speech perception in younger, middle-aged, and older adults. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2015; 138:363-76. [PMID: 26233036 PMCID: PMC4506307 DOI: 10.1121/1.4923155] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/14/2014] [Revised: 04/09/2015] [Accepted: 06/16/2015] [Indexed: 05/20/2023]
Abstract
The influence of lexical characteristics of words in to-be-attended and to-be-ignored speech streams was examined in a competing speech task. Older, middle-aged, and younger adults heard pairs of low-cloze probability sentences in which the frequency or neighborhood density of words was manipulated in either the target speech stream or the masking speech stream. All participants also completed a battery of cognitive measures. As expected, for all groups, target words that occur frequently or that are from sparse lexical neighborhoods were easier to recognize than words that are infrequent or from dense neighborhoods. Compared to other groups, these neighborhood density effects were largest for older adults; the frequency effect was largest for middle-aged adults. Lexical characteristics of words in the to-be-ignored speech stream also affected recognition of to-be-attended words, but only when overall performance was relatively good (that is, when younger participants listened to the speech streams at a more advantageous signal-to-noise ratio). For these listeners, to-be-ignored masker words from sparse neighborhoods interfered with recognition of target speech more than masker words from dense neighborhoods. Amount of hearing loss and cognitive abilities relating to attentional control modulated overall performance as well as the strength of lexical influences.
Affiliation(s)
- Karen S Helfer: Department of Communication Disorders, University of Massachusetts Amherst, 358 North Pleasant Street, Amherst, Massachusetts 01003, USA
- Alexandra Jesse: Department of Psychological and Brain Sciences, University of Massachusetts Amherst, 135 Hicks Way, Amherst, Massachusetts 01003, USA
|
27
|
Joseph S, Kumar S, Husain M, Griffiths TD. Auditory working memory for objects vs. features. Front Neurosci 2015; 9:13. [PMID: 25709563 PMCID: PMC4321563 DOI: 10.3389/fnins.2015.00013] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/01/2014] [Accepted: 01/12/2015] [Indexed: 11/13/2022] Open
Abstract
This work considers the bases of working memory (WM) for non-verbal sounds. Specifically, we address whether sounds are represented as integrated objects or individual features in auditory working memory and whether the representational format influences WM capacity. The experiments used sounds in which two different stimulus features, spectral passband and temporal amplitude modulation rate, could be combined to produce different auditory objects. Participants had to memorize sequences of auditory objects of variable length (1-4 items). They either maintained sequences of whole objects or sequences of individual features until recall for one of the items was tested. Memory recall was more accurate when the objects had to be maintained as a whole compared to the individual features alone. This is due to interference between features of the same object. Additionally, a feature-extraction cost was associated with maintenance and recall of individual features when they were extracted from bound object representations. An interpretation of our findings is that, at some stage of processing, sounds might be stored as objects in WM with features bound into coherent wholes. The results have implications for feature-integration theory in the context of WM in the auditory system.
Affiliation(s)
- Sabine Joseph: Institute of Cognitive Neuroscience, University College London, London, UK; Institute of Neurology, University College London, London, UK
- Sukhbinder Kumar, Timothy D Griffiths: Wellcome Trust Centre for Neuroimaging, University College London, London, UK; Institute of Neuroscience, Medical School, Newcastle University, Newcastle, UK
- Masud Husain: Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, UK; Department of Experimental Psychology, University of Oxford, Oxford, UK
|
28
|
Helfer KS, Freyman RL. Stimulus and listener factors affecting age-related changes in competing speech perception. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2014; 136:748-759. [PMID: 25096109 PMCID: PMC4187459 DOI: 10.1121/1.4887463] [Citation(s) in RCA: 56] [Impact Index Per Article: 5.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/12/2013] [Revised: 06/19/2014] [Accepted: 06/24/2014] [Indexed: 05/29/2023]
Abstract
The purpose of this study was to examine associations among hearing thresholds, cognitive ability, and speech understanding in adverse listening conditions within and between groups of younger, middle-aged, and older adults. Participants repeated back sentences played in the presence of several types of maskers (syntactically similar and syntactically different competing speech from one or two other talkers, and steady-state speech-shaped noise). They also completed tests of auditory short-term/working memory, processing speed, and inhibitory ability. Results showed that group differences in accuracy of word identification and in error patterns differed depending upon the number of masking voices; specifically, older and middle-aged individuals had particular difficulty, relative to younger subjects, in the presence of a single competing message. However, the effect of syntactic similarity was consistent across subject groups. Hearing loss, short-term memory, processing speed, and inhibitory ability were each related to some aspects of performance by the middle-aged and older participants. Notably, substantial age-related changes in speech recognition were apparent within the group of middle-aged listeners.
Affiliation(s)
- Karen S Helfer, Richard L Freyman: Department of Communication Disorders, University of Massachusetts Amherst, 358 North Pleasant Street, Amherst, Massachusetts 01003
|
29
|
Clarke J, Gaudrain E, Chatterjee M, Başkent D. T'ain't the way you say it, it's what you say--perceptual continuity of voice and top-down restoration of speech. Hear Res 2014; 315:80-7. [PMID: 25019356 DOI: 10.1016/j.heares.2014.07.002] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 12/13/2013] [Revised: 06/25/2014] [Accepted: 07/02/2014] [Indexed: 11/19/2022]
Abstract
Phonemic restoration, or top-down repair of speech, is the ability of the brain to perceptually reconstruct missing speech sounds, using remaining speech features, linguistic knowledge and context. This usually occurs in conditions where the interrupted speech is perceived as continuous. The main goal of this study was to investigate whether voice continuity was necessary for phonemic restoration. Restoration benefit was measured by the improvement in intelligibility of meaningful sentences interrupted with periodic silent gaps, after the gaps were filled with noise bursts. A discontinuity was introduced in the voice characteristics. The fundamental frequency, the vocal tract length, or both of the original vocal characteristics were changed using STRAIGHT to make a talker sound like a different talker from one speech segment to another. Voice discontinuity reduced the global intelligibility of interrupted sentences, confirming the importance of vocal cues for perceptually constructing a speech stream. However, phonemic restoration benefit persisted through all conditions despite the weaker voice continuity. This finding suggests that participants may have relied more on other cues, such as pitch contours or perhaps even linguistic context, when the vocal continuity was disrupted.
Affiliation(s)
- Jeanne Clarke, Etienne Gaudrain, Deniz Başkent: University of Groningen, University Medical Center Groningen, Department of Otorhinolaryngology/Head and Neck Surgery, Groningen, The Netherlands; University of Groningen, Graduate School of Medical Sciences, Research School of Behavioral and Cognitive Neurosciences, Groningen, The Netherlands
|
30
|
Tamati TN, Gilbert JL, Pisoni DB. Some factors underlying individual differences in speech recognition on PRESTO: a first report. J Am Acad Audiol 2014; 24:616-34. [PMID: 24047949 DOI: 10.3766/jaaa.24.7.10] [Citation(s) in RCA: 57] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]
Abstract
BACKGROUND Previous studies investigating speech recognition in adverse listening conditions have found extensive variability among individual listeners. However, little is currently known about the core underlying factors that influence speech recognition abilities. PURPOSE To investigate sensory, perceptual, and neurocognitive differences between good and poor listeners on the Perceptually Robust English Sentence Test Open-set (PRESTO), a new high-variability sentence recognition test under adverse listening conditions. RESEARCH DESIGN Participants who fell in the upper quartile (HiPRESTO listeners) or lower quartile (LoPRESTO listeners) on key word recognition on sentences from PRESTO in multitalker babble completed a battery of behavioral tasks and self-report questionnaires designed to investigate real-world hearing difficulties, indexical processing skills, and neurocognitive abilities. STUDY SAMPLE Young, normal-hearing adults (N = 40) from the Indiana University community participated in the current study. DATA COLLECTION AND ANALYSIS Participants' assessment of their own real-world hearing difficulties was measured with a self-report questionnaire on situational hearing and hearing health history. Indexical processing skills were assessed using a talker discrimination task, a gender discrimination task, and a forced-choice regional dialect categorization task. Neurocognitive abilities were measured with the Auditory Digit Span Forward (verbal short-term memory) and Digit Span Backward (verbal working memory) tests, the Stroop Color and Word Test (attention/inhibition), the WordFam word familiarity test (vocabulary size), the Behavioral Rating Inventory of Executive Function-Adult Version (BRIEF-A) self-report questionnaire on executive function, and two performance subtests of the Wechsler Abbreviated Scale of Intelligence (WASI) Performance Intelligence Quotient (IQ; nonverbal intelligence). Scores on self-report questionnaires and behavioral tasks were tallied and analyzed by listener group (HiPRESTO and LoPRESTO). RESULTS The extreme groups did not differ overall on self-reported hearing difficulties in real-world listening environments. However, an item-by-item analysis of questions revealed that LoPRESTO listeners reported significantly greater difficulty understanding speakers in a public place. HiPRESTO listeners were significantly more accurate than LoPRESTO listeners at gender discrimination and regional dialect categorization, but they did not differ on talker discrimination accuracy or response time, or gender discrimination response time. HiPRESTO listeners also had longer forward and backward digit spans, higher word familiarity ratings on the WordFam test, and lower (better) scores for three individual items on the BRIEF-A questionnaire related to cognitive load. The two groups did not differ on the Stroop Color and Word Test or either of the WASI performance IQ subtests. CONCLUSIONS HiPRESTO listeners and LoPRESTO listeners differed in indexical processing abilities, short-term and working memory capacity, vocabulary size, and some domains of executive functioning. These findings suggest that individual differences in the ability to encode and maintain highly detailed episodic information in speech may underlie the variability observed in speech recognition performance in adverse listening conditions using high-variability PRESTO sentences in multitalker babble.
|
31
|
MacPherson A, Akeroyd MA. Variations in the slope of the psychometric functions for speech intelligibility: a systematic survey. Trends Hear 2014; 18:2331216514537722. [PMID: 24906905 PMCID: PMC4227668 DOI: 10.1177/2331216514537722] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022] Open
Abstract
Although many studies have looked at the effects of different listening conditions on the intelligibility of speech, their analyses have often concentrated on changes to a single value on the psychometric function, namely, the threshold. Far less commonly has the slope of the psychometric function, that is, the rate at which intelligibility changes with level, been considered. The slope of the function is crucial because it is the slope, rather than the threshold, that determines the improvement in intelligibility caused by any given improvement in signal-to-noise ratio by, for instance, a hearing aid. The aim of the current study was to systematically survey and reanalyze the psychometric function data available in the literature in an attempt to quantify the range of slope changes across studies and to identify listening conditions that affect the slope of the psychometric function. The data for 885 individual psychometric functions, taken from 139 different studies, were fitted with a common logistic equation from which the slope was calculated. Large variations in slope across studies were found, with slope values ranging from as shallow as 1% per dB to as steep as 44% per dB (median = 6.6% per dB), suggesting that the perceptual benefit offered by an improvement in signal-to-noise ratio depends greatly on listening environment. The type and number of maskers used were found to be major factors on the value of the slope of the psychometric function while other minor effects of target predictability, target corpus, and target/masker similarity were also found.
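To make the slope values above concrete, the following minimal sketch (an illustration under assumed parameters, not the authors' fitting code) uses a logistic psychometric function, the general form fitted in the survey, to show how a slope quoted in percentage points per dB translates into the intelligibility gain produced by a fixed SNR improvement.

```python
import numpy as np

def logistic_psychometric(snr_db, threshold_db, slope_pct_per_db):
    """Proportion of keywords correct for a logistic psychometric function
    P(x) = 1 / (1 + exp(-k * (x - x0))), where x0 (threshold_db) is the SNR
    at 50% correct and the slope at threshold is k / 4, here specified in
    percentage points per dB."""
    k = 4.0 * (slope_pct_per_db / 100.0)
    return 1.0 / (1.0 + np.exp(-k * (snr_db - threshold_db)))

# Intelligibility gained from a +2 dB SNR improvement starting at threshold,
# for the shallowest, median, and steepest slopes reported in the survey.
for slope in (1.0, 6.6, 44.0):
    gain = logistic_psychometric(2.0, 0.0, slope) - 0.5
    print(f"slope {slope:4.1f} %/dB -> about +{100 * gain:.0f} points for +2 dB")
```

With the median slope of 6.6% per dB, this 2-dB improvement yields roughly 13 percentage points, whereas the shallowest function yields only about 2 points; this is the practical sense in which the slope, rather than the threshold, determines the benefit of a given SNR improvement.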
Affiliation(s)
- Alexandra MacPherson: MRC/CSO Institute of Hearing Research-Scottish Section, Glasgow Royal Infirmary, Glasgow, UK; School of Psychological Sciences & Health, University of Strathclyde, Glasgow, UK
- Michael A Akeroyd: MRC/CSO Institute of Hearing Research-Scottish Section, Glasgow Royal Infirmary, Glasgow, UK
|
32
|
Nielsen JB, Dau T, Neher T. A Danish open-set speech corpus for competing-speech studies. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2014; 135:407-420. [PMID: 24437781 DOI: 10.1121/1.4835935] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/03/2023]
Abstract
Studies investigating speech-on-speech masking effects commonly use closed-set speech materials such as the coordinate response measure [Bolia et al. (2000). J. Acoust. Soc. Am. 107, 1065-1066]. However, these studies typically result in very low (i.e., negative) speech recognition thresholds (SRTs) when the competing speech signals are spatially separated. To achieve higher SRTs that correspond more closely to natural communication situations, an open-set, low-context, multi-talker speech corpus was developed. Three sets of 268 unique Danish sentences were created, and each set was recorded with one of three professional female talkers. The intelligibility of each sentence in the presence of speech-shaped noise was measured. For each talker, 200 approximately equally intelligible sentences were then selected and systematically distributed into 10 test lists. Test list homogeneity was assessed in a setup with a frontal target sentence and two concurrent masker sentences at ±50° azimuth. For a group of 16 normal-hearing listeners and a group of 15 elderly (linearly aided) hearing-impaired listeners, overall SRTs of, respectively, +1.3 dB and +6.3 dB target-to-masker ratio were obtained. The new corpus was found to be very sensitive to inter-individual differences and produced consistent results across test lists. The corpus is publicly available.
Affiliation(s)
- Jens Bo Nielsen, Torsten Dau: Centre for Applied Hearing Research, Department of Electrical Engineering, Technical University of Denmark, DK-2800 Lyngby, Denmark
- Tobias Neher: Eriksholm Research Centre, Oticon A/S, Rørtangvej 20, DK-3070 Snekkersten, Denmark
|
33
|
Freyman RL, Griffin AM, Macmillan NA. Priming of lowpass-filtered speech affects response bias, not sensitivity, in a bandwidth discrimination task. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2013; 134:1183-92. [PMID: 23927117 PMCID: PMC3745481 DOI: 10.1121/1.4807824] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/26/2013] [Revised: 05/01/2013] [Accepted: 05/09/2013] [Indexed: 05/26/2023]
Abstract
Priming is demonstrated when prior information about the content of a distorted, filtered, or masked auditory message improves its clarity. The current experiment attempted to quantify aspects of priming by determining its effects on performance and bias in a lowpass-filter-cutoff frequency discrimination task. Nonsense sentences recorded by a female talker were sharply lowpass filtered at a nominal cutoff frequency (F) of 0.5 or 0.75 kHz or at a higher cutoff frequency (F + ΔF). The listeners' task was to determine which interval of a two-interval-forced-choice trial contained the nonsense sentence filtered with F + ΔF. On priming trials, the interval 1 sentence was displayed on a computer screen prior to the auditory portion of the trial. The prime markedly affected bias, increasing the number of correct and incorrect interval 1 responses but did not affect overall discrimination performance substantially. These findings were supported through a second experiment that required listeners to make confidence judgments. The paradigm has the potential to help quantify the limits of speech perception when uncertainty about the auditory message is removed.
Affiliation(s)
- Richard L Freyman: Department of Communication Disorders, University of Massachusetts, 358 North Pleasant Street, Amherst, Massachusetts 01003, USA
|
34
|
Abstract
OBJECTIVES The purpose of this study was to test the hypothesis that a carrier phrase can improve word recognition performance for both children and adults by providing an auditory grouping cue. It was hypothesized that the carrier phrase would benefit listeners under conditions in which they have difficulty in perceptually separating the target word from the competing background. To test this hypothesis, word recognition was examined for maskers that were believed to vary in their ability to create perceptual masking. In addition to determining the conditions under which a carrier-phrase benefit is obtained, age-related differences in both susceptibility to masking and carrier-phrase benefit were examined. DESIGN Two experiments were conducted to characterize developmental effects in the ability to benefit from a carrier phrase (i.e., "say the word") before the target word. Using an open-set task, word recognition performance was measured for three listener age groups: 5- to 7-year-old children, 8- to 10-year-old children, and adults (18-30 years). For all experiments, target words were presented in each of two carrier-phrase conditions: (1) carrier-present and (2) carrier-absent. Across experiments, word recognition performance was assessed in the presence of multi-talker babble (Experiment 1), two-talker speech (Experiment 2), or speech-shaped noise (Experiment 2). RESULTS Children's word recognition performance was generally poorer than that of adults for all three masker conditions. Differences between the two age groups of children were seen for both speech-shaped noise and multi-talker babble, with 5- to 7-year-olds performing more poorly than 8- to 10-year-olds. However, 5- to 7-year-olds and 8- to 10-year-olds performed similarly for the two-talker masker. Despite developmental effects in susceptibility to masking, both groups of children and adults showed a carrier-phrase benefit in multi-talker babble (Experiment 1) and in the two-talker masker (Experiment 2). The magnitude of the carrier-phrase benefit was similar for a given masker type across age groups, but the carrier-phrase benefit was greater in the presence of the two-talker masker than in multi-talker babble. Specifically, the children's average carrier-phrase benefit was 7.1% for multi-talker and 16.8% for the two-talker masker condition. No carrier-phrase benefit was observed for any age group in the presence of speech-shaped noise. CONCLUSIONS Effects of auditory masking on word recognition performance were greater for children than for adults. The time course of development for susceptibility to masking seems to be more prolonged for a two-talker speech masker than for multi-talker babble or speech-shaped noise. Unique to the present study, this work suggests that a carrier phrase can provide an effective auditory grouping cue for both children and adults under conditions expected to produce substantial perceptual masking.
Affiliation(s)
- Angela Yarnell Bonino: Department of Allied Health Sciences, CB 7190, The University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
|
35
|
Gilbert JL, Tamati TN, Pisoni DB. Development, reliability, and validity of PRESTO: a new high-variability sentence recognition test. J Am Acad Audiol 2013; 24:26-36. [PMID: 23231814 PMCID: PMC3683852 DOI: 10.3766/jaaa.24.1.4] [Citation(s) in RCA: 90] [Impact Index Per Article: 8.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]
Abstract
BACKGROUND There is a pressing need for new clinically feasible speech recognition tests that are theoretically motivated, sensitive to individual differences, and access the core perceptual and neurocognitive processes used in speech perception. PRESTO (Perceptually Robust English Sentence Test Open-set) is a new high-variability sentence test designed to reflect current theories of exemplar-based learning, attention, and perception, including lexical organization and automatic encoding of indexical attributes. Using sentences selected from the TIMIT (Texas Instruments/Massachusetts Institute of Technology) speech corpus, PRESTO was developed to include talker and dialect variability. The test consists of lists balanced for talker gender, keywords, frequency, and familiarity. PURPOSE To investigate the performance, reliability, and validity of PRESTO. RESEARCH DESIGN In Phase I, PRESTO sentences were presented in multitalker babble at four signal-to-noise ratios (SNRs) to obtain a distribution of performance. In Phase II, participants returned and were tested on new PRESTO sentences and on HINT (Hearing In Noise Test) sentences presented in multitalker babble. STUDY SAMPLE Young, normal-hearing adults (N = 121) were recruited from the Indiana University community for Phase I. Participants who scored within the upper and lower quartiles of performance in Phase I were asked to return for Phase II (N = 40). DATA COLLECTION AND ANALYSIS In both Phase I and Phase II, participants listened to sentences presented diotically through headphones while seated in enclosed carrels at the Speech Research Laboratory at Indiana University. They were instructed to type in the sentence that they heard using keyboards interfaced to a computer. Scoring for keywords was completed offline following data collection. Phase I data were analyzed by determining the distribution of performance on PRESTO at each SNR and at the average performance across all SNRs. PRESTO reliability was analyzed by a correlational analysis of participant performance at test (Phase I) and retest (Phase II). PRESTO validity was analyzed by a correlational analysis of participant performance on PRESTO and HINT sentences tested in Phase II, and by an analysis of variance of within-subject factors of sentence test and SNR, and a between-subjects factor of group, based on level of Phase I performance. RESULTS A wide range of performance on PRESTO was observed; averaged across all SNRs, keyword accuracy ranged from 40.26 to 76.18% correct. PRESTO accuracy at retest (Phase II) was highly correlated with Phase I accuracy (r = 0.92, p < 0.001). PRESTO scores were also correlated with scores on HINT sentences (r = 0.52, p < 0.001). Phase II results showed an interaction between sentence test type and SNR [F(3, 114) = 121.36, p < 0.001], with better performance on HINT sentences at more favorable SNRs and better performance on PRESTO sentences at poorer SNRs. CONCLUSIONS PRESTO demonstrated excellent test/retest reliability. Although a moderate correlation was observed between PRESTO and HINT sentences, a different pattern of results occurred with the two types of sentences depending on the level of the competition, suggesting the use of different processing strategies. Findings from this study demonstrate the importance of high-variability materials for assessing and understanding individual differences in speech perception.
Affiliation(s)
- Jaimie L Gilbert: Department of Psychological and Brain Sciences, Indiana University, Bloomington, IN, USA
|
36
|
Bernstein JGW, Summers V, Iyer N, Brungart DS. Set-size procedures for controlling variations in speech-reception performance with a fluctuating masker. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2012; 132:2676-89. [PMID: 23039460 PMCID: PMC3477195 DOI: 10.1121/1.4746019] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/25/2012] [Revised: 07/18/2012] [Accepted: 07/23/2012] [Indexed: 05/25/2023]
Abstract
Adaptive signal-to-noise ratio (SNR) tracking is often used to measure speech reception in noise. Because SNR varies with performance using this method, data interpretation can be confounded when measuring an SNR-dependent effect such as the fluctuating-masker benefit (FMB) (the intelligibility improvement afforded by brief dips in the masker level). One way to overcome this confound, and allow FMB comparisons across listener groups with different stationary-noise performance, is to adjust the response set size to equalize performance across groups at a fixed SNR. However, this technique is only valid under the assumption that changes in set size have the same effect on percentage-correct performance for different masker types. This assumption was tested by measuring nonsense-syllable identification for normal-hearing listeners as a function of SNR, set size and masker (stationary noise, 4- and 32-Hz modulated noise and an interfering talker). Set-size adjustment had the same impact on performance scores for all maskers, confirming the independence of FMB (at matched SNRs) and set size. These results, along with those of a second experiment evaluating an adaptive set-size algorithm to adjust performance levels, establish set size as an efficient and effective tool to adjust baseline performance when comparing effects of masker fluctuations between listener groups.
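As a rough illustration of the set-size idea described here, the sketch below adapts the number of response alternatives trial by trial at a fixed SNR. This is a hypothetical 1-up/1-down scheme with made-up parameters and a toy listener model, not the adaptive algorithm evaluated in the study; it only shows how set size, rather than SNR, can serve as the adjustable variable.

```python
import random

def adapt_set_size(run_trial, n_trials=60, start_size=8, min_size=2, max_size=64):
    """Adjust the response set size at a fixed SNR with a simple 1-up/1-down
    rule: a correct response doubles the set size (harder), an incorrect
    response halves it (easier), pushing percent correct toward roughly 50%."""
    size, history = start_size, []
    for _ in range(n_trials):
        correct = run_trial(size)
        history.append((size, correct))
        size = min(size * 2, max_size) if correct else max(size // 2, min_size)
    return history

def toy_listener(set_size):
    # Hypothetical listener whose accuracy drops as the set size grows.
    return random.random() < 1.0 / (1.0 + set_size / 16.0)

history = adapt_set_size(toy_listener)
print("last few set sizes visited:", [size for size, _ in history[-6:]])
```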
Affiliation(s)
- Joshua G W Bernstein: Audiology and Speech Center, Walter Reed National Military Medical Center, Bethesda, Maryland 20889, USA
|
39
|
Wu M, Li H, Gao Y, Lei M, Teng X, Wu X, Li L. Adding irrelevant information to the content prime reduces the prime-induced unmasking effect on speech recognition. Hear Res 2012; 283:136-43. [DOI: 10.1016/j.heares.2011.11.001] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 03/22/2011] [Revised: 10/30/2011] [Accepted: 11/01/2011] [Indexed: 11/17/2022]
|
40
|
Masking of speech in people with first-episode schizophrenia and people with chronic schizophrenia. Schizophr Res 2012; 134:33-41. [PMID: 22019075 DOI: 10.1016/j.schres.2011.09.019] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 04/24/2011] [Revised: 09/17/2011] [Accepted: 09/18/2011] [Indexed: 11/20/2022]
Abstract
In "cocktail-party" environments, although listeners feel it difficult to recognize attended speech due to both energetic masking and informational masking, they can use various perceptual/cognitive cues, such as content and voice primes, to facilitate their attention to target speech. In patients with schizophrenia, both speech-perception deficits and increased vulnerability to masking stimuli generally occur. This study investigated whether speech recognition in first-episode patients (FEPs) and chronic patients (CPs) of schizophrenia is more vulnerable to noise masking and/or speech masking than that in demographics-matched-healthy controls, and whether patients with schizophrenia can use primes to unmask speech. In a trial under the priming condition, before the target sentence containing three keywords was co-presented with a noise or speech masker, the prime (early part of the sentence including the first two keywords) was recited in quiet with the target-speaker's voice. The results show that in patients, target-speech recognition was more impaired under speech-masking conditions than noise-masking conditions, and the impairment in CPs (n=22) was larger than that in FEPs (n=12). Although working memory for holding prime-content information in patients, especially CPs, was more vulnerable to masking, especially speech masking, than that in healthy controls, patients were still able to use the prime to unmask the last keyword. Thus, in "cocktail-party" environments, speech recognition in people with schizophrenia is more vulnerable to masking, particularly informational masking, and the speech-recognition impairment augments as the illness progresses. However, people with schizophrenia can use the content/voice prime to reduce energetic masking and informational masking of target speech.
|
41
|
Maddox RK, Shinn-Cunningham BG. Influence of task-relevant and task-irrelevant feature continuity on selective auditory attention. J Assoc Res Otolaryngol 2011; 13:119-29. [PMID: 22124889 DOI: 10.1007/s10162-011-0299-7] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/13/2011] [Accepted: 10/19/2011] [Indexed: 11/30/2022] Open
Abstract
Past studies have explored the relative strengths of auditory features in a selective attention task by pitting features against one another and asking listeners to report the words perceived in a given sentence. While these studies show that the continuity of competing features affects streaming, they did not address whether the influence of specific features is modulated by volitionally directed attention. Here, we explored whether the continuity of a task-irrelevant feature affects the ability to selectively report one of two competing speech streams when attention is specifically directed to a different feature. Sequences of simultaneous pairs of spoken digits were presented in which exactly one digit of each pair matched a primer phrase in pitch and exactly one digit of each pair matched the primer location. Within a trial, location and pitch were randomly paired; they either were consistent with each other from digit to digit or were switched (e.g., the sequence from the primer's location changed pitch across digits). In otherwise identical blocks, listeners were instructed to report digits matching the primer either in location or in pitch. Listeners were told to ignore the irrelevant feature, if possible, in order to perform well. Listener responses depended on task instructions, proving that top-down attention alters how a subject performs the task. Performance improved when the separation of the target and masker in the task-relevant feature increased. Importantly, the values of the task-irrelevant feature also influenced performance in some cases. Specifically, when instructed to attend location, listeners performed worse as the separation between target and masker pitch increased, especially when the spatial separation between digits was small. These results indicate that task-relevant and task-irrelevant features are perceptually bound together: continuity of task-irrelevant features influences selective attention in an automatic, obligatory manner, consistent with the idea that auditory attention operates on objects.
Affiliation(s)
- Ross K Maddox: Hearing Research Center, Biomedical Engineering, Boston, MA 02215, USA
|
42
|
Du Y, Kong L, Wang Q, Wu X, Li L. Auditory frequency-following response: a neurophysiological measure for studying the "cocktail-party problem". Neurosci Biobehav Rev 2011; 35:2046-57. [PMID: 21645541 DOI: 10.1016/j.neubiorev.2011.05.008] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/01/2010] [Revised: 05/12/2011] [Accepted: 05/19/2011] [Indexed: 11/19/2022]
Abstract
How do we recognize what one person is saying when others are speaking at the same time? The "cocktail-party problem" proposed by Cherry (1953) has puzzled the scientific community for half a century. This puzzle will not be solved without appropriate neurophysiological investigation that satisfies the following four essential requirements: (1) certain critical speech characteristics related to speech intelligibility are recorded; (2) neural responses to different speech sources are differentiated; (3) neural correlates of bottom-up binaural unmasking of responses to target speech are measurable; (4) neural correlates of attentional top-down unmasking of target speech are measurable. Before speech signals reach the cerebral cortex, some critical acoustic features are represented in subcortical structures by the frequency-following responses (FFRs), which are sustained evoked potentials based on precisely phase-locked responses of neuron populations to low-to-middle-frequency periodic acoustic stimuli. This review summarizes previous studies on FFRs associated with each of the four requirements and suggests that FFRs are useful for studying the "cocktail-party problem".
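For readers unfamiliar with the measure, the sketch below simulates the basic FFR logic under simplified assumptions (synthetic single-trial responses, a hypothetical 120-Hz stimulus fundamental, and arbitrary amplitudes; this is not the recording or analysis pipeline of any cited study): averaging epochs that are time-locked to a periodic stimulus preserves the phase-locked energy at the stimulus periodicity, while non-phase-locked background activity averages out.

```python
import numpy as np

fs = 8000                      # sampling rate in Hz
f0 = 120.0                     # assumed stimulus fundamental frequency in Hz
n_epochs = 500                 # number of stimulus repetitions
t = np.arange(0, 0.2, 1 / fs)  # 200-ms epochs

rng = np.random.default_rng(0)
# Synthetic single-trial responses: a weak phase-locked component at f0
# buried in much larger non-phase-locked background activity.
epochs = 0.05 * np.sin(2 * np.pi * f0 * t) + rng.normal(0.0, 1.0, (n_epochs, t.size))

ffr = epochs.mean(axis=0)                      # time-locked averaging
spectrum = np.abs(np.fft.rfft(ffr)) / t.size   # amplitude spectrum of the average
freqs = np.fft.rfftfreq(t.size, 1 / fs)

f0_bin = int(np.argmin(np.abs(freqs - f0)))
noise_bins = np.r_[f0_bin - 12:f0_bin - 2, f0_bin + 3:f0_bin + 13]
snr_db = 20 * np.log10(spectrum[f0_bin] / spectrum[noise_bins].mean())
print(f"spectral peak at {freqs[f0_bin]:.0f} Hz, about {snr_db:.0f} dB above neighboring bins")
```

In an actual FFR study the epochs would, of course, be recorded EEG responses to the speech or tone stimuli rather than simulated data.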
Affiliation(s)
- Yi Du: Department of Psychology, Speech and Hearing Research Center, Key Laboratory on Machine Perception (Ministry of Education), Peking University, Beijing, China
|
43
|
Helfer KS, Chevalier J, Freyman RL. Aging, spatial cues, and single- versus dual-task performance in competing speech perception. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2010; 128:3625-3633. [PMID: 21218894 PMCID: PMC3037770 DOI: 10.1121/1.3502462] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/12/2010] [Revised: 09/22/2010] [Accepted: 09/22/2010] [Indexed: 05/26/2023]
Abstract
Older individuals often report difficulty coping in situations with multiple conversations in which they at times need to "tune out" the background speech and at other times seek to monitor competing messages. The present study was designed to simulate this type of interaction by examining the cost of requiring listeners to perform a secondary task in conjunction with understanding a target talker in the presence of competing speech. The ability of younger and older adults to understand a target utterance was measured with and without requiring the listener to also determine how many masking voices were presented time-reversed. Also of interest was how spatial separation affected the ability to perform these two tasks. Older adults demonstrated slightly reduced overall speech recognition and obtained less spatial release from masking, as compared to younger listeners. For both younger and older listeners, spatial separation increased the costs associated with performing both tasks together. The meaningfulness of the masker had a greater detrimental effect on speech understanding for older participants than for younger participants. However, the results suggest that the problems experienced by older adults in complex listening situations are not necessarily due to a deficit in the ability to switch and/or divide attention among talkers.
Affiliation(s)
- Karen S Helfer: Department of Communication Disorders, University of Massachusetts, 358 North Pleasant Street, Amherst, Massachusetts 01003, USA
|
44
|
Calandruccio L, Dhar S, Bradlow AR. Speech-on-speech masking with variable access to the linguistic content of the masker speech. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2010; 128:860-9. [PMID: 20707455 PMCID: PMC2933260 DOI: 10.1121/1.3458857] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/21/2009] [Revised: 03/11/2010] [Accepted: 06/09/2010] [Indexed: 05/21/2023]
Abstract
It has been reported that listeners can benefit from a release in masking when the masker speech is spoken in a language that differs from the target speech compared to when the target and masker speech are spoken in the same language [Freyman, R. L. et al. (1999). J. Acoust. Soc. Am. 106, 3578-3588; Van Engen, K., and Bradlow, A. (2007), J. Acoust. Soc. Am. 121, 519-526]. It is unclear whether this release from masking is due to the lack of linguistic interference from the masker speech, to acoustic and phonetic differences between the target and masker languages, or to a combination of these factors. In the following series of experiments, listeners' sentence recognition was evaluated using speech and noise maskers that varied in the amount of linguistic content, including native-English, Mandarin-accented English, and Mandarin speech. Results from three experiments indicated that the majority of differences observed between the linguistic maskers could be explained by spectral differences between the masker conditions. However, when the recognition task increased in difficulty, i.e., at a more challenging signal-to-noise ratio, a greater decrease in performance was observed for the maskers with more linguistically relevant information than could be explained by spectral differences alone.
Affiliation(s)
- Lauren Calandruccio: Department of Linguistics and Communication Disorders, Queens College of the City University of New York, Flushing, New York 11367, USA
|