1. Huo M, Sun Y, Fogerty D, Tang Y. Release from same-talker speech-in-speech masking: Effects of masker intelligibility and other contributing factors. J Acoust Soc Am 2024;156:2960-2973. PMID: 39485097. DOI: 10.1121/10.0034235.
Abstract
Human speech perception declines in the presence of masking speech, particularly when the masker is intelligible and acoustically similar to the target. A prior investigation demonstrated a substantial reduction in masking when the intelligibility of competing speech was reduced by corrupting voiced segments with noise [Huo, Sun, Fogerty, and Tang (2023), "Quantifying informational masking due to masker intelligibility in same-talker speech-in-speech perception," in Interspeech 2023, pp. 1783-1787]. As this processing also reduced the prominence of voiced segments, it was unclear whether the unmasking was due to reduced linguistic content, acoustic similarity, or both. The current study compared the masking of original competing speech (high intelligibility) to competing speech with time reversal of voiced segments (VS-reversed, low intelligibility) at various target-to-masker ratios. Modeling results demonstrated similar energetic masking between the two maskers. However, intelligibility of the target speech was considerably better with the VS-reversed masker compared to the original masker, likely due to the reduced linguistic content. Further corrupting the masker's voiced segments resulted in additional release from masking. Acoustic analyses showed that the portion of target voiced segments overlapping with masker voiced segments and the similarity between target and masker overlapped voiced segments impacted listeners' speech recognition. Evidence also suggested modulation masking in the spectro-temporal domain interferes with listeners' ability to glimpse the target.
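The two masker manipulations described above lend themselves to a brief illustration. Below is a minimal sketch, not the authors' actual processing pipeline, of time-reversing a masker's voiced segments and mixing target and masker at a chosen target-to-masker ratio (TMR); the voiced-segment boundaries, signals, and helper names are hypothetical placeholders.

```python
import numpy as np

def rms(x):
    """Root-mean-square level of a signal."""
    return np.sqrt(np.mean(x ** 2))

def reverse_voiced_segments(masker, voiced_spans):
    """Time-reverse each voiced span (start, end) in samples; unvoiced audio is untouched."""
    out = masker.copy()
    for start, end in voiced_spans:
        out[start:end] = out[start:end][::-1].copy()
    return out

def mix_at_tmr(target, masker, tmr_db):
    """Scale the masker so the target-to-masker ratio equals tmr_db, then sum."""
    gain = rms(target) / (rms(masker) * 10 ** (tmr_db / 20))
    return target + gain * masker

# Example with synthetic signals; real use would load speech and detect voicing.
fs = 16000
target = np.random.randn(fs)             # 1 s placeholder "target speech"
masker = np.random.randn(fs)             # 1 s placeholder "competing speech"
voiced = [(2000, 6000), (9000, 14000)]   # hypothetical voiced-segment boundaries
vs_reversed = reverse_voiced_segments(masker, voiced)
mixture = mix_at_tmr(target, vs_reversed, tmr_db=-3.0)
```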
Affiliation(s)
- Mingyue Huo
- Department of Linguistics, University of Illinois Urbana-Champaign, Urbana, Illinois 61801, USA
- Yinglun Sun
- Department of Linguistics, University of Illinois Urbana-Champaign, Urbana, Illinois 61801, USA
- Daniel Fogerty
- Department of Speech & Hearing Science, University of Illinois Urbana-Champaign, Urbana, Illinois 61801, USA
- Yan Tang
- Department of Linguistics, University of Illinois Urbana-Champaign, Urbana, Illinois 61801, USA
- Beckman Institute for Advanced Science and Technology, University of Illinois Urbana-Champaign, Urbana, Illinois 61801, USA
2. Ruiz Callejo D, Wouters J, Boets B. Speech-in-noise perception in autistic adolescents with and without early language delay. Autism Res 2023;16:1719-1727. PMID: 37318057. DOI: 10.1002/aur.2966.
Abstract
Speech-in-noise perception seems aberrant in individuals with autism spectrum disorder (ASD). Potential aggravating factors are the level of linguistic skills and impairments in auditory temporal processing. Here, we investigated autistic adolescents with and without language delay as compared to non-autistic peers, and we assessed speech perception in steady-state noise, temporally modulated noise, and concurrent speech. We found that autistic adolescents with intact language capabilities, but not those with language delay, performed worse than non-autistic peers on words-in-stationary-noise perception. For the perception of sentences in stationary noise, we did not observe significant group differences, although autistic adolescents with language delay tended to perform worse than their non-autistic peers. We also found evidence for a robust deficit in speech-in-concurrent-speech processing in ASD independent of language ability, as well as an association between early language delay in ASD and inadequate temporal speech processing. We propose that reduced voice stream segregation and inadequate social attentional orienting in ASD result in disproportional informational masking of the speech signal. These findings indicate a speech-in-speech processing deficit in autistic adolescents, with broad implications for the quality of social communication.
Affiliation(s)
- Diego Ruiz Callejo
- Center for Developmental Psychiatry, Department of Neurosciences, KU Leuven, Leuven, Belgium
- Jan Wouters
- Research Group ExpORL, Department of Neurosciences, KU Leuven, Leuven, Belgium
- Bart Boets
- Center for Developmental Psychiatry, Department of Neurosciences, KU Leuven, Leuven, Belgium
- Leuven Autism Research (LAuRes), KU Leuven, Leuven, Belgium
- Leuven Brain Institute (LBI), KU Leuven, Leuven, Belgium
3. Yasmin S, Irsik VC, Johnsrude IS, Herrmann B. The effects of speech masking on neural tracking of acoustic and semantic features of natural speech. Neuropsychologia 2023;186:108584. PMID: 37169066. DOI: 10.1016/j.neuropsychologia.2023.108584.
Abstract
Listening environments contain background sounds that mask speech and lead to communication challenges. Sensitivity to slow acoustic fluctuations in speech can help segregate speech from background noise. Semantic context can also facilitate speech perception in noise, for example, by enabling prediction of upcoming words. However, not much is known about how different degrees of background masking affect the neural processing of acoustic and semantic features during naturalistic speech listening. In the current electroencephalography (EEG) study, participants listened to engaging, spoken stories masked at different levels of multi-talker babble to investigate how neural activity in response to acoustic and semantic features changes with acoustic challenges, and how such effects relate to speech intelligibility. The pattern of neural response amplitudes associated with both acoustic and semantic speech features across masking levels was U-shaped, such that amplitudes were largest for moderate masking levels. This U-shape may be due to increased attentional focus when speech comprehension is challenging, but manageable. The latency of the neural responses increased linearly with increasing background masking, and neural latency change associated with acoustic processing most closely mirrored the changes in speech intelligibility. Finally, tracking responses related to semantic dissimilarity remained robust until severe speech masking (-3 dB SNR). The current study reveals that neural responses to acoustic features are highly sensitive to background masking and decreasing speech intelligibility, whereas neural responses to semantic features are relatively robust, suggesting that individuals track the meaning of the story well even in moderate background sound.
Affiliation(s)
- Sonia Yasmin
- Department of Psychology & the Brain and Mind Institute, The University of Western Ontario, London, ON, N6A 3K7, Canada
- Vanessa C Irsik
- Department of Psychology & the Brain and Mind Institute, The University of Western Ontario, London, ON, N6A 3K7, Canada
- Ingrid S Johnsrude
- Department of Psychology & the Brain and Mind Institute, The University of Western Ontario, London, ON, N6A 3K7, Canada; School of Communication and Speech Disorders, The University of Western Ontario, London, ON, N6A 5B7, Canada
- Björn Herrmann
- Rotman Research Institute, Baycrest, Toronto, ON, M6A 2E1, Canada; Department of Psychology, University of Toronto, Toronto, ON, M5S 1A1, Canada
4. Mepham A, Bi Y, Mattys SL. The time-course of linguistic interference during native and non-native speech-in-speech listening. J Acoust Soc Am 2022;152:954. PMID: 36050191. DOI: 10.1121/10.0013417.
Abstract
Recognizing speech in a noisy background is harder when the background speech is time-forward than when it is time-reversed (a masker direction effect), and harder when the masker is in a known rather than an unknown language (linguistic interference). We examined the masker direction effect when the masker was a known vs unknown language and calculated performance over 50 trials to assess differential masker adaptation. In experiment 1, native English listeners transcribing English sentences showed a larger masker direction effect with English than Mandarin maskers. In experiment 2, Mandarin non-native speakers of English transcribing Mandarin sentences showed a mirror pattern. Both experiments thus support the target-masker linguistic similarity hypothesis, where interference is maximal when target and masker languages are the same. In experiment 3, Mandarin non-native speakers of English transcribing English sentences showed comparable results for English and Mandarin maskers. Non-native listening is therefore consistent with the known-language interference hypothesis, where interference is maximal when the masker language is known to the listener, whether or not it matches the target language. A trial-by-trial analysis showed that the masker direction effect increased over time during native listening but not during non-native listening. The results indicate different target-to-masker streaming strategies during native and non-native speech-in-speech listening.
Affiliation(s)
- Alex Mepham
- Department of Psychology, University of York, Heslington, United Kingdom
- Yifei Bi
- College of Foreign Languages, University of Shanghai for Science and Technology, Shanghai, China
- Sven L Mattys
- Department of Psychology, University of York, Heslington, United Kingdom
5. Brown VA, Dillman-Hasso NH, Li Z, Ray L, Mamantov E, Van Engen KJ, Strand JF. Revisiting the target-masker linguistic similarity hypothesis. Atten Percept Psychophys 2022;84:1772-1787. PMID: 35474415. PMCID: PMC10701341. DOI: 10.3758/s13414-022-02486-3.
Abstract
The linguistic similarity hypothesis states that it is more difficult to segregate target and masker speech when they are linguistically similar. For example, recognition of English target speech should be more impaired by the presence of Dutch masking speech than Mandarin masking speech because Dutch and English are more linguistically similar than Mandarin and English. Across four experiments, English target speech was consistently recognized more poorly when presented in English masking speech than in silence, speech-shaped noise, or an unintelligible masker (i.e., Dutch or Mandarin). However, we found no evidence for graded masking effects: Dutch did not impair performance more than Mandarin in any experiment, despite 650 participants being tested. This general pattern was consistent when using both a cross-modal paradigm (in which target speech was lipread and maskers were presented aurally; Experiments 1a and 1b) and an auditory-only paradigm (in which both the targets and maskers were presented aurally; Experiments 2a and 2b). These findings suggest that the linguistic similarity hypothesis should be refined to reflect the existing evidence: There is greater release from masking when the masker language differs from the target speech than when it is the same as the target speech. However, evidence that unintelligible maskers impair speech identification to a greater extent when they are more linguistically similar to the target language remains elusive.
Affiliation(s)
- Violet A Brown
- Department of Psychological and Brain Sciences, Washington University in St. Louis, One Brookings Drive, St. Louis, MO, 63130, USA
- Naseem H Dillman-Hasso
- Department of Psychology, Carleton College, One North College St, Northfield, MN, 55057, USA
- ZhaoBin Li
- Department of Psychology, Carleton College, One North College St, Northfield, MN, 55057, USA
- Lucia Ray
- Department of Psychology, Carleton College, One North College St, Northfield, MN, 55057, USA
- Ellen Mamantov
- Department of Psychology, Carleton College, One North College St, Northfield, MN, 55057, USA
- Kristin J Van Engen
- Department of Psychological and Brain Sciences, Washington University in St. Louis, One Brookings Drive, St. Louis, MO, 63130, USA
- Julia F Strand
- Department of Psychology, Carleton College, One North College St, Northfield, MN, 55057, USA
6. Wilms V, Drijvers L, Brouwer S. The effects of iconic gestures and babble language on word intelligibility in sentence context. J Speech Lang Hear Res 2022;65:1822-1838. PMID: 35439423. DOI: 10.1044/2022_jslhr-21-00387.
Abstract
Purpose: This study investigated to what extent iconic co-speech gestures help word intelligibility in sentence context in two different linguistic maskers (native vs. foreign). It was hypothesized that sentence recognition improves with the presence of iconic co-speech gestures and with foreign compared to native babble. Method: Thirty-two native Dutch participants performed a Dutch word recognition task in context in which they were presented with videos in which an actress uttered short Dutch sentences (e.g., Ze begint te openen, "She starts to open"). Participants were presented with a total of six audiovisual conditions: no background noise (i.e., clear condition) without gesture, no background noise with gesture, French babble without gesture, French babble with gesture, Dutch babble without gesture, and Dutch babble with gesture; and they were asked to type out what was said by the Dutch actress. The accurate identification of the action verbs at the end of the target sentences was measured. Results: Performance on the task was better in the gesture than in the nongesture conditions (i.e., a gesture enhancement effect). In addition, performance was better in French babble than in Dutch babble. Conclusions: Listeners benefit from iconic co-speech gestures during communication and from foreign background speech compared to native background speech. These insights may be valuable to anyone who engages in multimodal communication, and especially to those who often communicate in public places where competing speech is present in the background.
Affiliation(s)
- Veerle Wilms
- Centre for Language Studies, Radboud University, Nijmegen, the Netherlands
- Linda Drijvers
- Max Planck Institute for Psycholinguistics, Nijmegen, the Netherlands
- Susanne Brouwer
- Centre for Language Studies, Radboud University, Nijmegen, the Netherlands
7. Roberts B, Summers RJ, Bailey PJ. Effects of stimulus naturalness and contralateral interferers on lexical bias in consonant identification. J Acoust Soc Am 2022;151:3369. PMID: 35649936. DOI: 10.1121/10.0011395.
Abstract
Lexical bias is the tendency to perceive an ambiguous speech sound as a phoneme completing a word; more ambiguity typically causes greater reliance on lexical knowledge. A speech sound ambiguous between /g/ and /k/ is more likely to be perceived as /g/ before /ɪft/ and as /k/ before /ɪs/. The magnitude of this difference (the Ganong shift) increases when high cognitive load limits available processing resources. The effects of stimulus naturalness and informational masking on Ganong shifts and reaction times were explored. Tokens between /gɪ/ and /kɪ/ were generated using morphing software, from which two continua were created ("giss"-"kiss" and "gift"-"kift"). In experiment 1, Ganong shifts were considerably larger for sine- than noise-vocoded versions of these continua, presumably because the spectral sparsity and unnatural timbre of the former increased cognitive load. In experiment 2, noise-vocoded stimuli were presented alone or accompanied by contralateral interferers with a constant within-band amplitude envelope, or with within-band envelope variation that was the same or different across bands. The latter, with its implied spectro-temporal variation, was predicted to cause the greatest cognitive load. Reaction-time measures matched this prediction; Ganong shifts showed some evidence of greater lexical bias for frequency-varying interferers, but were influenced by context effects and diminished over time.
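The Ganong shift described here amounts to a difference in /g/ responses between the two lexical contexts. Below is a minimal sketch of one way it might be quantified from identification data; the response proportions are hypothetical, not values from the study.

```python
import numpy as np

# Hypothetical proportion of /g/ responses at each of 11 continuum steps
# (step 1 = clear /g/, step 11 = clear /k/), for the two lexical contexts.
p_g_gift_kift = np.array([.98, .97, .95, .90, .82, .70, .55, .38, .20, .08, .03])
p_g_giss_kiss = np.array([.97, .95, .90, .80, .65, .48, .32, .18, .08, .03, .01])

# One common summary: average the /g/-response difference across the continuum.
ganong_shift = np.mean(p_g_gift_kift - p_g_giss_kiss)
print(f"Ganong shift (mean proportion difference): {ganong_shift:.3f}")
```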
Affiliation(s)
- Brian Roberts
- School of Psychology, Aston University, Birmingham, B4 7ET, United Kingdom
- Robert J Summers
- School of Psychology, Aston University, Birmingham, B4 7ET, United Kingdom
- Peter J Bailey
- Department of Psychology, University of York, Heslington, York, YO10 5DD, United Kingdom
8. Roberts B, Summers RJ, Bailey PJ. Mandatory dichotic integration of second-formant information: Contralateral sine bleats have predictable effects on consonant place judgments. J Acoust Soc Am 2021;150:3693. PMID: 34852626. DOI: 10.1121/10.0007132.
Abstract
Speech-on-speech informational masking arises because the interferer disrupts target processing (e.g., capacity limitations) or corrupts it (e.g., intrusions into the target percept); the latter should produce predictable errors. Listeners identified the consonant in monaural buzz-excited three-formant analogues of approximant-vowel syllables, forming a place of articulation series (/w/-/l/-/j/). There were two 11-member series; the vowel was either high-front or low-back. Series members shared formant-amplitude contours, fundamental frequency, and F1+F3 frequency contours; they were distinguished solely by the F2 frequency contour before the steady portion. Targets were always presented in the left ear. For each series, F2 frequency and amplitude contours were also used to generate interferers with altered source properties: sine-wave analogues of F2 (sine bleats) matched to their buzz-excited counterparts. Accompanying each series member with a fixed mismatched sine bleat in the contralateral ear produced systematic and predictable effects on category judgments; these effects were usually largest for bleats involving the fastest rate or greatest extent of frequency change. Judgments of isolated sine bleats using the three place labels were often unsystematic or arbitrary. These results indicate that informational masking by interferers involved corruption of target processing as a result of mandatory dichotic integration of F2 information, despite the grouping cues disfavoring this integration.
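As an illustration of the kind of stimulus described, here is a minimal sketch of synthesizing a sine-wave analogue that follows given frequency and amplitude contours; the contour values below are hypothetical placeholders, not the contours used in the study.

```python
import numpy as np

def sine_from_contours(freq_hz, amp, fs):
    """Synthesize a sine-wave analogue ("sine bleat") that follows a
    time-varying frequency contour (Hz) and a per-sample amplitude contour."""
    phase = 2 * np.pi * np.cumsum(freq_hz) / fs   # integrate frequency to get phase
    return amp * np.sin(phase)

# Hypothetical 300-ms F2 contour: a rising glide from 1100 Hz to 1800 Hz.
fs = 16000
n = int(0.3 * fs)
f2_contour = np.linspace(1100.0, 1800.0, n)
amp_contour = np.hanning(n)                       # placeholder amplitude contour
bleat = sine_from_contours(f2_contour, amp_contour, fs)
```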
Affiliation(s)
- Brian Roberts
- School of Psychology, Aston University, Birmingham B4 7ET, United Kingdom
- Robert J Summers
- School of Psychology, Aston University, Birmingham B4 7ET, United Kingdom
- Peter J Bailey
- Department of Psychology, University of York, Heslington, York YO10 5DD, United Kingdom
9. Roberts B, Summers RJ. Informational masking of speech depends on masker spectro-temporal variation but not on its coherence. J Acoust Soc Am 2020;148:2416. PMID: 33138537. DOI: 10.1121/10.0002359.
Abstract
The impact of an extraneous formant on intelligibility is affected by the extent (depth) of variation in its formant-frequency contour. Two experiments explored whether this impact also depends on masker spectro-temporal coherence, using a method ensuring that interference occurred only through informational masking. Targets were monaural three-formant analogues (F1+F2+F3) of natural sentences presented alone or accompanied by a contralateral competitor for F2 (F2C) that listeners must reject to optimize recognition. The standard F2C was created using the inverted F2 frequency contour and constant amplitude. Variants were derived by dividing F2C into abutting segments (100-200 ms, 10-ms rise/fall). Segments were presented either in the correct order (coherent) or in random order (incoherent), introducing abrupt discontinuities into the F2C frequency contour. F2C depth was also manipulated (0%, 50%, or 100%) prior to segmentation, and the frequency contour of each segment either remained time-varying or was set to a constant value at the geometric mean frequency of that segment. The extent to which F2C lowered keyword scores depended on segment type (frequency-varying vs constant) and depth, but not segment order. This outcome indicates that the impact on intelligibility depends critically on the overall amount of frequency variation in the competitor, but not on its spectro-temporal coherence.
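A minimal sketch of the segment-and-reorder manipulation described above, cutting a competitor into abutting segments with 10-ms raised-cosine rise/fall ramps and optionally shuffling their order; details such as the fixed segment duration are placeholders, not the study's exact parameters.

```python
import numpy as np

def segment_and_shuffle(signal, fs, seg_dur=0.15, ramp_dur=0.01, shuffle=True, seed=0):
    """Cut a competitor into abutting segments, apply onset/offset ramps,
    and optionally re-present the segments in random order (rough sketch)."""
    seg_len = int(seg_dur * fs)
    ramp_len = int(ramp_dur * fs)
    ramp = 0.5 * (1 - np.cos(np.linspace(0, np.pi, ramp_len)))  # raised-cosine rise
    # Any final partial segment is dropped in this simplified version.
    segments = [signal[i:i + seg_len].copy()
                for i in range(0, len(signal) - seg_len + 1, seg_len)]
    for seg in segments:
        seg[:ramp_len] *= ramp           # 10-ms rise
        seg[-ramp_len:] *= ramp[::-1]    # 10-ms fall
    if shuffle:
        rng = np.random.default_rng(seed)
        rng.shuffle(segments)            # incoherent (random-order) variant
    return np.concatenate(segments)
```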
Affiliation(s)
- Brian Roberts
- School of Psychology, Aston University, Birmingham B4 7ET, United Kingdom
- Robert J Summers
- School of Psychology, Aston University, Birmingham B4 7ET, United Kingdom
10. Faucette SP, Stuart A. Effect of presentation level and age on release from masking: Behavioral measures. J Acoust Soc Am 2020;148:1510. PMID: 33003838. DOI: 10.1121/10.0001964.
Abstract
The effect of presentation level and age on release from masking (RFM) was examined. Two speech-in-noise paradigms [i.e., fixed speech with varying signal-to-noise ratios (SNRs) and fixed noise with varying speech levels] were employed with competing continuous and interrupted noises. Young and older normal-hearing adults participated (N = 36). Word recognition was assessed at three presentation levels (i.e., 20, 30, and 40 dB sensation level) at SNRs of -10, 0, and 10 dB. Reception thresholds for sentences (RTSs) were determined at three presentation levels (i.e., 55, 65, and 75 dB sound pressure level). RTS SNRs were determined in both noises. RFM was computed by subtracting word recognition scores in continuous noise from those in interrupted noise, and by subtracting RTS SNRs in interrupted noise from those in continuous noise. Significant effects of presentation level, group, and SNR were found for word recognition performance. RFM increased with increasing sensation level, was greater in younger adults, and was greatest at -10 dB SNR. For RTS SNRs, significant effects of presentation level and group were found. The findings support the notion that RFM is a level-dependent auditory temporal resolution phenomenon and that older listeners display a deficit relative to younger listeners.
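The RFM arithmetic described in the abstract, as a minimal sketch with hypothetical values for illustration only.

```python
def rfm_word_recognition(score_interrupted, score_continuous):
    """Release from masking for word recognition (% correct):
    interrupted-noise score minus continuous-noise score."""
    return score_interrupted - score_continuous

def rfm_rts(rts_continuous_db, rts_interrupted_db):
    """Release from masking for reception thresholds for sentences (dB SNR):
    continuous-noise threshold minus interrupted-noise threshold."""
    return rts_continuous_db - rts_interrupted_db

# Hypothetical values, not data from the study.
print(rfm_word_recognition(score_interrupted=78.0, score_continuous=52.0))  # 26.0 percentage points
print(rfm_rts(rts_continuous_db=-2.5, rts_interrupted_db=-9.0))             # 6.5 dB
```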
Affiliation(s)
- Sarah P Faucette
- Department of Otolaryngology and Communicative Sciences, University of Mississippi Medical Center, 2500 North State Street, Jackson, Mississippi 39216-4505, USA
- Andrew Stuart
- Department of Communication Sciences and Disorders, East Carolina University, Greenville, North Carolina 27858-4353, USA