1. Thaler L, Castillo-Serrano JG, Kish D, Norman LJ. Effects of type of emission and masking sound, and their spatial correspondence, on blind and sighted people's ability to echolocate. Neuropsychologia 2024; 196:108822. PMID: 38342179. DOI: 10.1016/j.neuropsychologia.2024.108822.
Abstract
Ambient sound can mask acoustic signals. The current study addressed how echolocation in people is affected by masking sound, and the role played by type of sound and spatial (i.e. binaural) similarity. We also investigated the role played by blindness and long-term experience with echolocation, by testing echolocation experts, as well as blind and sighted people new to echolocation. Results were obtained in two echolocation tasks where participants listened to binaural recordings of echolocation and masking sounds, and either localized echoes in azimuth or discriminated echo audibility. Echolocation and masking sounds could be either clicks or broadband noise. An adaptive staircase method was used to adjust signal-to-noise ratios (SNRs) based on participants' responses. When target and masker had the same binaural cues (i.e. both were monaural sounds), people performed better (i.e. had lower SNRs) when target and masker used different types of sound (e.g. clicks in noise-masker or noise in clicks-masker), as compared to when target and masker used the same type of sound (e.g. clicks in click-, or noise in noise-masker). A very different pattern of results was observed when masker and target differed in their binaural cues, in which case people always performed better when clicks were the masker, regardless of type of emission used. Further, direct comparison between conditions with and without binaural difference revealed binaural release from masking only when clicks were used as emissions and masker, but not otherwise (i.e. when noise was used as masker or emission). This suggests that echolocation with clicks or noise may differ in their sensitivity to binaural cues. We observed the same pattern of results for echolocation experts, and blind and sighted people new to echolocation, suggesting a limited role played by long-term experience or blindness.
In addition to generating novel predictions for future work, the findings also inform instruction in echolocation for people who are blind or sighted.
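The adaptive staircase procedure mentioned in the abstract can be illustrated with a generic sketch. The 1-up/2-down rule, 2-dB step size, and reversal-based stopping criterion below are illustrative assumptions, not the paper's actual parameters:

```python
import random

def staircase_snr(respond, start_db=0.0, step_db=2.0, n_reversals=8):
    """Simple 1-up/2-down adaptive staircase for an SNR threshold.

    `respond(snr_db)` returns True for a correct response at that SNR.
    Two consecutive correct responses make the task harder (lower SNR);
    one incorrect response makes it easier (higher SNR). The threshold
    estimate is the mean SNR at the reversal points.
    """
    snr = start_db
    direction = 0            # -1 = moving down, +1 = moving up
    correct_streak = 0
    reversals = []
    while len(reversals) < n_reversals:
        if respond(snr):
            correct_streak += 1
            if correct_streak == 2:          # two correct -> step down
                correct_streak = 0
                if direction == +1:          # direction flipped: reversal
                    reversals.append(snr)
                direction = -1
                snr -= step_db
        else:                                # one wrong -> step up
            correct_streak = 0
            if direction == -1:
                reversals.append(snr)
            direction = +1
            snr += step_db
    return sum(reversals) / len(reversals)

# Simulated listener: always correct above -6 dB SNR, otherwise guessing.
random.seed(1)

def listener(snr_db):
    return snr_db > -6 or random.random() < 0.5

threshold = staircase_snr(listener)
```

A 1-up/2-down rule of this kind converges on the SNR yielding roughly 70.7% correct (Levitt's transformed up-down method); the simulated threshold accordingly settles near the listener's -6 dB knee point.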
Affiliation(s)
- L Thaler
- Department of Psychology, Durham University, South Road, Durham, DH1 5AY, UK
- D Kish
- World Access for the Blind, 1007 Marino Drive, Placentia, CA, 92870, USA
- L J Norman
- Department of Psychology, Durham University, South Road, Durham, DH1 5AY, UK
2. Byrne AJ, Conroy C, Kidd G. Individual differences in speech-on-speech masking are correlated with cognitive and visual task performance. J Acoust Soc Am 2023; 154:2137-2153. PMID: 37800988. PMCID: PMC10631817. DOI: 10.1121/10.0021301.
Abstract
Individual differences in spatial tuning for masked target speech identification were determined using maskers that varied in type and proximity to the target source. The maskers were chosen to produce three strengths of informational masking (IM): high [same-gender, speech-on-speech (SOS) masking], intermediate (the same masker speech time-reversed), and low (speech-shaped, speech-envelope-modulated noise). As is typical for this task, individual differences increased as IM increased, while overall performance decreased. To determine the extent to which auditory performance might generalize to another sensory modality, a comparison visual task was also implemented. Visual search time was measured for identifying a cued object among "clouds" of distractors that were varied symmetrically in proximity to the target. The visual maskers also were chosen to produce three strengths of an analog of IM based on feature similarities between the target and maskers. Significant correlations were found for overall auditory and visual task performance, and both of these measures were correlated with an index of general cognitive reasoning. Overall, the findings provide qualified support for the proposition that the ability of an individual to solve IM-dominated tasks depends on cognitive mechanisms that operate in common across sensory modalities.
Affiliation(s)
- Andrew J Byrne
- Department of Speech, Language and Hearing Sciences and Hearing Research Center, Boston University, Boston, Massachusetts 02215, USA
- Christopher Conroy
- Department of Biological and Vision Sciences, State University of New York College of Optometry, New York, New York 10036, USA
- Gerald Kidd
- Department of Speech, Language and Hearing Sciences and Hearing Research Center, Boston University, Boston, Massachusetts 02215, USA
3. Mischler G, Keshishian M, Bickel S, Mehta AD, Mesgarani N. Deep neural networks effectively model neural adaptation to changing background noise and suggest nonlinear noise filtering methods in auditory cortex. Neuroimage 2023; 266:119819. PMID: 36529203. PMCID: PMC10510744. DOI: 10.1016/j.neuroimage.2022.119819.
Abstract
The human auditory system displays a robust capacity to adapt to sudden changes in background noise, allowing for continuous speech comprehension despite changes in background environments. However, despite comprehensive studies characterizing this ability, the computations that underlie this process are not well understood. The first step towards understanding a complex system is to propose a suitable model, but the classical and easily interpreted model for the auditory system, the spectro-temporal receptive field (STRF), cannot match the nonlinear neural dynamics involved in noise adaptation. Here, we utilize a deep neural network (DNN) to model neural adaptation to noise, illustrating its effectiveness at reproducing the complex dynamics at the levels of both individual electrodes and the cortical population. By closely inspecting the model's STRF-like computations over time, we find that the model alters both the gain and shape of its receptive field when adapting to a sudden noise change. We show that the DNN model's gain changes allow it to perform adaptive gain control, while the spectro-temporal change creates noise filtering by altering the inhibitory region of the model's receptive field. Further, we find that models of electrodes in nonprimary auditory cortex also exhibit noise filtering changes in their excitatory regions, suggesting differences in noise filtering mechanisms along the cortical hierarchy. These findings demonstrate the capability of deep neural networks to model complex neural adaptation and offer new hypotheses about the computations the auditory cortex performs to enable noise-robust speech perception in real-world, dynamic environments.
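For context, the classical STRF contrasted with the DNN above treats the neural response as a fixed linear filtering of the stimulus spectrogram: at each time point, the predicted response is the sum of recent spectrogram energy weighted by the receptive field. A minimal sketch, where the dimensions and random inputs are illustrative, not from the paper:

```python
import numpy as np

def strf_predict(spectrogram, strf):
    """Predict a response as an STRF's linear filtering of a spectrogram.

    spectrogram: (n_freq, n_time) array of stimulus energy.
    strf: (n_freq, n_lags) array of weights; column 0 is the shortest lag.
    Response at time t sums spectrogram energy over frequency and past
    lags, weighted by the STRF.
    """
    n_freq, n_time = spectrogram.shape
    _, n_lags = strf.shape
    response = np.zeros(n_time)
    for t in range(n_time):
        lags = min(n_lags, t + 1)
        # Last `lags` time bins, reordered so the most recent comes first.
        window = spectrogram[:, t - lags + 1:t + 1][:, ::-1]
        response[t] = np.sum(window * strf[:, :lags])
    return response

rng = np.random.default_rng(0)
spec = rng.random((16, 100))                  # toy spectrogram
strf = rng.standard_normal((16, 5)) * 0.1     # toy receptive field
r = strf_predict(spec, strf)
```

Because the mapping is linear with fixed weights, the model cannot change its gain or shape in response to a noise change, which is exactly the limitation the DNN approach addresses.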
Affiliation(s)
- Gavin Mischler
- Mortimer B. Zuckerman Mind Brain Behavior, Columbia University, New York, United States; Department of Electrical Engineering, Columbia University, New York, United States
- Menoua Keshishian
- Mortimer B. Zuckerman Mind Brain Behavior, Columbia University, New York, United States; Department of Electrical Engineering, Columbia University, New York, United States
- Stephan Bickel
- Hofstra Northwell School of Medicine, Manhasset, New York, United States
- Ashesh D Mehta
- Hofstra Northwell School of Medicine, Manhasset, New York, United States
- Nima Mesgarani
- Mortimer B. Zuckerman Mind Brain Behavior, Columbia University, New York, United States; Department of Electrical Engineering, Columbia University, New York, United States
4. Veyrié A, Noreña A, Sarrazin JC, Pezard L. Investigating the influence of masker and target properties on the dynamics of perceptual awareness under informational masking. PLoS One 2023; 18:e0282885. PMID: 36928693. PMCID: PMC10019711. DOI: 10.1371/journal.pone.0282885.
Abstract
Informational masking has been investigated using the detection of an auditory target embedded in a random multi-tone masker. The build-up of the target percept is influenced by the masker and target properties. Most studies dealing with discrimination performance neglect the dynamics of perceptual awareness. This study aims to investigate the dynamics of perceptual awareness using multi-level survival models in an informational masking paradigm by manipulating masker uncertainty, masker-target similarity and target repetition rate. Consistent with previous studies, it shows that high target repetition rates, low masker-target similarity and low masker uncertainty facilitate target detection. In the context of evidence accumulation models, these results can be interpreted by changes in the accumulation parameters. The probabilistic description of perceptual awareness provides a benchmark for the choice of target and masker parameters in order to examine the underlying cognitive and neural dynamics of perceptual awareness.
Affiliation(s)
- Alexandre Veyrié
- Aix-Marseille Université, LNC, CNRS UMR 7291, Marseille, France
- ONERA, The French Aerospace Lab, Salon de Provence, France
- Arnaud Noreña
- Aix-Marseille Université, LNC, CNRS UMR 7291, Marseille, France
- Laurent Pezard
- Aix-Marseille Université, LNC, CNRS UMR 7291, Marseille, France
5. Kane SG, Dean KM, Buss E. Speech-in-Speech Recognition and Spatially Selective Attention in Children and Adults. J Speech Lang Hear Res 2021; 64:3617-3626. PMID: 34403280. PMCID: PMC8642097. DOI: 10.1044/2021_jslhr-21-00108.
Abstract
Purpose: Knowing target location can improve adults' speech-in-speech recognition in complex auditory environments, but it is unknown whether young children listen selectively in space. This study evaluated masked word recognition with and without a pretrial cue to location to characterize the influence of listener age and masker type on the benefit of spatial cues.
Method: Participants were children (5-13 years of age) and adults with normal hearing. Testing occurred in a 180° arc of 11 loudspeakers. Targets were spondees produced by a female talker and presented from a randomly selected loudspeaker; that location was either known, based on a pretrial cue, or unknown. Maskers were two sequences comprising spondees or speech-shaped noise bursts, each presented from a random loudspeaker. Speech maskers were produced by one male talker or by three talkers, two male and one female.
Results: Children and adults benefited from the pretrial cue to target location with the three-voice masker, and the magnitude of benefit increased with increasing child age. There was no benefit of location cues in the one-voice or noise-burst maskers. Incorrect responses in the three-voice masker tended to correspond to masker words produced by the female talker, and in the location-known condition, those masker intrusions were more likely near the cued loudspeaker for both age groups.
Conclusions: Increasing benefit of the location cue with increasing child age in the three-voice masker suggests maturation of spatially selective attention, but error patterns do not support this idea. Differences in performance in the location-unknown condition could play a role in the differential benefit of the location cue.
Affiliation(s)
- Stacey G. Kane
- Department of Otolaryngology/Head and Neck Surgery, University of North Carolina at Chapel Hill
- Kelly M. Dean
- Department of Otolaryngology/Head and Neck Surgery, University of North Carolina at Chapel Hill
- Emily Buss
- Department of Otolaryngology/Head and Neck Surgery, University of North Carolina at Chapel Hill
6. Liu JS, Liu YW, Yu YF, Galvin JJ, Fu QJ, Tao DD. Segregation of competing speech in adults and children with normal hearing and in children with cochlear implants. J Acoust Soc Am 2021; 150:339. PMID: 34340485. DOI: 10.1121/10.0005597.
Abstract
Children with normal hearing (CNH) have greater difficulty segregating competing speech than do adults with normal hearing (ANH). Children with cochlear implants (CCI) have greater difficulty segregating competing speech than do CNH. In the present study, speech reception thresholds (SRTs) in competing speech were measured in Chinese Mandarin-speaking ANH, CNH, and CCIs. Target sentences were produced by a male Mandarin-speaking talker. Maskers were time-forward or -reversed sentences produced by a native Mandarin-speaking male (different from the target) or female or a non-native English-speaking male. The SRTs were lowest (best) for the ANH group, followed by the CNH and CCI groups. The masking release (MR) was comparable between the ANH and CNH group, but much poorer in the CCI group. The temporal properties differed between the native and non-native maskers and between forward and reversed speech. The temporal properties of the maskers were significantly associated with the SRTs for the CCI and CNH groups but not for the ANH group. Whereas the temporal properties of the maskers were significantly associated with the MR for all three groups, the association was stronger for the CCI and CNH groups than for the ANH group.
Affiliation(s)
- Ji-Sheng Liu
- Department of Ear, Nose, and Throat, The First Affiliated Hospital of Soochow University, Suzhou 215006, China
- Yang-Wenyi Liu
- Department of Otology and Skull Base Surgery, Eye Ear Nose and Throat Hospital, Fudan University, Shanghai 200031, China
- Ya-Feng Yu
- Department of Ear, Nose, and Throat, The First Affiliated Hospital of Soochow University, Suzhou 215006, China
- John J Galvin
- House Ear Institute, Los Angeles, California 90057, USA
- Qian-Jie Fu
- Department of Head and Neck Surgery, David Geffen School of Medicine, University of California Los Angeles (UCLA), Los Angeles, California 90095, USA
- Duo-Duo Tao
- Department of Ear, Nose, and Throat, The First Affiliated Hospital of Soochow University, Suzhou 215006, China
7. Conroy C, Kidd G. Informational masking in the modulation domain. J Acoust Soc Am 2021; 149:3665. PMID: 34241144. PMCID: PMC8163511. DOI: 10.1121/10.0005038.
Abstract
Uncertainty regarding the frequency spectrum of a masker can have an adverse effect on the ability to focus selective attention on a target frequency channel, yielding informational masking (IM). This study sought to determine if uncertainty regarding the modulation spectrum of a masker can have an analogous adverse effect on the ability to focus selective attention on a target modulation channel, yielding IM in the modulation domain, or "modulation IM." A single-interval, two-alternative forced-choice (yes-no) procedure was used. The task was to detect 32-Hz target sinusoidal amplitude modulation (SAM) imposed on a broadband-noise carrier in the presence of masker SAM imposed on the same carrier. Six maskers, spanning the range from 8 to 128 Hz in half-octave steps, were tested, excluding those that fell within a two-octave protected zone surrounding the target. Psychometric functions (d'-vs-target modulation depth) were measured for each masker under two conditions: a fixed (low-uncertainty/low-IM) condition, in which the masker was the same on all trials within a block, and a random (high-uncertainty/high-IM) condition, in which it varied randomly from presentation-to-presentation. Thresholds and slopes extracted from the psychometric functions differed markedly between the conditions. These results are consistent with the idea that IM occurs in the modulation domain.
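The sinusoidal amplitude modulation (SAM) stimuli described above follow a standard construction: a noise carrier multiplied by (1 + m·sin(2πf_m·t)), where m is the modulation depth and f_m the modulation rate. A minimal sketch, with the sampling rate and duration chosen arbitrarily rather than taken from the study:

```python
import numpy as np

def sam_noise(fm_hz, depth, dur_s=0.5, fs=16000, seed=0):
    """Broadband noise carrier with sinusoidal amplitude modulation (SAM).

    depth m in [0, 1]: x(t) = carrier(t) * (1 + m * sin(2*pi*fm*t)).
    m = 0 yields unmodulated noise; m = 1 yields full-depth modulation.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(int(dur_s * fs)) / fs
    carrier = rng.standard_normal(t.size)
    return carrier * (1.0 + depth * np.sin(2 * np.pi * fm_hz * t))

target = sam_noise(fm_hz=32, depth=0.5)   # 32-Hz target modulation
masker = sam_noise(fm_hz=8, depth=0.5)    # an 8-Hz masker rate, outside the
                                          # two-octave protected zone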
Affiliation(s)
- Christopher Conroy
- Department of Speech, Language & Hearing Sciences and Hearing Research Center, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA
- Gerald Kidd
- Department of Speech, Language & Hearing Sciences and Hearing Research Center, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA
8. Roverud E, Dubno JR, Kidd G. Hearing-Impaired Listeners Show Reduced Attention to High-Frequency Information in the Presence of Low-Frequency Information. Trends Hear 2020; 24:2331216520945516. PMID: 32853117. PMCID: PMC7557677. DOI: 10.1177/2331216520945516.
Abstract
Many listeners with sensorineural hearing loss have uneven hearing sensitivity across frequencies. This study addressed whether this uneven hearing loss leads to a biasing of attention to different frequency regions. Normal-hearing (NH) and hearing-impaired (HI) listeners performed a pattern discrimination task at two distant center frequencies (CFs): 750 and 3500 Hz. The patterns were sequences of pure tones in which each successive tonal element was randomly selected from one of two possible frequencies surrounding a CF. The stimuli were presented at equal sensation levels to ensure equal audibility. In addition, the frequency separation of the tonal elements within a pattern was adjusted for each listener so that equal pattern discrimination performance was obtained for each CF in quiet. After these adjustments, the pattern discrimination task was performed under conditions in which independent patterns were presented at both CFs simultaneously. The listeners were instructed to attend to the low or high CF before the stimulus (assessing selective attention to frequency with instruction) or after the stimulus (divided attention, assessing inherent frequency biases). NH listeners demonstrated approximately equal performance decrements (re: quiet) between the two CFs. HI listeners demonstrated much larger performance decrements at the 3500 Hz CF than at the 750 Hz CF in combined-presentation conditions for both selective and divided attention conditions, indicating a low-frequency attentional bias that is apparently not under subject control. Surprisingly, the magnitude of this frequency bias was not related to the degree of asymmetry in thresholds at the two CFs.
Affiliation(s)
- Elin Roverud
- Department of Speech, Language & Hearing Sciences, Boston University
- Judy R. Dubno
- Department of Otolaryngology-Head and Neck Surgery, Medical University of South Carolina
- Gerald Kidd
- Department of Speech, Language & Hearing Sciences, Boston University
9. Morse-Fortier C, Parrish MM, Baran JA, Freyman RL. The Effects of Musical Training on Speech Detection in the Presence of Informational and Energetic Masking. Trends Hear 2017; 21:2331216517739427. PMID: 29161982. PMCID: PMC5703091. DOI: 10.1177/2331216517739427.
Abstract
Recent research has suggested that musicians have an advantage in some speech-in-noise paradigms, but not all. Whether musicians outperform nonmusicians on a given speech-in-noise task may well depend on the type of noise involved. To date, few groups have specifically studied the role that informational masking plays in the observation of a musician advantage. The current study investigated the effect of musicianship on listeners’ ability to overcome informational versus energetic masking of speech. Monosyllabic words were presented in four conditions that created similar energetic masking but either high or low informational masking. Two of these conditions used noise-vocoded target and masking stimuli to determine whether the absence of natural fine structure and spectral variations influenced any musician advantage. Forty young normal-hearing listeners (20 musicians and 20 nonmusicians) completed the study. There was a significant overall effect of participant group collapsing across the four conditions; however, planned comparisons showed musicians’ thresholds were only significantly better in the high informational masking natural speech condition, where the musician advantage was approximately 3 dB. These results add to the mounting evidence that informational masking plays a role in the presence and amount of musician benefit.
Affiliation(s)
- Mary M Parrish
- Department of Communication Disorders, University of Massachusetts Amherst, MA, USA
- Jane A Baran
- Department of Communication Disorders, University of Massachusetts Amherst, MA, USA
- Richard L Freyman
- Department of Communication Disorders, University of Massachusetts Amherst, MA, USA
10. Kidd G, Mason CR, Best V, Roverud E, Swaminathan J, Jennings T, Clayton K, Colburn HS. Determining the energetic and informational components of speech-on-speech masking in listeners with sensorineural hearing loss. J Acoust Soc Am 2019; 145:440. PMID: 30710924. PMCID: PMC6347574. DOI: 10.1121/1.5087555.
Abstract
The ability to identify the words spoken by one talker masked by two or four competing talkers was tested in young-adult listeners with sensorineural hearing loss (SNHL). In a reference/baseline condition, masking speech was colocated with target speech, target and masker talkers were female, and the masker was intelligible. Three comparison conditions included replacing female masker talkers with males, time-reversal of masker speech, and spatial separation of sources. All three variables produced significant release from masking. To emulate energetic masking (EM), stimuli were subjected to ideal time-frequency segregation retaining only the time-frequency units where target energy exceeded masker energy. Subjects were then tested with these resynthesized "glimpsed stimuli." For either two or four maskers, thresholds only varied about 3 dB across conditions suggesting that EM was roughly equal. Compared to normal-hearing listeners from an earlier study [Kidd, Mason, Swaminathan, Roverud, Clayton, and Best, J. Acoust. Soc. Am. 140, 132-144 (2016)], SNHL listeners demonstrated both greater energetic and informational masking as well as higher glimpsed thresholds. Individual differences were correlated across masking release conditions suggesting that listeners could be categorized according to their general ability to solve the task. Overall, both peripheral and central factors appear to contribute to the higher thresholds for SNHL listeners.
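The ideal time-frequency segregation described above amounts to applying an ideal binary mask: keep a spectrogram unit of the mixture only where target energy exceeds masker energy, and zero everything else. A generic sketch of the idea; the 0-dB local criterion, STFT parameters, and noise inputs here are illustrative, not the study's exact settings:

```python
import numpy as np

def ideal_tf_segregation(target, masker, frame=256, hop=128):
    """Ideal time-frequency segregation ("glimpsing").

    Retain only the mixture's spectrogram units where target magnitude
    exceeds masker magnitude (a 0-dB local criterion); zero the rest.
    Returns the binary mask and the masked mixture spectrogram.
    """
    def stft(x):
        win = np.hanning(frame)
        n = 1 + (len(x) - frame) // hop
        frames = np.stack([x[i * hop:i * hop + frame] * win
                           for i in range(n)])
        return np.fft.rfft(frames, axis=1)

    T, M = stft(target), stft(masker)
    mask = np.abs(T) > np.abs(M)      # True where the target dominates
    glimpsed = (T + M) * mask         # keep only the "glimpsed" units
    return mask, glimpsed

rng = np.random.default_rng(0)
tgt = rng.standard_normal(4096)         # toy target signal
msk = 0.5 * rng.standard_normal(4096)   # toy masker at lower level
mask, glimpsed = ideal_tf_segregation(tgt, msk)
```

Resynthesizing audio from `glimpsed` (via an inverse STFT with overlap-add) yields the kind of "glimpsed stimuli" used to estimate the energetic component of masking.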
Affiliation(s)
- Gerald Kidd
- Department of Speech, Language and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
- Christine R Mason
- Department of Speech, Language and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
- Virginia Best
- Department of Speech, Language and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
- Elin Roverud
- Department of Speech, Language and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
- Jayaganesh Swaminathan
- Department of Speech, Language and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
- Todd Jennings
- Department of Speech, Language and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
- Kameron Clayton
- Department of Speech, Language and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
- H Steven Colburn
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts 02215, USA
11. Durai M, Kobayashi K, Searchfield GD. A feasibility study of predictable and unpredictable surf-like sounds for tinnitus therapy using personal music players. Int J Audiol 2018; 57:707-713. PMID: 29806782. DOI: 10.1080/14992027.2018.1476783.
Abstract
Objective: To evaluate the feasibility of predictable or unpredictable amplitude-modulated sounds for tinnitus therapy.
Design: The study consisted of two parts. (1) An adaptation experiment: loudness level matches and rating scales (10-point) for loudness and distress were obtained at a silent baseline and at the end of three counterbalanced 30-min exposures (silence, predictable, and unpredictable). (2) A qualitative 2-week sound therapy feasibility trial: participants took home a personal music player (PMP).
Study sample: Part 1: 23 individuals with chronic tinnitus; Part 2: seven individuals randomly selected from Part 1.
Results: Self-reported tinnitus loudness and annoyance were significantly lower than baseline ratings after acute unpredictable sound exposure, although the effect on annoyance was small. The feasibility trial identified that participant preferences for sounds varied. Three participants did not obtain any benefit from either sound, and three preferred unpredictable over predictable sounds. Some participants had difficulty using the PMP, and average self-reported use was low (<1 h/day).
Conclusions: Unpredictable surf-like sounds played using a PMP are a feasible tinnitus treatment. Further work is required to improve the acceptance of the sound and ease of PMP use.
Affiliation(s)
- Mithila Durai
- Section of Audiology, University of Auckland, Auckland, New Zealand
- Kei Kobayashi
- Section of Audiology, University of Auckland, Auckland, New Zealand
12. Stilp CE, Kiefte M, Kluender KR. Discovering acoustic structure of novel sounds. J Acoust Soc Am 2018; 143:2460. PMID: 29716264. PMCID: PMC5924381. DOI: 10.1121/1.5031018.
Abstract
Natural sounds have substantial acoustic structure (predictability, nonrandomness) in their spectral and temporal compositions. Listeners are expected to exploit this structure to distinguish simultaneous sound sources; however, previous studies confounded acoustic structure and listening experience. Here, sensitivity to acoustic structure in novel sounds was measured in discrimination and identification tasks. Complementary signal-processing strategies independently varied relative acoustic entropy (the inverse of acoustic structure) across frequency or time. In one condition, instantaneous frequency of low-pass-filtered 300-ms random noise was rescaled to 5 kHz bandwidth and resynthesized. In another condition, the instantaneous frequency of a short gated 5-kHz noise was resampled up to 300 ms. In both cases, entropy relative to full bandwidth or full duration was a fraction of that in 300-ms noise sampled at 10 kHz. Discrimination of sounds improved with less relative entropy. Listeners identified a probe sound as a target sound (1%, 3.2%, or 10% relative entropy) that repeated amidst distractor sounds (1%, 10%, or 100% relative entropy) at 0 dB SNR. Performance depended on differences in relative entropy between targets and background. Lower-relative-entropy targets were better identified against higher-relative-entropy distractors than lower-relative-entropy distractors; higher-relative-entropy targets were better identified amidst lower-relative-entropy distractors. Results were consistent across signal-processing strategies.
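The notion of acoustic entropy used above (its inverse being acoustic structure) can be illustrated with a simple spectral version: the Shannon entropy of a sound's normalized power spectrum, scaled by the maximum possible entropy so that 1.0 means maximally random (flat spectrum) and lower values mean more structure. This is a generic sketch of the concept, not the authors' instantaneous-frequency resampling pipeline:

```python
import numpy as np

def relative_spectral_entropy(x, nfft=1024):
    """Shannon entropy of the normalized power spectrum, relative to the
    maximum possible entropy (a perfectly flat spectrum).

    Returns a value in (0, 1]: near 1 for broadband noise, low for a
    spectrally concentrated (highly structured) sound.
    """
    p = np.abs(np.fft.rfft(x, nfft)) ** 2
    p = p / p.sum()
    p = p[p > 0]                              # drop empty bins before log
    return -np.sum(p * np.log2(p)) / np.log2(nfft // 2 + 1)

rng = np.random.default_rng(0)
noise = rng.standard_normal(4096)             # broadband noise: high entropy
t = np.arange(4096)
tone = np.sin(2 * np.pi * 0.05 * t)           # single sinusoid: low entropy

h_noise = relative_spectral_entropy(noise)
h_tone = relative_spectral_entropy(tone)
```

Under this measure a tone scores far lower than noise, matching the intuition that the study's low-relative-entropy sounds are the more predictable, more structured ones.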
Affiliation(s)
- Christian E Stilp
- Department of Psychological and Brain Sciences, University of Louisville, 317 Life Sciences Building, Louisville, Kentucky 40292, USA
- Michael Kiefte
- School of Communication Sciences and Disorders, Dalhousie University, Halifax, Nova Scotia, Canada
- Keith R Kluender
- Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, Indiana 47907, USA
13.
Abstract
Two hypotheses, attentional prioritization and attentional spreading, have been proposed to account for object-based attention. The attentional-prioritization hypothesis posits that the positional uncertainty of targets is sufficient to resolve the controversy raised by the competing attentional-spreading hypothesis. Here we challenge the sufficiency of this explanation by showing that object-based attention is a function of sensory uncertainty in a task with consistent high positional uncertainty of the targets. In Experiment 1, object-based attention was modulated by sensory uncertainty induced by the noise from backward masking, showing an object-based effect under high as compared to low sensory uncertainty. This finding was replicated in Experiment 2 with increased task difficulty, to exclude that as a confounding factor, and in Experiment 3 with a psychophysical method, to obtain converging evidence using perceptual threshold measurement. Additionally, such a finding was not observed when sensory uncertainty was eliminated by replacing the backward-masking stimuli with perceptually dissimilar ones in Experiment 4. These results reveal that object-based attention is influenced by sensory uncertainty, even under high positional uncertainty of the targets. Our findings contradict the proposition of attentional spreading, proposing instead an automatic form of object-based attention due to enhancement of the perceptual representation. More importantly, the attentional-prioritization hypothesis based solely on positional uncertainty cannot sufficiently account for object-based attention, but needs to be developed by expanding the concept of uncertainty to include at least sensory uncertainty.
14
Hausfeld L, Gutschalk A, Formisano E, Riecke L. Effects of Cross-modal Asynchrony on Informational Masking in Human Cortex. J Cogn Neurosci 2017; 29:980-990. [DOI: 10.1162/jocn_a_01097] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]
Abstract
In many everyday listening situations, an otherwise audible sound may go unnoticed amid multiple other sounds. This auditory phenomenon, called informational masking (IM), is sensitive to visual input and involves early (50–250 msec) activity in the auditory cortex (the so-called awareness-related negativity). It is still unclear whether and how the timing of visual input influences the neural correlates of IM in auditory cortex. To address this question, we obtained simultaneous behavioral and neural measures of IM from human listeners in the presence of a visual input stream and varied the asynchrony between the visual stream and the rhythmic auditory target stream (in-phase, antiphase, or random). Results show effects of cross-modal asynchrony on both target detectability (RT and sensitivity) and the awareness-related negativity measured with EEG, which were driven primarily by antiphasic audiovisual stimuli. The neural effect was limited to the interval shortly before listeners' behavioral report of the target. Our results indicate that the relative timing of visual input can influence the IM of a target sound in the human auditory cortex. They further show that this audiovisual influence occurs early during the perceptual buildup of the target sound. In summary, these findings provide novel insights into the interplay of IM and multisensory processing in the human brain.
15
Snyder JS, Elhilali M. Recent advances in exploring the neural underpinnings of auditory scene perception. Ann N Y Acad Sci 2017; 1396:39-55. [PMID: 28199022 PMCID: PMC5446279 DOI: 10.1111/nyas.13317] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/30/2016] [Revised: 12/21/2016] [Accepted: 01/08/2017] [Indexed: 11/29/2022]
Abstract
Studies of auditory scene analysis have traditionally relied on paradigms using artificial sounds and conventional behavioral techniques to elucidate how we perceptually segregate auditory objects or streams from each other. In the past few decades, however, there has been growing interest in uncovering the neural underpinnings of auditory segregation using human and animal neuroscience techniques, as well as computational modeling. This largely reflects the growth in the fields of cognitive neuroscience and computational neuroscience and has led to new theories of how the auditory system segregates sounds in complex arrays. The current review focuses on neural and computational studies of auditory scene perception published in the last few years. Following the progress that has been made in these studies, we describe (1) theoretical advances in our understanding of the most well-studied aspects of auditory scene perception, namely segregation of sequential patterns of sounds and concurrently presented sounds; (2) the diversification of topics and paradigms that have been investigated; and (3) how new neuroscience techniques (including invasive neurophysiology in awake humans, genotyping, and brain stimulation) have been used in this field.
Affiliation(s)
- Joel S. Snyder
- Department of Psychology, University of Nevada, Las Vegas, Las Vegas, Nevada
- Mounya Elhilali
- Department of Electrical and Computer Engineering, The Johns Hopkins University, Baltimore, Maryland
16
Durai M, Searchfield GD. A Mixed-Methods Trial of Broad Band Noise and Nature Sounds for Tinnitus Therapy: Group and Individual Responses Modeled under the Adaptation Level Theory of Tinnitus. Front Aging Neurosci 2017; 9:44. [PMID: 28337139 PMCID: PMC5343046 DOI: 10.3389/fnagi.2017.00044] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/06/2016] [Accepted: 02/20/2017] [Indexed: 12/12/2022] Open
Abstract
Objectives: A randomized cross-over trial in 18 participants tested the hypothesis that nature sounds, with unpredictable temporal characteristics and high valence, would yield greater improvement in tinnitus than constant, emotionally neutral broadband noise (BBN). Study Design: The primary outcome measure was the Tinnitus Functional Index (TFI). Secondary measures were: loudness and annoyance ratings, loudness level matches, minimum masking levels, positive and negative emotionality, attention reaction and discrimination time, anxiety, depression and stress. Each sound was administered using MP3 players with earbuds for 8 continuous weeks, with a 3 week wash-out period before crossing over to the other treatment sound. Measurements were undertaken for each arm at sound fitting, 4 and 8 weeks after administration. Qualitative interviews were conducted at each of these appointments. Results: From a baseline TFI score of 41.3, sound therapy resulted in TFI scores at 8 weeks of 35.6; BBN resulted in significantly greater reduction (8.2 points) after 8 weeks of sound therapy use than nature sounds (3.2 points). The positive effect of sound on tinnitus was supported by secondary outcome measures of tinnitus, emotion, attention, and psychological state, but not interviews. Tinnitus loudness level match was higher for BBN at 8 weeks, while there was little change in loudness level matches for nature sounds. There was no change in minimum masking levels following sound therapy administration. Self-reported preference for one sound over another did not correlate with changes in tinnitus. Conclusions: Modeled under an adaptation level theory framework of tinnitus perception, the results indicate that the introduction of broadband noise shifts internal adaptation level weighting away from the tinnitus signal, reducing tinnitus magnitude.
Nature sounds may modify the affective components of tinnitus via a secondary, residual pathway, but this appears to be less important for sound effectiveness. The different rates of adaptation to broadband noise and nature sound by the auditory system may explain the different tinnitus loudness level matches. In addition to group effects there also appears to be a great deal of individual variation. A sound therapy framework based on adaptation level theory is proposed that accounts for individual variation in preference and response to sound. Clinical Trial Registration: www.anzctr.org.au, identifier #12616000742471.
Affiliation(s)
- Mithila Durai
- Eisdell Moore Centre, Section of Audiology, University of Auckland, Auckland, New Zealand
- Center for Brain Research, University of Auckland, Auckland, New Zealand
- Grant D. Searchfield
- Eisdell Moore Centre, Section of Audiology, University of Auckland, Auckland, New Zealand
- Center for Brain Research, University of Auckland, Auckland, New Zealand
- Brain Research New Zealand, Auckland, New Zealand
17
Leibold LJ, Buss E. Factors responsible for remote-frequency masking in children and adults. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2016; 140:4367. [PMID: 28040030 PMCID: PMC5392082 DOI: 10.1121/1.4971780] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/15/2016] [Revised: 10/25/2016] [Accepted: 11/23/2016] [Indexed: 05/29/2023]
Abstract
Susceptibility to remote-frequency masking in children and adults was evaluated with respect to three stimulus features: (1) masker bandwidth, (2) spectral separation of the signal and masker, and (3) gated versus continuous masker presentation. Listeners were 4- to 6-year-olds, 7- to 10-year-olds, and adults. Detection thresholds for a 500-ms, 2000-Hz signal were estimated in quiet or presented with a band of noise in one of four frequency regions: 425-500 Hz, 4000-4075 Hz, 8000-8075 Hz, or 4000-10 000 Hz. In experiment 1, maskers were gated on in each 500-ms interval of a three-interval, forced-choice adaptive procedure. Masking was observed for all ages in all maskers, but the greatest masking was observed for the 4000-4075 Hz masker. These findings suggest that signal/masker spectral proximity plays an important role in remote-frequency masking, even when peripheral excitation associated with the signal and masker does not overlap. Younger children tended to have more masking than older children or adults, consistent with a reduced ability to segregate simultaneous sounds and/or listen in a frequency-selective manner. In experiment 2, detection thresholds were estimated in the same noises, but maskers were presented continuously. Masking was reduced for all ages relative to gated conditions, suggesting improved segregation and/or frequency-selective listening.
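The three-interval, forced-choice adaptive procedure used here is commonly implemented as a 3-down-1-up staircase, which converges on the stimulus level yielding roughly 79% correct. A minimal simulation sketch (the starting level, step size, simple detection rule, and simulated listener are illustrative assumptions, not taken from the study):

```python
import random

def staircase_3down1up(true_threshold, start_level=60.0, step=2.0, n_trials=200):
    """Simulate a 3-down-1-up adaptive staircase for a 3-interval forced choice.

    Toy listener: always detects above threshold, guesses (p = 1/3) below it.
    Returns a threshold estimate from the mean of the last few reversal levels.
    """
    level = start_level
    correct_run = 0
    reversals = []
    last_direction = 0  # +1 = moving up, -1 = moving down
    for _ in range(n_trials):
        detected = level > true_threshold or random.random() < 1 / 3
        if detected:
            correct_run += 1
            if correct_run == 3:          # three correct in a row -> step down
                correct_run = 0
                if last_direction == 1:
                    reversals.append(level)
                last_direction = -1
                level -= step
        else:                             # any miss -> step up
            correct_run = 0
            if last_direction == -1:
                reversals.append(level)
            last_direction = 1
            level += step
    if not reversals:
        return level
    tail = reversals[-6:]
    return sum(tail) / len(tail)
```

With enough trials the track oscillates around the true threshold, so the reversal average lands near it regardless of the starting level.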
Affiliation(s)
- Lori J Leibold
- Center for Hearing Research, Boys Town National Research Hospital, Omaha, Nebraska 68131, USA
- Emily Buss
- Department of Otolaryngology/Head and Neck Surgery, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA
18
Chang AC, Lutfi R, Lee J, Heo I. A Detection-Theoretic Analysis of Auditory Streaming and Its Relation to Auditory Masking. Trends Hear 2016; 20:2331216516664343. [PMID: 27641681 PMCID: PMC5029798 DOI: 10.1177/2331216516664343] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022] Open
Abstract
Research on hearing has long been challenged with understanding our exceptional ability to hear out individual sounds in a mixture (the so-called cocktail party problem). Two general approaches to the problem have been taken using sequences of tones as stimuli. The first has focused on our tendency to hear sequences, sufficiently separated in frequency, split into separate cohesive streams (auditory streaming). The second has focused on our ability to detect a change in one sequence, ignoring all others (auditory masking). The two phenomena are clearly related, but that relation has never been evaluated analytically. This article offers a detection-theoretic analysis of the relation between multitone streaming and masking that underscores the expected similarities and differences between these phenomena and the predicted outcome of experiments in each case. The key to establishing this relation is the function linking performance to the information divergence of the tone sequences, DKL (a measure of the statistical separation of their parameters). A strong prediction is that streaming and masking of tones will be a common function of DKL provided that the statistical properties of sequences are symmetric. Results of experiments are reported supporting this prediction.
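The information divergence DKL that links streaming and masking can be illustrated in the simplest case: tone sequences whose parameters (say, frequencies) are Gaussian-distributed. For two univariate Gaussians the divergence has a closed form, and divergences add over statistically independent tones. A sketch under those assumptions (the means, SDs, and sequence length are illustrative, not values from the paper):

```python
import math

def gaussian_kl(mu0, sigma0, mu1, sigma1):
    """KL divergence D(N0 || N1) between two univariate Gaussians, in nats."""
    return (math.log(sigma1 / sigma0)
            + (sigma0 ** 2 + (mu0 - mu1) ** 2) / (2 * sigma1 ** 2)
            - 0.5)

# Two tone sequences whose frequencies are drawn from Gaussian distributions
# (mean frequency and SD in Hz; illustrative numbers only).
target = (1000.0, 50.0)
masker = (1100.0, 50.0)

# For n statistically independent tones, the per-tone divergences add.
n_tones = 8
d_kl_total = n_tones * gaussian_kl(*target, *masker)
```

Greater statistical separation of the two sequences (larger DKL) predicts better performance in both the streaming and the masking task, which is the relation the analysis formalizes.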
Affiliation(s)
- An-Chieh Chang
- Department of Communication Sciences and Disorders, University of Wisconsin-Madison, WI, USA
- Robert Lutfi
- Department of Communication Sciences and Disorders, University of Wisconsin-Madison, WI, USA
- Jungmee Lee
- Department of Communication Sciences and Disorders, University of Wisconsin-Madison, WI, USA
- Inseok Heo
- Department of Electrical and Computer Engineering, University of Wisconsin-Madison, WI, USA
19
Ross B, Fujioka T. 40-Hz oscillations underlying perceptual binding in young and older adults. Psychophysiology 2016; 53:974-90. [PMID: 27080577 DOI: 10.1111/psyp.12654] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Revised: 03/12/2016] [Accepted: 03/13/2016] [Indexed: 11/29/2022]
Abstract
Auditory object perception requires binding of elementary features of complex stimuli. Synchronization of high-frequency oscillation in neural networks has been proposed as an effective alternative to binding via hard-wired connections because binding in an oscillatory network can be dynamically adjusted to the ever-changing sensory environment. Previously, we demonstrated in young adults that gamma oscillations are critical for sensory integration and found that they were affected by concurrent noise. Here, we aimed to support the hypothesis that stimulus evoked auditory 40-Hz responses are a component of thalamocortical gamma oscillations and examined whether this oscillatory system may become less effective in aging. In young and older adults, we recorded neuromagnetic 40-Hz oscillations, elicited by monaural amplitude-modulated sound. Comparing responses in quiet and under contralateral masking with multitalker babble noise revealed two functionally distinct components of auditory 40-Hz responses. The first component followed changes in the auditory input with high fidelity and was of similar amplitude in young and older adults. The second, significantly smaller in older adults, showed a 200-ms interval of amplitude and phase rebound and was strongly attenuated by contralateral noise. The amplitude of the second component was correlated with behavioral speech-in-noise performance. Concurrent noise also reduced the P2 wave of auditory evoked responses at 200-ms latency, but not the earlier N1 wave. P2 modulation was reduced in older adults. The results support the model of sensory binding through thalamocortical gamma oscillations. Limitation of neural resources for this process in older adults may contribute to their speech-in-noise understanding deficits.
Affiliation(s)
- Bernhard Ross
- Rotman Research Institute, Baycrest Centre, Toronto, Ontario, Canada
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
- Takako Fujioka
- Center for Computer Research in Music and Acoustics, Department of Music, Stanford University, Stanford, California, USA
- Neurosciences Institute, Stanford University, Stanford, California, USA
20
Roverud E, Best V, Mason CR, Swaminathan J, Kidd G. Informational Masking in Normal-Hearing and Hearing-Impaired Listeners Measured in a Nonspeech Pattern Identification Task. Trends Hear 2016; 20:2331216516638516. [PMID: 27059627 PMCID: PMC4871212 DOI: 10.1177/2331216516638516] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/15/2015] [Revised: 01/26/2016] [Accepted: 02/16/2016] [Indexed: 11/16/2022] Open
Abstract
Individuals with sensorineural hearing loss (SNHL) often experience more difficulty with listening in multisource environments than do normal-hearing (NH) listeners. While the peripheral effects of sensorineural hearing loss certainly contribute to this difficulty, differences in central processing of auditory information may also contribute. To explore this issue, it is important to account for peripheral differences between NH and these hearing-impaired (HI) listeners so that central effects in multisource listening can be examined. In the present study, NH and HI listeners performed a tonal pattern identification task at two distant center frequencies (CFs), 850 and 3500 Hz. In an attempt to control for differences in the peripheral representations of the stimuli, the patterns were presented at the same sensation level (15 dB SL), and the frequency deviation of the tones comprising the patterns was adjusted to obtain equal quiet pattern identification performance across all listeners at both CFs. Tonal sequences were then presented at both CFs simultaneously (informational masking conditions), and listeners were asked either to selectively attend to a source (CF) or to divide attention between CFs and identify the pattern at a CF designated after each trial. There were large differences between groups in the frequency deviations necessary to perform the pattern identification task. After compensating for these differences, there were small differences between NH and HI listeners in the informational masking conditions. HI listeners showed slightly greater performance asymmetry between the low and high CFs than did NH listeners, possibly due to central differences in frequency weighting between groups.
21
McCloy DR, Lee AKC. Auditory attention strategy depends on target linguistic properties and spatial configuration. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2015; 138:97-114. [PMID: 26233011 PMCID: PMC4499044 DOI: 10.1121/1.4922328] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/27/2014] [Revised: 05/15/2015] [Accepted: 05/28/2015] [Indexed: 05/27/2023]
Abstract
Whether crossing a busy intersection or attending a large dinner party, listeners sometimes need to attend to multiple spatially distributed sound sources or streams concurrently. How they achieve this is not clear; some studies suggest that listeners cannot truly simultaneously attend to separate streams, but instead combine attention switching with short-term memory to achieve something resembling divided attention. This paper presents two oddball detection experiments designed to investigate whether directing attention to phonetic versus semantic properties of the attended speech impacts listeners' ability to divide their auditory attention across spatial locations. Each experiment uses four spatially distinct streams of monosyllabic words, variation in cue type (providing phonetic or semantic information), and requiring attention to one or two locations. A rapid button-press response paradigm is employed to minimize the role of short-term memory in performing the task. Results show that differences in the spatial configuration of attended and unattended streams interact with linguistic properties of the speech streams to impact performance. Additionally, listeners may leverage phonetic information to make oddball detection judgments even when oddballs are semantically defined. Both of these effects appear to be mediated by the overall complexity of the acoustic scene.
Affiliation(s)
- Daniel R McCloy
- Department of Speech and Hearing Sciences and Institute for Learning and Brain Sciences, University of Washington, 1715 NE Columbia Road, Box 357988, Seattle, Washington 98195-7988, USA
- Adrian K C Lee
- Department of Speech and Hearing Sciences and Institute for Learning and Brain Sciences, University of Washington, 1715 NE Columbia Road, Box 357988, Seattle, Washington 98195-7988, USA
22
Bohlen P, Dylla M, Timms C, Ramachandran R. Detection of modulated tones in modulated noise by non-human primates. J Assoc Res Otolaryngol 2014; 15:801-21. [PMID: 24899380 DOI: 10.1007/s10162-014-0467-7] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/24/2014] [Accepted: 05/08/2014] [Indexed: 10/25/2022] Open
Abstract
In natural environments, many sounds are amplitude-modulated. Amplitude modulation is thought to be a signal that aids auditory object formation. A previous study of the detection of signals in noise found that when tones or noise were amplitude-modulated, the noise was a less effective masker, and detection thresholds for tones in noise were lowered. These results suggest that the detection of modulated signals in modulated noise would be enhanced. This paper describes the results of experiments investigating how detection is modified when both signal and noise were amplitude-modulated. Two monkeys (Macaca mulatta) were trained to detect amplitude-modulated tones in continuous, amplitude-modulated broadband noise. When the phase difference between otherwise similarly amplitude-modulated tones and noise was varied, detection thresholds were highest when the modulations were in phase and lowest when the modulations were anti-phase. When the depth of the modulation of tones or noise was varied, detection thresholds decreased if the modulations were anti-phase. When the modulations were in phase, increasing the depth of tone modulation caused an increase in tone detection thresholds, but increasing the depth of noise modulation did not affect tone detection thresholds. Changing the modulation frequency of the tone or noise caused changes in threshold that saturated at modulation frequencies higher than 20 Hz; thresholds increased when the tone and noise modulations were in phase and decreased when they were anti-phase. The relationship between reaction times and tone level was not modified by manipulations of the temporal variations in the signal or noise. The changes in behavioral threshold were consistent with a model in which the brain subtracted noise from signal. These results suggest that the parameters of the modulation of signals and maskers heavily influence detection in very predictable ways. These results are consistent with findings in humans and birds and form the baseline for neurophysiological studies of the mechanisms of detection in noise.
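A stimulus of the kind described, an amplitude-modulated tone in amplitude-modulated noise with a controllable envelope phase difference, can be sketched as follows (the sampling rate, carrier frequency, modulation rate, and depth are illustrative choices, not the study's parameters):

```python
import math
import random

def am_tone(fc, fm, mod_depth, mod_phase, dur=0.5, fs=8000.0):
    """Amplitude-modulated tone: (1 + m*sin(2*pi*fm*t + phi)) * sin(2*pi*fc*t)."""
    n = int(dur * fs)
    return [(1.0 + mod_depth * math.sin(2 * math.pi * fm * t / fs + mod_phase))
            * math.sin(2 * math.pi * fc * t / fs) for t in range(n)]

def am_noise(fm, mod_depth, mod_phase, dur=0.5, fs=8000.0, seed=0):
    """Gaussian broadband noise given the same sinusoidal amplitude envelope."""
    rng = random.Random(seed)
    n = int(dur * fs)
    return [(1.0 + mod_depth * math.sin(2 * math.pi * fm * t / fs + mod_phase))
            * rng.gauss(0.0, 1.0) for t in range(n)]

# 20-Hz modulation; masker envelope either in phase (phase 0) or
# anti-phase (phase pi) with the tone envelope.
tone = am_tone(fc=1000.0, fm=20.0, mod_depth=0.8, mod_phase=0.0)
noise_anti = am_noise(fm=20.0, mod_depth=0.8, mod_phase=math.pi)
```

Varying `mod_phase` of the noise relative to the tone reproduces the in-phase vs. anti-phase manipulation; varying `mod_depth` reproduces the depth manipulation.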
Affiliation(s)
- Peter Bohlen
- Department of Hearing and Speech Sciences, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
23
Kidd G, Mason CR, Best V. The role of syntax in maintaining the integrity of streams of speech. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2014; 135:766-77. [PMID: 25234885 PMCID: PMC3986016 DOI: 10.1121/1.4861354] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/19/2013] [Revised: 12/13/2013] [Accepted: 12/23/2013] [Indexed: 05/21/2023]
Abstract
This study examined the ability of listeners to utilize syntactic structure to extract a target stream of speech from among competing sounds. Target talkers were identified by voice or location, which was held constant throughout a test utterance, and paired with correct or incorrect (random word order) target sentence syntax. Both voice and location provided reliable cues for identifying target speech even when other features varied unpredictably. The target sentences were masked either by predominantly energetic maskers (noise bursts) or by predominantly informational maskers (similar speech in random word order). When the maskers were noise bursts, target sentence syntax had relatively minor effects on identification performance. However, when the maskers were other talkers, correct target sentence syntax resulted in significantly better speech identification performance than incorrect syntax. Furthermore, conformance to correct syntax alone was sufficient to accurately identify the target speech. The results were interpreted as supporting the idea that the predictability of the elements comprising streams of speech, as manifested by syntactic structure, is an important factor in binding words together into coherent streams. Furthermore, these findings suggest that predictability is particularly important for maintaining the coherence of an auditory stream over time under conditions high in informational masking.
Affiliation(s)
- Gerald Kidd
- Department of Speech, Language and Hearing Sciences and Hearing Research Center, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215
- Christine R Mason
- Department of Speech, Language and Hearing Sciences and Hearing Research Center, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215
- Virginia Best
- National Acoustic Laboratories, Macquarie University, New South Wales 2109, Australia
24
Ruggles DR, Oxenham AJ. Perceptual asymmetry induced by the auditory continuity illusion. J Exp Psychol Hum Percept Perform 2013; 40:908-14. [PMID: 24364709 DOI: 10.1037/a0035411] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
The challenges of daily communication require listeners to integrate both independent and complementary auditory information to form holistic auditory scenes. As part of this process listeners are thought to fill in missing information to create continuous perceptual streams, even when parts of messages are masked or obscured. One example of this filling-in process, the auditory continuity illusion, has been studied primarily using stimuli presented in isolation, leaving it unclear whether the illusion occurs in more complex situations with higher perceptual and attentional demands. In this study, young normal-hearing participants listened for long target tones, either real or illusory, in "clouds" of shorter masking tone and noise bursts with pseudorandom spectrotemporal locations. Patterns of detection suggest that illusory targets are salient within mixtures, although they do not produce the same level of performance as the real targets. The results suggest that the continuity illusion occurs in the presence of competing sounds and can be used to aid in the detection of partially obscured objects within complex auditory scenes.
25
Lutfi RA, Gilbertson L, Heo I, Chang AC, Stamas J. The information-divergence hypothesis of informational masking. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2013; 134:2160-70. [PMID: 23967946 PMCID: PMC3765281 DOI: 10.1121/1.4817875] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/08/2013] [Revised: 07/15/2013] [Accepted: 07/19/2013] [Indexed: 06/01/2023]
Abstract
In recent years there has been growing interest in masking that cannot be attributed to interactions in the cochlea, so-called informational masking (IM). Similarity in the acoustic properties of target and masker and uncertainty regarding the masker are the two major factors identified with IM. These factors involve quite different manipulations of signals and are believed to entail fundamentally different processes resulting in IM. Here, however, evidence is presented that these factors affect IM through their mutual influence on a single factor: the information divergence of target and masker given by Simpson-Fitter's da [Lutfi et al. (2012). J. Acoust. Soc. Am. 132, EL109-113]. Four experiments are described involving multitone pattern discrimination, multi-talker word recognition, sound-source identification, and sound localization. In each case standard manipulations of masker uncertainty and target-masker similarity (including the covariation of target-masker frequencies) are found to have the same effect on performance provided they produce the same change in da. The function relating d′ performance to da, moreover, appears to be linear with constant slope across listeners. The overriding dependence of IM on da is taken to reflect a general principle of perception that exploits differences in the statistical structure of signals to separate figure from ground.
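The da index referred to here is the standard unequal-variance sensitivity measure: the separation of the two distribution means scaled by the root-mean-square of their standard deviations. A small sketch of that index, with a conventional mapping from sensitivity to predicted two-alternative forced-choice percent correct (the mapping assumes Gaussian decision noise and is included only for illustration; the values are not from the paper):

```python
import math

def d_a(mu_target, sigma_target, mu_masker, sigma_masker):
    """Unequal-variance sensitivity index d_a: mean separation divided by
    the RMS of the two standard deviations."""
    return (mu_target - mu_masker) / math.sqrt(
        (sigma_target ** 2 + sigma_masker ** 2) / 2.0)

def percent_correct_2afc(da):
    """Predicted 2AFC proportion correct, Phi(d_a / sqrt(2)), assuming
    Gaussian decision noise."""
    return 0.5 * (1.0 + math.erf(da / 2.0))
```

On this view, any two manipulations (similarity or uncertainty) that produce the same d_a should predict the same performance, which is the paper's central claim.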
Affiliation(s)
- Robert A Lutfi
- Auditory Behavioral Research Lab, Department of Communication Sciences and Disorders, University of Wisconsin, Madison, Wisconsin 53706, USA.
26
Kidd G, Mason CR, Streeter T, Thompson ER, Best V, Wakefield GH. Perceiving sequential dependencies in auditory streams. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2013; 134:1215-1231. [PMID: 23927120 PMCID: PMC3745531 DOI: 10.1121/1.4812276] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/16/2012] [Revised: 05/30/2013] [Accepted: 06/08/2013] [Indexed: 05/30/2023]
Abstract
This study examined the ability of human listeners to detect the presence and judge the strength of a statistical dependency among the elements comprising sequences of sounds. The statistical dependency was imposed by specifying transition matrices that determined the likelihood of occurrence of the sound elements. Markov chains were constructed from these transition matrices having states that were pure tones/noise bursts that varied along the stimulus dimensions of frequency and/or interaural time difference. Listeners reliably detected the presence of a statistical dependency in sequences of sounds varying along these stimulus dimensions. Furthermore, listeners were able to discriminate the relative strength of the dependency in pairs of successive sound sequences. Random variation along an irrelevant stimulus dimension had small but significant adverse effects on performance. A much greater decrement in performance was found when the sound sequences were concurrent. Likelihood ratios were computed based on the transition matrices to specify Ideal Observer performance for the experimental conditions. Preliminary modeling efforts were made based on degradations of Ideal Observer performance intended to represent human observer limitations. This experimental approach appears to be useful for examining auditory "stream" formation and maintenance over time based on the predictability of the constituent sound elements.
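The approach described, generating tone sequences from a transition matrix and computing likelihood ratios against it to specify Ideal Observer performance, can be sketched as follows (the two-state chain and its probabilities are illustrative, not the study's stimuli):

```python
import math
import random

def generate_markov(transition, n, seed=0):
    """Generate an n-element state sequence from a row-stochastic
    transition matrix (states indexed 0..k-1)."""
    rng = random.Random(seed)
    k = len(transition)
    state = rng.randrange(k)
    seq = [state]
    for _ in range(n - 1):
        r, cum = rng.random(), 0.0
        for nxt, p in enumerate(transition[state]):
            cum += p
            if r < cum:
                state = nxt
                break
        seq.append(state)
    return seq

def log_likelihood_ratio(seq, transition):
    """Log-likelihood ratio of the Markov model against an independent,
    uniform model over the same states (an ideal-observer statistic for
    detecting the sequential dependency)."""
    k = len(transition)
    llr = 0.0
    for prev, cur in zip(seq, seq[1:]):
        llr += math.log(transition[prev][cur]) - math.log(1.0 / k)
    return llr

# A strongly dependent two-state chain (e.g., low vs. high tone):
# repeats are much more likely than alternations.
T = [[0.9, 0.1],
     [0.1, 0.9]]
seq = generate_markov(T, 100)
```

A large positive log-likelihood ratio indicates a detectable dependency; weakening the self-transition probabilities toward 0.5 drives the expected ratio toward zero, mirroring the "strength of dependency" manipulation.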
Affiliation(s)
- Gerald Kidd
- Department of Speech, Language and Hearing Sciences and Hearing Research Center, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA.
27
Auditory enhancement of increments in spectral amplitude stems from more than one source. J Assoc Res Otolaryngol 2012; 13:693-702. [PMID: 22766695 DOI: 10.1007/s10162-012-0339-y] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/25/2011] [Accepted: 06/13/2012] [Indexed: 10/28/2022] Open
Abstract
A component of a test sound consisting of simultaneous pure tones perceptually "pops out" if the test sound is preceded by a copy of itself with that component attenuated. Although this "enhancement" effect was initially thought to be purely monaural, it is also observable when the test sound and the precursor sound are presented contralaterally (i.e., to opposite ears). In experiment 1, we assessed the magnitude of ipsilateral and contralateral enhancement as a function of the time interval between the precursor and test sounds (10, 100, or 600 ms). The test sound, randomly transposed in frequency from trial to trial, was followed by a probe tone, either matched or mismatched in frequency to the test sound component which was the target of enhancement. Listeners' ability to discriminate matched probes from mismatched probes was taken as an index of enhancement magnitude. The results showed that enhancement decays more rapidly for ipsilateral than for contralateral precursors, suggesting that ipsilateral enhancement and contralateral enhancement stem from at least partly different sources. It could be hypothesized that, in experiment 1, contralateral precursors were effective only because they provided attentional cues about the target tone frequency. In experiment 2, this hypothesis was tested by presenting the probe tone before the precursor sound rather than after the test sound. Although the probe tone was then serving as a frequency cue, contralateral precursors were again found to produce enhancement. This indicates that contralateral enhancement cannot be explained by cuing alone and is a genuine sensory phenomenon.
28. Searchfield GD, Kobayashi K, Sanders M. An adaptation level theory of tinnitus audibility. Front Syst Neurosci 2012; 6:46. [PMID: 22707935] [PMCID: PMC3374480] [DOI: 10.3389/fnsys.2012.00046]
Abstract
Models of tinnitus suggest roles for auditory, attention, and emotional networks in tinnitus perception. A model of tinnitus audibility based on Helson’s (1964) adaptation level theory (ALT) is hypothesized to explain the relationship between tinnitus audibility, personality, memory, and attention. This theory attempts to describe how tinnitus audibility or detectability might change with experience and context. The basis of ALT and potential role of auditory scene analysis in tinnitus perception are discussed. The proposed psychoacoustic model lends itself to incorporation into existing neurophysiological models of tinnitus perception. It is hoped that the ALT hypothesis will allow for greater empirical investigation of factors influencing tinnitus perception, such as attention and tinnitus sound therapies.
Affiliation(s)
- Grant D Searchfield
- Audiology Section and Centre for Brain Research, The University of Auckland, Auckland, New Zealand
29. Wiegand K, Gutschalk A. Correlates of perceptual awareness in human primary auditory cortex revealed by an informational masking experiment. Neuroimage 2012; 61:62-9. [PMID: 22406354] [DOI: 10.1016/j.neuroimage.2012.02.067]
30. Kidd GR, Humes LE. Effects of age and hearing loss on the recognition of interrupted words in isolation and in sentences. J Acoust Soc Am 2012; 131:1434-48. [PMID: 22352515] [PMCID: PMC3292613] [DOI: 10.1121/1.3675975]
Abstract
The ability to recognize spoken words interrupted by silence was investigated with young normal-hearing listeners and older listeners with and without hearing impairment. Target words from the revised SPIN test by Bilger et al. [J. Speech Hear. Res. 27(1), 32-48 (1984)] were presented in isolation and in the original sentence context using a range of interruption patterns in which portions of speech were replaced with silence. The number of auditory "glimpses" of speech and the glimpse proportion (total duration glimpsed/word duration) were varied using a subset of the SPIN target words that ranged in duration from 300 to 600 ms. The words were presented in isolation, in the context of low-predictability (LP) sentences, and in high-predictability (HP) sentences. The glimpse proportion was found to have a strong influence on word recognition, with relatively little influence of the number of glimpses, glimpse duration, or glimpse rate. Although older listeners tended to recognize fewer interrupted words, there was considerable overlap in recognition scores across listener groups in all conditions, and all groups were affected by interruption parameters and context in much the same way.
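The glimpse metrics in this abstract reduce to simple arithmetic: the glimpse proportion is the total duration glimpsed divided by the word duration. A minimal sketch (the function name and list-of-durations representation are my own, not from the study):

```python
def glimpse_stats(glimpses_ms, word_ms):
    """Summarize an interruption pattern as in glimpsing studies.

    glimpses_ms: durations (ms) of the audible speech segments ("glimpses")
    word_ms: total word duration (ms)
    Returns (number of glimpses, glimpse proportion).
    """
    total_glimpsed = sum(glimpses_ms)
    # Glimpse proportion = total duration glimpsed / word duration
    return len(glimpses_ms), total_glimpsed / word_ms

# e.g. a 400-ms word heard through four 50-ms glimpses
n, prop = glimpse_stats([50, 50, 50, 50], 400)  # n = 4, prop = 0.5
```

Under this formulation, many different combinations of glimpse number, duration, and rate can yield the same glimpse proportion, which is what let the authors separate its influence from the other interruption parameters.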
Affiliation(s)
- Gary R Kidd
- Department of Speech and Hearing Sciences, Indiana University, Bloomington, Indiana 47405-7002, USA.
31. Leibold LJ. Development of Auditory Scene Analysis and Auditory Attention. Human Auditory Development 2012. [DOI: 10.1007/978-1-4614-1421-6_5]
32. Ross B, Miyazaki T, Fujioka T. Interference in dichotic listening: the effect of contralateral noise on oscillatory brain networks. Eur J Neurosci 2011; 35:106-18. [PMID: 22171970] [DOI: 10.1111/j.1460-9568.2011.07935.x]
Abstract
Coupling of thalamocortical networks through synchronous oscillations at gamma frequencies (30-80 Hz) has been suggested as a mechanism for binding of auditory sensory information into an object representation, which then becomes accessible for perception and cognition. This study investigated whether contralateral noise interferes with this step of central auditory processing. Neuromagnetic 40-Hz oscillations were examined in young healthy participants while they listened to amplitude-modulated sound in one ear and a multi-talker masking noise in the contralateral ear. Participants were engaged in a gap-detection task, for which their behavioural performance declined under masking. The amplitude modulation of the stimulus elicited steady 40-Hz oscillations with sources in bilateral auditory cortices. Analysis of the temporal dynamics of phase synchrony between source activity and the stimulus revealed two oscillatory components; the first was indicated by an instant onset in phase synchrony with the stimulus while the second showed a 200-ms time constant of gradual increase in phase synchrony after phase resetting by the gap. Masking abolished only the second component. This coincided with masking-related decrease of the P2 wave of the transient auditory-evoked responses whereas the N1 wave, reflecting early sensory processing, was unaffected. Given that the P2 response has been associated with object representation, we propose that the first 40-Hz component is related to representation of low-level sensory input whereas the second is related to internal auditory processing in thalamocortical networks. The observed modulation of oscillatory activity is discussed as reflecting a neural mechanism critical for speech understanding in noise.
Affiliation(s)
- Bernhard Ross
- Rotman Research Institute, Baycrest Centre, Toronto, Ontario, Canada.
33. Kidd G, Richards VM, Streeter T, Mason CR, Huang R. Contextual effects in the identification of nonspeech auditory patterns. J Acoust Soc Am 2011; 130:3926-38. [PMID: 22225048] [PMCID: PMC3253596] [DOI: 10.1121/1.3658442]
Abstract
This study investigated the benefit of a priori cues in a masked nonspeech pattern identification experiment. Targets were narrowband sequences of tone bursts forming six easily identifiable frequency patterns selected randomly on each trial. The frequency band containing the target was randomized. Maskers were also narrowband sequences of tone bursts chosen randomly on every trial. Targets and maskers were presented monaurally in mutually exclusive frequency bands, producing large amounts of informational masking. Cuing the masker produced a significant improvement in performance, while holding the target frequency band constant provided no benefit. The cue providing the greatest benefit was a copy of the masker presented ipsilaterally before the target-plus-masker. The masker cue presented contralaterally, and a notched-noise cue produced smaller benefits. One possible mechanism underlying these findings is auditory "enhancement" in which the neural response to the target is increased relative to the masker by differential prior stimulation of the target and masker frequency regions. A second possible mechanism provides a benefit to performance by comparing the spectrotemporal correspondence of the cue and target-plus-masker and is effective for either ipsilateral or contralateral cue presentation. These effects improve identification performance by emphasizing spectral contrasts in sequences or streams of sounds.
Affiliation(s)
- Gerald Kidd
- Department of Speech, Language and Hearing Sciences and Hearing Research Center, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA.
34. Klinge A, Beutelmann R, Klump GM. Effect of harmonicity on the detection of a signal in a complex masker and on spatial release from masking. PLoS One 2011; 6:e26124. [PMID: 22028814] [PMCID: PMC3196535] [DOI: 10.1371/journal.pone.0026124]
Abstract
The amount of masking of sounds from one source (signals) by sounds from a competing source (maskers) heavily depends on the sound characteristics of the masker and the signal and on their relative spatial location. Numerous studies investigated the ability to detect a signal in a speech or a noise masker or the effect of spatial separation of signal and masker on the amount of masking, but there is a lack of studies investigating the combined effects of many cues on the masking as is typical for natural listening situations. The current study using free-field listening systematically evaluates the combined effects of harmonicity and inharmonicity cues in multi-tone maskers and cues resulting from spatial separation of target signal and masker on the detection of a pure tone in a multi-tone or a noise masker. A linear binaural processing model was implemented to predict the masked thresholds in order to estimate whether the observed thresholds can be accounted for by energetic masking in the auditory periphery or whether other effects are involved. Thresholds were determined for combinations of two target frequencies (1 and 8 kHz), two spatial configurations (masker and target either co-located or spatially separated by 90 degrees azimuth), and five different masker types (four complex multi-tone stimuli, one noise masker). A spatial separation of target and masker resulted in a release from masking for all masker types. The amount of masking significantly depended on the masker type and frequency range. The various harmonic and inharmonic relations between target and masker or between components of the masker resulted in a complex pattern of increased or decreased masked thresholds in comparison to the predicted energetic masking. The results indicate that harmonicity cues affect the detectability of a tonal target in a complex masker.
Affiliation(s)
- Astrid Klinge
- Animal Physiology and Behavior Group, Department of Biology and Environmental Sciences, Carl-von-Ossietzky University Oldenburg, Oldenburg, Germany.
35. Lee TY, Richards VM. Evaluation of similarity effects in informational masking. J Acoust Soc Am 2011; 129:EL280-EL285. [PMID: 21682365] [PMCID: PMC3117891] [DOI: 10.1121/1.3590168]
Abstract
The degree of similarity between signal and masker has been hypothesized to contribute to informational masking. The present study attempted to quantify "similarity" using a discrimination task. Listeners first discriminated various signal stimuli from a multitone complex and then detected the presence of those signals embedded in a multitone informational masker. Discriminability was negatively correlated with detection threshold in the informational masking experiment, indicating that signal-masker similarity contributed to informational masking. These results suggest a method for specifying the relevant signal attributes in informational masking paradigms involving similarity manipulations.
Affiliation(s)
- Thomas Y Lee
- Department of Psychology, University of Pennsylvania, 3401 Walnut Street, Suite 302C, Philadelphia, Pennsylvania 19104, USA.
36. Objective and subjective psychophysical measures of auditory stream integration and segregation. J Assoc Res Otolaryngol 2010; 11:709-24. [PMID: 20658165] [DOI: 10.1007/s10162-010-0227-2]
Abstract
The perceptual organization of sound sequences into auditory streams involves the integration of sounds into one stream and the segregation of sounds into separate streams. "Objective" psychophysical measures of auditory streaming can be obtained using behavioral tasks where performance is facilitated by segregation and hampered by integration, or vice versa. Traditionally, these two types of tasks have been tested in separate studies involving different listeners, procedures, and stimuli. Here, we tested subjects in two complementary temporal-gap discrimination tasks involving similar stimuli and procedures. One task was designed so that performance in it would be facilitated by perceptual integration; the other, so that performance would be facilitated by perceptual segregation. Thresholds were measured in both tasks under a wide range of conditions produced by varying three stimulus parameters known to influence stream formation: frequency separation, tone-presentation rate, and sequence length. In addition to these performance-based measures, subjective judgments of perceived segregation were collected in the same listeners under corresponding stimulus conditions. The patterns of results obtained in the two temporal-discrimination tasks, and the relationships between thresholds and perceived-segregation judgments, were mostly consistent with the hypothesis that stream segregation helped performance in one task and impaired performance in the other task. The tasks and stimuli described here may prove useful in future behavioral or neurophysiological experiments, which seek to manipulate and measure neural correlates of auditory streaming while minimizing differences between the physical stimuli.
37. Shi LF, Law Y. Masking effects of speech and music: does the masker's hierarchical structure matter? Int J Audiol 2010; 49:296-308. [PMID: 20151877] [DOI: 10.3109/14992020903350188]
Abstract
Speech and music are time-varying signals organized by parallel hierarchical rules. Through a series of four experiments, this study compared the masking effects of single-talker speech and instrumental music on speech perception while manipulating the complexity of hierarchical and temporal structures of the maskers. Listeners' word recognition was found to be similar between hierarchically intact and disrupted speech or classical music maskers (Experiment 1). When sentences served as the signal, significantly greater masking effects were observed with disrupted than intact speech or classical music maskers (Experiment 2), although not with jazz or serial music maskers, which differed from the classical music masker in their hierarchical structures (Experiment 3). Removing the classical music masker's temporal dynamics or partially restoring it affected listeners' sentence recognition; yet, differences in performance between intact and disrupted maskers remained robust (Experiment 4). Hence, the effect of structural expectancy was largely present across maskers when comparing them before and after their hierarchical structure was purposefully disrupted. This effect seemed to lend support to the auditory stream segregation theory.
Affiliation(s)
- Lu-Feng Shi
- Department of Communication Sciences and Disorders, Long Island University - Brooklyn Campus, New York 11201, USA.
38. Leibold LJ, Hitchens JJ, Buss E, Neff DL. Excitation-based and informational masking of a tonal signal in a four-tone masker. J Acoust Soc Am 2010; 127:2441-50. [PMID: 20370027] [PMCID: PMC2865701] [DOI: 10.1121/1.3298588]
Abstract
This study examined contributions of peripheral excitation and informational masking to the variability in masking effectiveness observed across samples of multi-tonal maskers. Detection thresholds were measured for a 1000-Hz signal presented simultaneously with each of 25, four-tone masker samples. Using a two-interval, forced-choice adaptive task, thresholds were measured with each sample fixed throughout trial blocks for ten listeners. Average thresholds differed by as much as 26 dB across samples. An excitation-based model of partial loudness [Moore, B. C. J. et al. (1997). J. Audio Eng. Soc. 45, 224-237] was used to predict thresholds. These predictions accounted for a significant portion of variance in the data of several listeners, but no relation between the model and data was observed for many listeners. Moreover, substantial individual differences, on the order of 41 dB, were observed for some maskers. The largest individual differences were found for maskers predicted to produce minimal excitation-based masking. In subsequent conditions, one of five maskers was randomly presented in each interval. The difference in performance for samples with low versus high predicted thresholds was reduced in random compared to fixed conditions. These findings are consistent with a trading relation whereby informational masking is largest for conditions in which excitation-based masking is smallest.
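Several of the studies in this list, including this one, measure thresholds with a two-interval, forced-choice adaptive task. As a generic illustration only (the 2-down/1-up rule shown converges on roughly 70.7% correct; the step size, starting level, and stopping rule here are my own choices, not those of any cited study), such a track can be sketched as:

```python
def staircase_2down1up(respond, start_db=60.0, step_db=2.0, max_trials=60):
    """Generic 2-down/1-up adaptive track.

    The presentation level drops after two consecutive correct responses
    and rises after every incorrect response. `respond(level)` should
    return True if the listener was correct at `level`.
    Returns the list of reversal levels; their mean estimates threshold.
    """
    level = start_db
    correct_streak = 0
    last_direction = None  # 'down' or 'up'
    reversals = []
    for _ in range(max_trials):
        if respond(level):
            correct_streak += 1
            if correct_streak == 2:           # 2-down rule
                correct_streak = 0
                if last_direction == 'up':    # direction change = reversal
                    reversals.append(level)
                last_direction = 'down'
                level -= step_db
        else:                                  # 1-up rule
            correct_streak = 0
            if last_direction == 'down':
                reversals.append(level)
            last_direction = 'up'
            level += step_db
    return reversals

# Toy deterministic listener: correct whenever the level exceeds 40 dB,
# so the track should oscillate around that point.
reversal_levels = staircase_2down1up(lambda lv: lv > 40.0)
```

With the toy listener above, the track descends from 60 dB and then bounces between about 40 and 42 dB, so the mean of the reversal levels lands near the listener's 40-dB "edge"; real procedures average a fixed number of late reversals and typically shrink the step size partway through the run.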
Affiliation(s)
- Lori J Leibold
- Department of Allied Health Sciences, The University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA.
39. Shi LF. Normal-hearing English-as-a-second-language listeners' recognition of English words in competing signals. Int J Audiol 2010; 48:260-70. [PMID: 19842801] [DOI: 10.1080/14992020802607431]
Abstract
English-as-a-second-language (ESL) listeners have difficulty perceiving English speech presented in background noise. The current study furthered this line of investigation by including participants who varied widely in their age of English acquisition and length of English learning: 24 native English monolingual (EML), 12 simultaneous bilingual (SBL), 10 early ESL (E-ESL), and 14 late ESL (L-ESL) listeners. Word recognition scores were obtained in quiet and in the presence of speech-weighted noise, multi-talker babble, forward-playing music, and time-reversed music. All words and competing signals were presented at 45 dB HL. EML and SBL listeners performed similarly across test conditions. ESL listeners, especially L-ESL listeners, performed significantly more poorly in all conditions than EML and SBL listeners. Overall, speech-weighted noise and multi-talker babble produced a greater masking effect than music; however, the difference in performance between L-ESL and EML listeners was largest for the music maskers, indicating that L-ESL listeners are susceptible even to weaker maskers. Age of acquisition and length of learning were both shown to be good indicators of SBL and ESL listeners' performance.
Affiliation(s)
- Lu-Feng Shi
- Department of Communication Sciences & Disorders, Long Island University-Brooklyn Campus, Brooklyn, New York 11201, USA.
40. Garadat SN, Litovsky RY, Yu G, Zeng FG. Role of binaural hearing in speech intelligibility and spatial release from masking using vocoded speech. J Acoust Soc Am 2009; 126:2522-35. [PMID: 19894832] [PMCID: PMC2787072] [DOI: 10.1121/1.3238242]
Abstract
A cochlear implant vocoder was used to evaluate relative contributions of spectral and binaural temporal fine-structure cues to speech intelligibility. In Study I, stimuli were vocoded, and then convolved through head related transfer functions (HRTFs) to remove speech temporal fine structure but preserve the binaural temporal fine-structure cues. In Study II, the order of processing was reversed to remove both speech and binaural temporal fine-structure cues. Speech reception thresholds (SRTs) were measured adaptively in quiet, and with interfering speech, for unprocessed and vocoded speech (16, 8, and 4 frequency bands), under binaural or monaural (right-ear) conditions. Under binaural conditions, as the number of bands decreased, SRTs increased. With decreasing number of frequency bands, greater benefit from spatial separation of target and interferer was observed, especially in the 8-band condition. The present results demonstrate a strong role of the binaural cues in spectrally degraded speech, when the target and interfering speech are more likely to be confused. The nearly normal binaural benefits under present simulation conditions and the lack of order of processing effect further suggest that preservation of binaural cues is likely to improve performance in bilaterally implanted recipients.
Affiliation(s)
- Soha N Garadat
- Waisman Center, University of Wisconsin, Madison, WI 53705, USA
41. Infants' listening in multitalker environments: effect of the number of background talkers. Atten Percept Psychophys 2009; 71:822-36. [PMID: 19429961] [DOI: 10.3758/app.71.4.822]
Abstract
Infants are often spoken to in the presence of background sounds, including speech from other talkers. In the present study, we compared 5- and 8.5-month-olds' abilities to recognize their own names in the context of three different types of background speech: that of a single talker, multitalker babble, and that of a single talker played backward. Infants recognized their names at a 10-dB signal-to-noise ratio in the multiple-voice condition but not in the single-voice (nonreversed) condition, a pattern opposite to that of typical adult performance. Infants similarly failed to recognize their names when the background talker's voice was reversed--that is, unintelligible, but with speech-like acoustic properties. These data suggest that infants may have difficulty segregating the components of different speech streams when those streams are acoustically too similar. Alternatively, infants' attention may be drawn to the time-varying acoustic properties associated with a single talker's speech, causing difficulties when a single talker is the competing sound.
42. Leibold LJ, Bonino AY. Release from informational masking in children: effect of multiple signal bursts. J Acoust Soc Am 2009; 125:2200-8. [PMID: 19354396] [PMCID: PMC2736737] [DOI: 10.1121/1.3087435]
Abstract
This study examined the degree to which increasing the number of signal presentations provides children with a release from informational masking. Listeners were younger children (5-7 years), older children (8-10 years), and adults. Detection thresholds were measured for a sequence of repeating 50-ms bursts of a 1000-Hz pure-tone signal embedded in a sequence of 10- and 50-ms bursts of a random-frequency, two-tone masker. Masker bursts were played at an overall level of 60-dB sound pressure level in each interval of a two-interval, forced choice adaptive procedure. Performance was examined for conditions with two, four, five, and six signal bursts. Regardless of the number of signal bursts, thresholds for most children were higher than thresholds for most adults. Despite developmental effects in informational masking, however, masked threshold decreased with additional signal bursts by a similar amount for younger children, older children, and adults. The magnitude of masking release for both groups of children and for adults was inconsistent with absolute energy detection. Instead, increasing the number of signal bursts appears to aid children in the perceptual segregation of the fixed-frequency signal from the random-frequency masker as has been previously reported for adults [Kidd, G., et al. (2003). J. Acoust. Soc. Am. 114, 2835-2845].
Affiliation(s)
- Lori J Leibold
- Department of Allied Health Sciences, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA.
43. Lu Y, Cooke M. Speech production modifications produced by competing talkers, babble, and stationary noise. J Acoust Soc Am 2008; 124:3261-3275. [PMID: 19045809] [DOI: 10.1121/1.2990705]
Abstract
Noise has an effect on speech production. Stationary noise and babble have been used in the past but the effect of a competing talker, which might be expected to cause different types of disruption, has rarely been investigated. The current study examined the acoustic and phonetic consequences of N-talker noise on sentence production for a range of values of N from 1 (competing talker) to infinity (speech-shaped noise). The effect of noise on speech production increased with both the number of background talkers (N) and noise level, both of which act to increase the energetic masking effect of the noise. In a background of stationary noise, noise-induced speech was always more intelligible than speech produced in quiet, and the gain in intelligibility increased with N and noise level, suggesting that talkers modify their productions to ameliorate energetic masking at the ears of the listener. When presented in a competing talker background, speech induced by a competing talker was more intelligible than speech produced in quiet, but the scale of the effect was compatible with the energetic masking effect of the competing talker. No evidence was found of modifications to speech production which exploited the temporal structure of a competing talker.
Affiliation(s)
- Youyi Lu
- Department of Computer Science, University of Sheffield, Regent Court, 211 Portobello Street, Sheffield S1 4DP, United Kingdom.
44. Bee MA, Micheyl C. The cocktail party problem: what is it? How can it be solved? And why should animal behaviorists study it? J Comp Psychol 2008; 122:235-51. [PMID: 18729652] [PMCID: PMC2692487] [DOI: 10.1037/0735-7036.122.3.235]
Abstract
Animals often use acoustic signals to communicate in groups or social aggregations in which multiple individuals signal within a receiver's hearing range. Consequently, receivers face challenges related to acoustic interference and auditory masking that are not unlike the human cocktail party problem, which refers to the problem of perceiving speech in noisy social settings. Understanding the sensory solutions to the cocktail party problem has been a goal of research on human hearing and speech communication for several decades. Despite a general interest in acoustic signaling in groups, animal behaviorists have devoted comparatively less attention toward understanding how animals solve problems equivalent to the human cocktail party problem. After illustrating how humans and nonhuman animals experience and overcome similar perceptual challenges in cocktail-party-like social environments, this article reviews previous psychophysical and physiological studies of humans and nonhuman animals to describe how the cocktail party problem can be solved. This review also outlines several basic and applied benefits that could result from studies of the cocktail party problem in the context of animal acoustic communication.
Affiliation(s)
- Mark A Bee
- Department of Ecology, Evolution, and Behavior, University of Minnesota, St. Paul, MN 55108, USA.
45. Nahum M, Nelken I, Ahissar M. Low-level information and high-level perception: the case of speech in noise. PLoS Biol 2008; 6:e126. [PMID: 18494561] [PMCID: PMC2386842] [DOI: 10.1371/journal.pbio.0060126]
Abstract
Auditory information is processed in a fine-to-crude hierarchical scheme, from low-level acoustic information to high-level abstract representations, such as phonological labels. We now ask whether fine acoustic information, which is not retained at high levels, can still be used to extract speech from noise. Previous theories suggested either full availability of low-level information or availability that is limited by task difficulty. We propose a third alternative, based on the Reverse Hierarchy Theory (RHT), originally derived to describe the relations between the processing hierarchy and visual perception. RHT asserts that only the higher levels of the hierarchy are immediately available for perception. Direct access to low-level information requires specific conditions, and can be achieved only at the cost of concurrent comprehension. We tested the predictions of these three views in a series of experiments in which we measured the benefits from utilizing low-level binaural information for speech perception, and compared it to that predicted from a model of the early auditory system. Only auditory RHT could account for the full pattern of the results, suggesting that similar defaults and tradeoffs underlie the relations between hierarchical processing and perception in the visual and auditory modalities.
Collapse
Affiliation(s)
- Mor Nahum: Interdisciplinary Center for Neural Computation (ICNC), Hebrew University, Jerusalem, Israel
- Israel Nelken: Interdisciplinary Center for Neural Computation (ICNC) and Department of Neurobiology, Hebrew University, Jerusalem, Israel
- Merav Ahissar: Interdisciplinary Center for Neural Computation (ICNC) and Department of Psychology, Hebrew University, Jerusalem, Israel
46
Shinn-Cunningham BG. Object-based auditory and visual attention. Trends Cogn Sci 2008; 12:182-6. [PMID: 18396091] [PMCID: PMC2699558] [DOI: 10.1016/j.tics.2008.02.003]
Abstract
Theories of visual attention argue that attention operates on perceptual objects, and thus that interactions between object formation and selective attention determine how competing sources interfere with perception. In auditory perception, theories of attention are less mature and no comprehensive framework exists to explain how attention influences perceptual abilities. However, the same principles that govern visual perception can explain many seemingly disparate auditory phenomena. In particular, many recent studies of 'informational masking' can be explained by failures of either auditory object formation or auditory object selection. This similarity suggests that the same neural mechanisms control attention and influence perception across different sensory modalities.
Affiliation(s)
- Barbara G Shinn-Cunningham
- Department of Cognitive and Neural Systems, Hearing Research Center, Boston University, Boston, MA 02215, USA.
47
48
Leibold LJ, Neff DL. Effects of masker-spectral variability and masker fringes in children and adults. J Acoust Soc Am 2007; 121:3666-76. [PMID: 17552718] [DOI: 10.1121/1.2723664]
Abstract
This study examined the degree to which masker-spectral variability contributes to children's susceptibility to informational masking. Listeners were younger children (5-7 years), older children (8-10 years), and adults (19-34 years). Masked thresholds were measured using a 2IFC, adaptive procedure for a 300-ms, 1000-Hz signal presented simultaneously with (1) broadband noise, (2) a random-frequency ten-tone complex, or (3) a fixed-frequency ten-tone complex. Maskers were presented at an overall level of 60 dB SPL. Thresholds were similar across age for the noise condition. For both ten-tone conditions, however, thresholds for most children were higher than for most adults. The average difference in threshold between random and fixed ten-tone conditions was comparable across age, suggesting a similar effect of reducing masker-spectral variability in children and adults. Children appear more susceptible to informational masking than adults, however, both in the presence and in the absence of masker-spectral variability. The addition of a masker fringe (delayed onset of the signal relative to the masker) provided a release from masking for fixed and random ten-tone conditions in all age groups, suggesting that at least part of the masking observed for both ten-tone maskers was informational.
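The 2IFC adaptive procedure used here can be sketched as a transformed staircase. The version below assumes a 2-down 1-up rule (which converges on the 70.7%-correct point), a fixed step size, and a simulated listener with a logistic psychometric function; these specifics are illustrative assumptions, not the study's exact parameters.

```python
import math
import random
import statistics

def simulate_staircase(true_threshold=60.0, start=80.0, step=4.0,
                       n_reversals=12, seed=1):
    """2-down 1-up adaptive track: the level drops after two correct
    2IFC responses in a row and rises after one error, so it hovers
    near the 70.7%-correct point of the psychometric function."""
    rng = random.Random(seed)

    def p_correct(level):
        # Hypothetical logistic psychometric function, 50% guess rate.
        return 0.5 + 0.5 / (1 + math.exp(-(level - true_threshold) / 2))

    level, correct_in_row, direction = start, 0, -1
    reversals = []
    while len(reversals) < n_reversals:
        correct = rng.random() < p_correct(level)
        if correct:
            correct_in_row += 1
            if correct_in_row < 2:
                continue          # level changes only after 2 correct
            correct_in_row, new_dir = 0, -1
        else:
            correct_in_row, new_dir = 0, +1
        if new_dir != direction:  # track changed direction: a reversal
            reversals.append(level)
            direction = new_dir
        level += new_dir * step

    # Threshold estimate: mean of the last 8 reversal levels.
    return statistics.mean(reversals[-8:])
```

With these assumed parameters the returned estimate settles near the simulated listener's 70.7%-correct level rather than at the 50% point.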
Affiliation(s)
- Lori J Leibold
- Boys Town National Research Hospital, Omaha, Nebraska 68131, USA.
49
Freyman RL, Helfer KS, Balakrishnan U. Variability and uncertainty in masking by competing speech. J Acoust Soc Am 2007; 121:1040-6. [PMID: 17348526] [DOI: 10.1121/1.2427117]
Abstract
This study investigated the role of uncertainty in masking of speech by interfering speech. Target stimuli were nonsense sentences recorded by a female talker. Masking sentences were recorded from ten female talkers and combined into pairs. Listeners' recognition performance was measured with both target and masker presented from a front loudspeaker (nonspatial condition) or with a masker presented from two loudspeakers, with the right leading the front by 4 ms (spatial condition). In Experiment 1, the sentences were presented in blocks in which the masking talkers, spatial configuration, and signal-to-noise (S-N) ratio were fixed. Listeners' recognition performance varied widely among the masking talkers in the nonspatial condition, much less so in the spatial condition. This result was attributed to variation in effectiveness of informational masking in the nonspatial condition. The second experiment increased uncertainty by randomizing masking talkers and S-N ratios across trials in some conditions, and reduced uncertainty by presenting the same token of masker across trials in other conditions. These variations in masker uncertainty had relatively small effects on speech recognition.
Affiliation(s)
- Richard L Freyman
- Department of Communication Disorders, University of Massachusetts, Amherst, Massachusetts 01003, USA.
50
Lutfi RA, Jesteadt W. Molecular analysis of the effect of relative tone level on multitone pattern discrimination. J Acoust Soc Am 2006; 120:3853-60. [PMID: 17225412] [DOI: 10.1121/1.2361184]
Abstract
Molecular psychophysics attempts to model the observer's response to stimuli as they vary from trial to trial. The approach has gained popularity in multitone pattern discrimination studies as a means of estimating the relative reliance or decision weight listeners give to different tones in the pattern. Various factors affecting decision weights have been examined, but one largely ignored is the relative level of tones in the pattern. In the present study listeners detected a level-increment in a sequence of 5, 100-ms, 2.0-kHz tone bursts alternating in level between 40 and 80 dB SPL. The level increment was made largest on the 40-dB tones, yet despite this all four highly-practiced listeners gave near exclusive weight to the 80-dB tones. The effect was the same when the tones were replaced by bursts of broadband Gaussian noise alternating in level. It was reduced only when the level differences were made <10 dB, and it was entirely reversed only when the low-level tones alternated with louder bursts of Gaussian noise. The results are discussed in terms of the effects of both sensory and perceptual factors on estimates of decision weights.
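The molecular approach described here is commonly implemented by adding small random level perturbations to each tone on every trial and regressing the trial-by-trial responses on those perturbations; the relative regression coefficients serve as estimates of the decision weights. The sketch below simulates a hypothetical observer who, like the listeners in this study, weights two of five tones near-exclusively; the weight values, perturbation statistics, and internal-noise level are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_tones = 5000, 5

# Per-trial level perturbations (in dB) added to each of the 5 tones.
perturb = rng.normal(0, 2, size=(n_trials, n_tones))

# Hypothetical observer relying almost exclusively on tones 2 and 4
# (analogous to the high-level tones in an alternating-level pattern).
w_true = np.array([0.05, 0.45, 0.05, 0.45, 0.0])
dv = perturb @ w_true + rng.normal(0, 1, n_trials)  # internal noise
resp = (dv > 0).astype(float)                        # binary responses

# Decision weights: regress responses on the perturbations (with an
# intercept), then normalize the coefficients to sum to 1.
b, *_ = np.linalg.lstsq(np.c_[np.ones(n_trials), perturb], resp,
                        rcond=None)
w_hat = b[1:] / b[1:].sum()
```

The recovered `w_hat` reflects the observer's relative reliance on each tone, which is the quantity the study's analysis reports.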
Affiliation(s)
- Robert A Lutfi
- Department of Communicative Disorders and Waisman Center, University of Wisconsin, Madison, Wisconsin 53706, USA.