1. Byrne AJ, Conroy C, Kidd G. Individual differences in speech-on-speech masking are correlated with cognitive and visual task performance. J Acoust Soc Am 2023; 154:2137-2153. [PMID: 37800988] [PMCID: PMC10631817] [DOI: 10.1121/10.0021301]
Abstract
Individual differences in spatial tuning for masked target speech identification were determined using maskers that varied in type and proximity to the target source. The maskers were chosen to produce three strengths of informational masking (IM): high [same-gender, speech-on-speech (SOS) masking], intermediate (the same masker speech time-reversed), and low (speech-shaped, speech-envelope-modulated noise). Typical for this task, individual differences increased as IM increased, while overall performance decreased. To determine the extent to which auditory performance might generalize to another sensory modality, a comparison visual task was also implemented. Visual search time was measured for identifying a cued object among "clouds" of distractors that were varied symmetrically in proximity to the target. The visual maskers also were chosen to produce three strengths of an analog of IM based on feature similarities between the target and maskers. Significant correlations were found for overall auditory and visual task performance, and both of these measures were correlated with an index of general cognitive reasoning. Overall, the findings provide qualified support for the proposition that the ability of an individual to solve IM-dominated tasks depends on cognitive mechanisms that operate in common across sensory modalities.
Affiliation(s)
- Andrew J Byrne: Department of Speech, Language and Hearing Sciences and Hearing Research Center, Boston University, Boston, Massachusetts 02215, USA
- Christopher Conroy: Department of Biological and Vision Sciences, State University of New York College of Optometry, New York, New York 10036, USA
- Gerald Kidd: Department of Speech, Language and Hearing Sciences and Hearing Research Center, Boston University, Boston, Massachusetts 02215, USA
2. Cho AY, Kidd G. Auditory motion as a cue for source segregation and selection in a "cocktail party" listening environment. J Acoust Soc Am 2022; 152:1684. [PMID: 36182296] [PMCID: PMC9489258] [DOI: 10.1121/10.0013990]
Abstract
Source motion was examined as a cue for segregating concurrent speech or noise sources. In two different headphone-based tasks, motion detection (MD) and speech-on-speech masking (SI), one source among three was designated as the target only by imposing sinusoidal variation in azimuth during the stimulus presentation. For MD, the listener was asked which of the three concurrent sources was in motion during the trial. For SI, the listener was asked to report the words spoken by the moving speech source. MD performance improved as the amplitude of the sinusoidal motion (i.e., displacement in azimuth) increased over the range of values tested (±5° to ±30°) for both modulated noise and speech targets, with better performance found for speech. SI performance also improved as the amplitude of target motion increased. Furthermore, SI performance improved as word position progressed throughout the sentence. Performance on the MD task was correlated with performance on the SI task across individual subjects. For the SI conditions tested here, these findings are consistent with the proposition that listeners first detect the moving target source, then focus attention on the target location as the target sentence unfolds.
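The motion cue used in both tasks, sinusoidal variation of the target's azimuth during the stimulus, can be illustrated with a simple trajectory generator. This is a hedged sketch, not the authors' code: the function name, modulation rate, and sampling rate are assumptions; only the ±5° to ±30° amplitude range comes from the abstract.

```python
import numpy as np

def sinusoidal_azimuth(duration_s, rate_hz, amplitude_deg, fs=44100):
    """Azimuth trajectory (degrees) oscillating sinusoidally about 0 deg
    with the given peak displacement, as in the MD/SI target motion
    (peak displacements of +/-5 to +/-30 deg were tested)."""
    t = np.arange(int(duration_s * fs)) / fs
    return amplitude_deg * np.sin(2.0 * np.pi * rate_hz * t)

# Example: a 1-s trajectory sweeping +/-30 deg at an assumed 2-Hz rate
trajectory = sinusoidal_azimuth(1.0, 2.0, 30.0, fs=1000)
```

Such a trajectory would then drive the spatialization (e.g., time-varying HRTF interpolation) for headphone presentation.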
Affiliation(s)
- Adrian Y Cho: Speech and Hearing Bioscience and Technology Program, Harvard University, Cambridge, Massachusetts 02138, USA
- Gerald Kidd: Department of Speech, Language, and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
3. Temporal and Directional Cue Effects on the Cocktail Party Problem for Patients With Listening Difficulties Without Clinical Hearing Loss. Ear Hear 2022; 43:1740-1751. [DOI: 10.1097/aud.0000000000001247]
4. Liang W, Brown CA, Shinn-Cunningham BG. Cat-astrophic effects of sudden interruptions on spatial auditory attention. J Acoust Soc Am 2022; 151:3219. [PMID: 35649920] [PMCID: PMC9113758] [DOI: 10.1121/10.0010453]
Abstract
Salient interruptions draw attention involuntarily. Here, we explored whether this effect depends on the spatial and temporal relationships between a target stream and interrupter. In a series of online experiments, listeners focused spatial attention on a target stream of spoken syllables in the presence of an otherwise identical distractor stream from the opposite hemifield. On some random trials, an interrupter (a cat "MEOW") occurred. Experiment 1 established that the interrupter, which occurred randomly in 25% of the trials in the hemifield opposite the target, degraded target recall. Moreover, a majority of participants exhibited this degradation for the first target syllable, which finished before the interrupter began. Experiment 2 showed that the effect of an interrupter was similar whether it occurred in the opposite or the same hemifield as the target. Experiment 3 found that the interrupter degraded performance slightly if it occurred before the target stream began but had no effect if it began after the target stream ended. Experiment 4 showed decreased interruption effects when the interruption frequency increased (50% of the trials). These results demonstrate that a salient interrupter disrupts recall of a target stream, regardless of its direction, especially if it occurs during a target stream.
Affiliation(s)
- Wusheng Liang: Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA
- Christopher A Brown: Department of Communication Science and Disorders, The University of Pittsburgh, Pittsburgh, Pennsylvania 15213, USA
5.
Abstract
Identification of speech from a "target" talker was measured in a speech-on-speech masking task with two simultaneous "masker" talkers. The overall level of each talker was either fixed or randomized throughout each stimulus presentation to investigate the effectiveness of level as a cue for segregating competing talkers and attending to the target. Experimental manipulations included varying the level difference between talkers and imposing three types of target level uncertainty: (1) fixed target level across trials, (2) random target level across trials, or (3) random target levels on a word-by-word basis within a trial. When the target level was predictable, performance was better than in the corresponding conditions in which the target level was uncertain. Masker confusions were consistent with a high degree of informational masking (IM). Furthermore, evidence was found for "tuning" in level and a level "release" from IM. These findings suggest that conforming to listener expectation about relative level, in addition to cues signaling talker identity, facilitates segregating, and maintaining focus of attention on, a specific talker in multiple-talker communication situations.
Affiliation(s)
- Andrew J Byrne: Department of Speech, Language, & Hearing Sciences, Boston University, MA, USA
- Christopher Conroy: Department of Speech, Language, & Hearing Sciences, Boston University, MA, USA
- Gerald Kidd: Department of Speech, Language, & Hearing Sciences, Boston University, MA, USA; Department of Otolaryngology, Head-Neck Surgery, Medical University of South Carolina, Charleston, SC, USA
6. Cortical Processing of Binaural Cues as Shown by EEG Responses to Random-Chord Stereograms. J Assoc Res Otolaryngol 2021; 23:75-94. [PMID: 34904205] [PMCID: PMC8783002] [DOI: 10.1007/s10162-021-00820-4]
Abstract
Spatial hearing facilitates the perceptual organization of complex soundscapes into accurate mental representations of sound sources in the environment. Yet, the role of binaural cues in auditory scene analysis (ASA) has received relatively little attention in recent neuroscientific studies employing novel, spectro-temporally complex stimuli. This may be because a stimulation paradigm that provides binaurally derived grouping cues of sufficient spectro-temporal complexity has not yet been established for neuroscientific ASA experiments. Random-chord stereograms (RCS) are a class of auditory stimuli that exploit spectro-temporal variations in the interaural envelope correlation of noise-like sounds with interaurally coherent fine structure; they evoke salient auditory percepts that emerge only under binaural listening. Here, our aim was to assess the usability of the RCS paradigm for indexing binaural processing in the human brain. To this end, we recorded EEG responses to RCS stimuli from 12 normal-hearing subjects. The stimuli consisted of an initial 3-s noise segment with interaurally uncorrelated envelopes, followed by another 3-s segment, where envelope correlation was modulated periodically according to the RCS paradigm. Modulations were applied either across the entire stimulus bandwidth (wideband stimuli) or in temporally shifting frequency bands (ripple stimulus). Event-related potentials and inter-trial phase coherence analyses of the EEG responses showed that the introduction of the 3- or 5-Hz wideband modulations produced a prominent change-onset complex and ongoing synchronized responses to the RCS modulations. In contrast, the ripple stimulus elicited a change-onset response but no response to ongoing RCS modulation. Frequency-domain analyses revealed increased spectral power at the fundamental frequency and the first harmonic of wideband RCS modulations. RCS stimulation yields robust EEG measures of binaurally driven auditory reorganization and has the potential to provide a flexible stimulation paradigm suitable for isolating binaural effects in ASA experiments.
7. Mattsson TS, Lind O, Follestad T, Grøndahl K, Wilson W, Nicholas J, Nordgård S, Andersson S. Electrophysiological characteristics in children with listening difficulties, with or without auditory processing disorder. Int J Audiol 2019; 58:704-716. [PMID: 31154863] [DOI: 10.1080/14992027.2019.1621396]
Abstract
Objective: To determine whether the auditory middle latency response (AMLR), auditory late latency response (ALLR), and auditory P300 were sensitive to auditory processing disorder (APD) and listening difficulties in children, and to elucidate the level in the central auditory nervous system at which the underlying neurobiological problems arise. Design: Three-group, repeated-measures design. Study sample: Forty-six children aged 8-14 years were divided into three groups: children with reported listening difficulties fulfilling APD diagnostic criteria, children with reported listening difficulties not fulfilling APD diagnostic criteria, and normally hearing children. Results: AMLR Na latency and P300 latency and amplitude were sensitive to listening difficulties. No other auditory evoked potential (AEP) measures were sensitive to listening difficulties, and no AEP measures were sensitive to APD only. Moderate correlations were observed between P300 latency and amplitude and the behavioural AP measures of competing words, frequency patterns, duration patterns, and dichotic digits. Conclusions: Impaired thalamo-cortical (bottom-up) and neurocognitive (top-down) function may contribute to difficulties discriminating speech and non-speech sounds. Cognitive processes involved in conscious recognition, attention, and discrimination of the acoustic characteristics of the stimuli could contribute to listening difficulties in general, and to APD in particular.
Affiliation(s)
- Tone Stokkereit Mattsson: Department of Otorhinolaryngology, Head and Neck Surgery, Ålesund Hospital, Aalesund, Norway; Department of Neuromedicine and Movement Science, Norwegian University of Science and Technology, Trondheim, Norway
- Ola Lind: Department of Otorhinolaryngology, Head and Neck Surgery, Haukeland University Hospital, Bergen, Norway
- Turid Follestad: Department of Public Health and General Practice, Norwegian University of Science and Technology, Trondheim, Norway
- Kjell Grøndahl: Department of Clinical Engineering, Haukeland University Hospital, Bergen, Norway
- Wayne Wilson: School of Health and Rehabilitation Sciences, The University of Queensland, Brisbane, Australia
- Jude Nicholas: Statped National Service Center for Special Needs Education, Bergen, Norway; Department of Occupational Medicine, Haukeland University Hospital, Bergen, Norway
- Ståle Nordgård: Department of Neuromedicine and Movement Science, Norwegian University of Science and Technology, Trondheim, Norway; Department of Otorhinolaryngology, Head and Neck Surgery, St. Olavs University Hospital, Trondheim, Norway
- Stein Andersson: Department of Psychology, University of Oslo, Oslo, Norway
8. Kidd G. Enhancing Auditory Selective Attention Using a Visually Guided Hearing Aid. J Speech Lang Hear Res 2017; 60:3027-3038. [PMID: 29049603] [PMCID: PMC5945072] [DOI: 10.1044/2017_jslhr-h-17-0071]
Abstract
Purpose: Listeners with hearing loss, as well as many listeners with clinically normal hearing, often experience great difficulty segregating talkers in a multiple-talker sound field and selectively attending to the desired "target" talker while ignoring the speech from unwanted "masker" talkers and other sources of sound. This listening situation forms the classic "cocktail party problem" described by Cherry (1953) that has received a great deal of study over the past few decades. In this article, a new approach to improving sound source segregation and enhancing auditory selective attention is described, and its conceptual design, current implementation, and results obtained to date are reviewed and discussed. Method: This approach, embodied in a prototype "visually guided hearing aid" (VGHA) currently used for research, employs acoustic beamforming steered by eye gaze as a means of improving the ability of listeners to segregate and attend to one sound source in the presence of competing sound sources. Results: The results from several studies demonstrate that listeners with normal hearing are able to use an attention-based "spatial filter," operating primarily on binaural cues, to selectively attend to one source among competing spatially distributed sources. Listeners with sensorineural hearing loss generally are unable to use this spatial filter as effectively as listeners with normal hearing, especially in conditions high in "informational masking." The VGHA enhances auditory spatial attention for speech-on-speech masking and improves signal-to-noise ratio for conditions high in "energetic masking." Visual steering of the beamformer supports the coordinated actions of vision and audition in selective attention and facilitates following sound source transitions in complex listening situations. Conclusions: Both listeners with normal hearing and listeners with sensorineural hearing loss may benefit from the acoustic beamforming implemented by the VGHA, especially for nearby sources in less reverberant sound fields. Moreover, guiding the beam using eye gaze can be an effective means of sound source enhancement for listening conditions in which the target source changes frequently over time, as often occurs during turn-taking in a conversation. Presentation video: http://cred.pubs.asha.org/article.aspx?articleid=2601621
Affiliation(s)
- Gerald Kidd: Department of Speech, Language, and Hearing Sciences and Hearing Research Center, Boston University, MA
9. Kidd G, Colburn HS. Informational Masking in Speech Recognition. In: Springer Handbook of Auditory Research. 2017. [DOI: 10.1007/978-3-319-51662-2_4]
10. Shinn-Cunningham B, Best V, Lee AKC. Auditory Object Formation and Selection. In: Springer Handbook of Auditory Research. 2017. [DOI: 10.1007/978-3-319-51662-2_2]
11. Kidd G, Mason CR, Swaminathan J, Roverud E, Clayton KK, Best V. Determining the energetic and informational components of speech-on-speech masking. J Acoust Soc Am 2016; 140:132. [PMID: 27475139] [PMCID: PMC5392100] [DOI: 10.1121/1.4954748]
Abstract
Identification of target speech was studied under masked conditions consisting of two or four independent speech maskers. In the reference conditions, the maskers were colocated with the target, the masker talkers were the same sex as the target, and the masker speech was intelligible. The comparison conditions, intended to provide release from masking, included different-sex target and masker talkers, time-reversal of the masker speech, and spatial separation of the maskers from the target. Significant release from masking was found for all comparison conditions. To determine whether these reductions in masking could be attributed to differences in energetic masking, ideal time-frequency segregation (ITFS) processing was applied so that the time-frequency units where the masker energy dominated the target energy were removed. The remaining target-dominated "glimpses" were reassembled as the stimulus. Speech reception thresholds measured using these resynthesized ITFS-processed stimuli were the same for the reference and comparison conditions, supporting the conclusion that the amount of energetic masking across conditions was the same. These results indicated that the large release from masking found under all comparison conditions was due primarily to a reduction in informational masking. Furthermore, the large individual differences observed generally were correlated across the three masking-release conditions.
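The ITFS procedure described above can be sketched in a few lines: decompose target and masker into time-frequency units, keep only the units of the mixture in which the target dominates, and resynthesize the surviving "glimpses". This is an illustrative reconstruction under simplifying assumptions, not the study's implementation: non-overlapping FFT frames, a 0-dB local criterion, and the function name are all mine.

```python
import numpy as np

def itfs_glimpses(target, masker, frame_len=512):
    """Ideal time-frequency segregation (sketch): zero the T-F units of the
    target+masker mixture in which masker energy dominates target energy,
    then resynthesize the remaining target-dominated "glimpses"."""
    n = (min(len(target), len(masker)) // frame_len) * frame_len
    out = np.zeros(n)
    for start in range(0, n, frame_len):
        T = np.fft.rfft(target[start:start + frame_len])
        M = np.fft.rfft(masker[start:start + frame_len])
        keep = np.abs(T) > np.abs(M)  # binary mask at a 0-dB local criterion
        X = T + M                     # the mixture actually presented
        out[start:start + frame_len] = np.fft.irfft(X * keep, frame_len)
    return out
```

With spectrally non-overlapping signals the output is essentially the clean target; with real speech-on-speech mixtures it retains only the target-dominated glimpses, which is what equates energetic masking across conditions.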
Affiliation(s)
- All authors (Gerald Kidd, Christine R Mason, Jayaganesh Swaminathan, Elin Roverud, Kameron K Clayton, Virginia Best): Department of Speech, Language and Hearing Sciences and Hearing Research Center, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA
|
12
|
Kidd G, Mason CR, Best V, Swaminathan J. Benefits of Acoustic Beamforming for Solving the Cocktail Party Problem. Trends Hear 2015; 19:2331216515593385. [PMID: 26126896 PMCID: PMC4509760 DOI: 10.1177/2331216515593385] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022] Open
Abstract
The benefit provided to listeners with sensorineural hearing loss (SNHL) by an acoustic beamforming microphone array was determined in a speech-on-speech masking experiment. Normal-hearing controls were tested as well. For the SNHL listeners, prescription-determined gain was applied to the stimuli, and performance using the beamformer was compared with that obtained using bilateral amplification. The listener identified speech from a target talker located straight ahead (0° azimuth) in the presence of four competing talkers that were either colocated with, or spatially separated from, the target. The stimuli were spatialized using measured impulse responses and presented via earphones. In the spatially separated masker conditions, the four maskers were arranged symmetrically around the target at ±15° and ±30° or at ±45° and ±90°. Results revealed that masked speech reception thresholds for spatially separated maskers were higher (poorer) on average for the SNHL than for the normal-hearing listeners. For most SNHL listeners in the wider masker separation condition, lower thresholds were obtained through the microphone array than through bilateral amplification. Large intersubject differences were found in both listener groups. The best masked speech reception thresholds overall were found for a hybrid condition that combined natural and beamforming listening in order to preserve localization for broadband sources.
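At its core, an acoustic beamforming microphone array of the kind evaluated here performs delay-and-sum processing: each channel is delayed so that sound from the look direction adds coherently while off-axis sources are attenuated. A minimal far-field sketch for a linear array follows; the function name, geometry, and sign convention are assumptions, and the study's beamformer was certainly more sophisticated than this.

```python
import numpy as np

def delay_and_sum(mic_signals, mic_positions_m, steer_deg, fs, c=343.0):
    """Delay-and-sum beamformer (far-field sketch) for a linear array.
    mic_signals: (n_mics, n_samples); mic_positions_m: positions along
    the array axis. Channels are time-aligned toward azimuth steer_deg
    and averaged, so that direction adds coherently."""
    delays = mic_positions_m * np.sin(np.deg2rad(steer_deg)) / c  # s per mic
    n = mic_signals.shape[1]
    freqs = np.fft.rfftfreq(n, 1.0 / fs)
    out = np.zeros(n)
    for sig, d in zip(mic_signals, delays):
        # Fractional-sample delay applied as a linear phase shift
        out += np.fft.irfft(np.fft.rfft(sig) * np.exp(2j * np.pi * freqs * d), n)
    return out / len(mic_signals)
```

In the "visually guided" configuration, steer_deg would be updated from the listener's eye gaze rather than fixed.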
13. Allen K, Alais D, Shinn-Cunningham B, Carlile S. Masker location uncertainty reveals evidence for suppression of maskers in two-talker contexts. J Acoust Soc Am 2011; 130:2043-2053. [PMID: 21973359] [PMCID: PMC3206908] [DOI: 10.1121/1.3631666]
Abstract
In many natural settings, spatial release from masking aids speech intelligibility, especially when there are competing talkers. This paper describes a series of three experiments that investigate the role of prior knowledge of masker location on phoneme identification and spatial release from masking. In contrast to previous work, these experiments use initial stop-consonant identification as a test of target intelligibility to ensure that listeners had little time to switch the focus of spatial attention during the task. The first experiment shows that target phoneme identification was worse when a masker played from an unexpected location (increasing the consonant identification threshold by 2.6 dB) compared to when an energetically very similar and symmetrically located masker came from an expected location. In the second and third experiments, target phoneme identification was worse (increasing target threshold levels by 2.0 and 2.6 dB, respectively) when the target was played unexpectedly on the side from which the masker was expected compared to when the target came from an unexpected, symmetrical location in the hemifield opposite the expected location of the masker. These results support the idea that listeners modulate spatial attention by both focusing resources on the expected target location and withdrawing attentional resources from expected locations of interfering sources.
Affiliation(s)
- Kachina Allen: School of Medical Sciences, University of Sydney, New South Wales 2106, Australia
14. Kidd G, Mason CR, Best V, Marrone N. Stimulus factors influencing spatial release from speech-on-speech masking. J Acoust Soc Am 2010; 128:1965-1978. [PMID: 20968368] [PMCID: PMC2981113] [DOI: 10.1121/1.3478781]
Abstract
This study examined spatial release from masking (SRM) when a target talker was masked by competing talkers or by other types of sounds. The focus was on the role of interaural time differences (ITDs) and time-varying interaural level differences (ILDs) under conditions varying in the strength of informational masking (IM). In the first experiment, a target talker was masked by two other talkers that were either colocated with the target or were symmetrically spatially separated from the target with the stimuli presented through loudspeakers. The sounds were filtered into different frequency regions to restrict the available interaural cues. The largest SRM occurred for the broadband condition followed by a low-pass condition. However, even the highest frequency bandpass-filtered condition (3-6 kHz) yielded a significant SRM. In the second experiment the stimuli were presented via earphones. The listeners identified the speech of a target talker masked by one or two other talkers or noises when the maskers were colocated with the target or were perceptually separated by ITDs. The results revealed a complex pattern of masking in which the factors affecting performance in colocated and spatially separated conditions are to a large degree independent.
Affiliation(s)
- Gerald Kidd: Department of Speech, Language and Hearing Sciences, and Hearing Research Center, Boston University, Boston, Massachusetts 02215, USA
15. Kitterick PT, Bailey PJ, Summerfield AQ. Benefits of knowing who, where, and when in multi-talker listening. J Acoust Soc Am 2010; 127:2498-2508. [PMID: 20370032] [DOI: 10.1121/1.3327507]
Abstract
The benefits of prior information about who would speak, where they would be located, and when they would speak were measured in a multi-talker spatial-listening task. On each trial, a target phrase and several masker phrases were allocated to 13 loudspeakers in a 180° arc, and to 13 overlapping time slots, which started every 800 ms. Speech-reception thresholds (SRTs) were measured as the level of target relative to masker phrases at which listeners reported key words at 71% correct. When phrases started in pairs, all three cues were beneficial ("who" 3.2 dB, "where" 5.1 dB, and "when" 0.3 dB). Over a range of onset asynchronies, SRTs corresponded consistently to a signal-to-noise ratio (SNR) of -2 dB at the start of the target phrase. When phrases started one at a time, SRTs fell to an SNR of -8 dB and were improved significantly, but only marginally, by constraining "who" (1.9 dB), and not by constraining "where" (1.0 dB) or "when" (0.01 dB). Thus, prior information about "who," "where," and "when" was beneficial, but only when talkers started speaking in pairs. Low SRTs may arise when talkers start speaking one at a time because of automatic orienting to phrase onsets and/or the use of loudness differences to distinguish target from masker phrases.
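The 71%-correct tracking point in this abstract is close to the ~70.7% level targeted by a standard two-down, one-up adaptive staircase, the usual way such SRTs are measured. A minimal sketch of that tracking rule is below; the function names, step size, and stopping rule are assumptions, not necessarily the authors' procedure.

```python
def two_down_one_up(run_trial, start_level_db, step_db=2.0, n_reversals=8):
    """2-down/1-up adaptive track (sketch): the level drops after two
    consecutive correct responses and rises after each error, converging
    on the ~70.7%-correct point. run_trial(level_db) -> True/False.
    Threshold estimate = mean of the levels at track reversals."""
    level, streak, direction = start_level_db, 0, 0
    reversals = []
    while len(reversals) < n_reversals:
        if run_trial(level):
            streak += 1
            if streak == 2:                 # two in a row: step down
                streak = 0
                if direction == +1:         # track turned: record reversal
                    reversals.append(level)
                direction = -1
                level -= step_db
        else:                               # any error: step up
            streak = 0
            if direction == -1:
                reversals.append(level)
            direction = +1
            level += step_db
    return sum(reversals) / len(reversals)
```

With a simulated listener whose performance flips deterministically at 0 dB, the track settles astride that point and the reversal mean lands just below it.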
Affiliation(s)
- Pádraig T Kitterick: Department of Psychology, University of York, York YO10 5DD, United Kingdom
16. Werner LA, Parrish HK, Holmer NM. Effects of temporal uncertainty and temporal expectancy on infants' auditory sensitivity. J Acoust Soc Am 2009; 125:1040-1049. [PMID: 19206878] [PMCID: PMC2677369] [DOI: 10.1121/1.3050254]
Abstract
Adults are more sensitive to a sound if they know when the sound will occur. In the present experiment, the effects of temporal uncertainty and temporal expectancy on infants' and adults' detection of a 1 kHz tone in a broadband noise were examined. In one experiment, masked sensitivity was measured with an acoustic cue and without an acoustic cue to possible tone presentation times. Adults' sensitivity was greater for the cue than for the no-cue condition, while infants' sensitivity did not differ significantly between the cue and no-cue conditions. In a second experiment, the effect of temporal expectancy was investigated. The detection advantage for sounds occurring at an expected (most frequent) time, over sounds occurring at unexpected (less frequent) times, was examined. Both infants and adults detected a tone better when it occurred before or at an expected time following a cue than when it occurred at a later time. Thus, despite the fact that the auditory cue did not improve infants' sensitivity, it nonetheless provided the basis for temporal expectancies. Infants, like adults, are more sensitive to sounds that are consistent with temporal expectancy.
Affiliation(s)
- Lynne A Werner: Department of Speech and Hearing Sciences, University of Washington, Seattle, Washington 98105-6246, USA
17. Jones GL, Litovsky RY. Role of masker predictability in the cocktail party problem. J Acoust Soc Am 2008; 124:3818-3830. [PMID: 19206808] [PMCID: PMC2676623] [DOI: 10.1121/1.2996336]
Abstract
In studies of the cocktail party problem, the number and locations of maskers are typically fixed throughout a block of trials, which leaves out uncertainty that exists in real-world environments. The current experiments examined whether there is (1) improved speech intelligibility and (2) increased spatial release from masking (SRM), as predictability of the number/locations of speech maskers is increased. In the first experiment, subjects identified a target word presented at a fixed level in the presence of 0, 1, or 2 maskers as predictability of the masker configuration ranged from 10% to 80%. The second experiment examined speech reception thresholds and SRM as (a) predictability of the masker configuration is increased from 20% to 80% and/or (b) the complexity of the listening environment is decreased. In the third experiment, predictability of the masker configuration was increased from 20% up to 100% while minimizing the onset delay between maskers and the target. All experiments showed no effect of predictability of the masker configuration on speech intelligibility or SRM. These results suggest that knowing the number and location(s) of maskers may not necessarily contribute significantly to solving the cocktail party problem, at least not when the location of the target is known.
Affiliation(s)
- Gary L Jones
- Department of Physiology, University of Wisconsin-Madison, Madison, Wisconsin 53705, USA
18
Ihlefeld A, Shinn-Cunningham B. Disentangling the effects of spatial cues on selection and formation of auditory objects. J Acoust Soc Am 2008; 124:2224-2235. PMID: 19062861; PMCID: PMC9014243; DOI: 10.1121/1.2973185.
Abstract
When competing sources come from different directions, a desired target is easier to hear than when the sources are co-located. How much of this improvement is the result of spatial attention rather than improved perceptual segregation of the competing sources is not well understood. Here, listeners' attention was directed to spatial or nonspatial cues when they listened for a target masked by a competing message. A preceding cue signaled the target timbre, location, or both timbre and location. Spatial separation improved performance when the cue indicated the target location, or both the location and timbre, but not when the cue only indicated the target timbre. However, response errors were influenced by spatial configuration in all conditions. Both attention and streaming contributed to spatial effects when listeners actively attended to location. In contrast, when attention was directed to a nonspatial cue, spatial separation primarily appeared to improve the streaming of auditory objects across time. Thus, when attention is focused on location, spatial separation appears to improve both object selection and object formation; when attention is directed to nonspatial cues, separation affects object formation. These results highlight the need to distinguish between these separate mechanisms when considering how observers cope with complex auditory scenes.
Affiliation(s)
- Antje Ihlefeld
- Auditory Neuroscience Laboratory, Boston University Hearing Research Center, 677 Beacon Street, Boston, Massachusetts 02215, USA
19
Abstract
In complex scenes, the identity of an auditory object can build up across seconds. Given that attention operates on perceptual objects, this perceptual buildup may alter the efficacy of selective auditory attention over time. Here, we measured identification of a sequence of spoken target digits presented with distracter digits from other directions to investigate the dynamics of selective attention. Performance was better when the target location was fixed rather than changing between digits, even when listeners were cued as much as 1 s in advance about the position of each subsequent digit. Spatial continuity not only avoided well known costs associated with switching the focus of spatial attention, but also produced refinements in the spatial selectivity of attention across time. Continuity of target voice further enhanced this buildup of selective attention. Results suggest that when attention is sustained on one auditory object within a complex scene, attentional selectivity improves over time. Similar effects may come into play when attention is sustained on an object in a complex visual scene, especially in cases where visual object formation requires sustained attention.
20
Singh G, Pichora-Fuller MK, Schneider BA. The effect of age on auditory spatial attention in conditions of real and simulated spatial separation. J Acoust Soc Am 2008; 124:1294-1305. PMID: 18681615; DOI: 10.1121/1.2949399.
Abstract
The contributions of auditory and cognitive factors to age-dependent differences in auditory spatial attention were investigated. In conditions of real spatial separation, the target sentence was presented from a central location and competing sentences were presented from left and right locations. In conditions of simulated spatial separation, different apparent spatial locations of the target and competitors were induced using the precedence effect. The identity of the target was cued by a callsign presented either prior to or following each target sentence, and the probability that the target would be presented at the three locations was specified at the beginning of each block. Younger and older adults with normal hearing sensitivity below 4 kHz completed all 16 conditions (2 spatial separation methods × 2 callsign conditions × 4 probability conditions). Overall, younger adults performed better than older adults. For both age groups, performance improved with target location certainty, with a priori target cueing, and when location differences were real rather than simulated. For both age groups, the contributions of natural spatial cues were most pronounced when the target occurred at "unlikely" spatial listening locations. This suggests that both age groups benefit similarly from richer acoustical cues and a priori information in difficult listening environments.
Affiliation(s)
- Gurjit Singh
- Department of Psychology, University of Toronto, 3359 Mississauga Road North, Mississauga, Ontario, Canada
21
Marrone N, Mason CR, Kidd G. Tuning in the spatial dimension: evidence from a masked speech identification task. J Acoust Soc Am 2008; 124:1146-1158. PMID: 18681603; PMCID: PMC2809679; DOI: 10.1121/1.2945710.
Abstract
Spatial release from masking was studied in a three-talker soundfield listening experiment. The target talker was presented at 0 degrees azimuth and the maskers were either colocated or symmetrically positioned around the target, with a different masker talker on each side. The symmetric placement greatly reduced any "better ear" listening advantage. When the maskers were separated from the target by +/-15 degrees, the average spatial release from masking was 8 dB. Wider separations increased the release to more than 12 dB. This large effect was eliminated when binaural cues and perceived spatial separation were degraded by covering one ear with an earplug and earmuff. Increasing reverberation in the room increased the target-to-masker ratio (TM) for the separated, but not colocated, conditions reducing the release from masking, although a significant advantage of spatial separation remained. Time reversing the masker speech improved performance in both the colocated and spatially separated cases but lowered TM the most for the colocated condition, also resulting in a reduction in the spatial release from masking. Overall, the spatial tuning observed appears to depend on the presence of interaural differences that improve the perceptual segregation of sources and facilitate the focus of attention at a point in space.
Affiliation(s)
- Nicole Marrone
- Department of Speech, Language, and Hearing Sciences and the Hearing Research Center, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA.
22
Ihlefeld A, Shinn-Cunningham B. Spatial release from energetic and informational masking in a divided speech identification task. J Acoust Soc Am 2008; 123:4380-4392. PMID: 18537389; PMCID: PMC9014250; DOI: 10.1121/1.2904825.
Abstract
When listening selectively to one talker in a two-talker environment, performance generally improves with spatial separation of the sources. The current study explores the role of spatial separation in divided listening, when listeners reported both of two simultaneous messages processed to have little spectral overlap (limiting "energetic masking" between the messages). One message was presented at a fixed level, while the other message level varied from equal to 40 dB less than that of the fixed-level message. Results demonstrate that spatial separation of the competing messages improved divided-listening performance. Most errors occurred because listeners failed to report the content of the less-intense talker. Moreover, performance generally improved as the broadband energy ratio of the variable-level to the fixed-level talker increased. The error patterns suggest that spatial separation improves the intelligibility of the less-intense talker by improving the ability to (1) hear portions of the signal that would otherwise be masked, (2) segregate the two talkers properly into separate perceptual streams, and (3) selectively focus attention on the less-intense talker. Spatial configuration did not noticeably affect the ability to report the more-intense talker, suggesting that it was processed differently than the less-intense talker, which was actively attended.
Affiliation(s)
- Antje Ihlefeld
- Auditory Neuroscience Laboratory, Boston University Hearing Research Center, 677 Beacon St., Boston, Massachusetts 02215, USA
23
Allen K, Carlile S, Alais D. Contributions of talker characteristics and spatial location to auditory streaming. J Acoust Soc Am 2008; 123:1562-1570. PMID: 18345844; DOI: 10.1121/1.2831774.
Abstract
To examine whether auditory streaming contributes to unmasking, intelligibility of target sentences against two competing talkers was measured using the coordinate response measure (CRM) [Bolia et al., J. Acoust. Soc. Am. 107, 1065-1066 (2000)] corpus. In the control condition, the speech reception threshold (50% correct) was measured when the target and two maskers were collocated straight ahead. Separating maskers from the target by +/-30 degrees resulted in spatial release from masking of 12 dB. CRM sentences involve an identifier in the first part and two target words in the second part. In experimental conditions, masking talkers started spatially separated at +/-30 degrees but became collocated with the target before the scoring words. In one experiment, one target and two different maskers were randomly selected from a mixed-sex corpus. Significant unmasking of 4 dB remained despite the absence of persistent location cues. When same-sex talkers were used as maskers and target, unmasking was reduced. These data suggest that initial separation may permit confident identification and streaming of the target and masker speech where significant differences between target and masker voice characteristics exist, but where target and masker characteristics are similar, listeners must rely more heavily on continuing spatial cues.
Affiliation(s)
- Kachina Allen
- Department of Physiology, University of Sydney, Sydney, NSW 2006, Australia.
24
Best V, Ozmeral EJ, Shinn-Cunningham BG. Visually-guided attention enhances target identification in a complex auditory scene. J Assoc Res Otolaryngol 2007; 8:294-304. PMID: 17453308; PMCID: PMC2538357; DOI: 10.1007/s10162-007-0073-z.
Abstract
In auditory scenes containing many similar sound sources, sorting of acoustic information into streams becomes difficult, which can lead to disruptions in the identification of behaviorally relevant targets. This study investigated the benefit of providing simple visual cues for when and/or where a target would occur in a complex acoustic mixture. Importantly, the visual cues provided no information about the target content. In separate experiments, human subjects either identified learned birdsongs in the presence of a chorus of unlearned songs or recalled strings of spoken digits in the presence of speech maskers. A visual cue indicating which loudspeaker (from an array of five) would contain the target improved accuracy for both kinds of stimuli. A cue indicating which time segment (out of a possible five) would contain the target also improved accuracy, but much more for birdsong than for speech. These results suggest that in real world situations, information about where a target of interest is located can enhance its identification, while information about when to listen can also be helpful when targets are unfamiliar or extremely similar to their competitors.
Affiliation(s)
- Virginia Best
- Hearing Research Center, Boston University, Boston, MA 02215, USA.
25
Pichora-Fuller MK, Singh G. Effects of age on auditory and cognitive processing: implications for hearing aid fitting and audiologic rehabilitation. Trends Amplif 2006; 10:29-59. PMID: 16528429; PMCID: PMC4111543; DOI: 10.1177/108471380601000103.
Abstract
Recent advances in research and clinical practice concerning aging and auditory communication have been driven by questions about age-related differences in peripheral hearing, central auditory processing, and cognitive processing. A "site-of-lesion" view based on anatomic levels inspired research to test competing hypotheses about the contributions of changes at these three levels of the nervous system. A "processing" view based on psychologic functions inspired research to test alternative hypotheses about how lower-level sensory processes and higher-level cognitive processes interact. In the present paper, we suggest that these two views can begin to be unified following the example set by the cognitive neuroscience of aging. The early pioneers of audiology anticipated such a unified view, but today, advances in science and technology make it both possible and necessary. Specifically, we argue that a synthesis of new knowledge concerning the functional neuroscience of auditory cognition is necessary to inform the design and fitting of digital signal processing in "intelligent" hearing devices, as well as to inform best practices for resituating hearing aid fitting in a broader context of audiologic rehabilitation. Long-standing approaches to rehabilitative audiology should be revitalized to emphasize the important role that training and therapy play in promoting compensatory brain reorganization as older adults acclimatize to new technologies. The purpose of the present paper is to provide an integrated framework for understanding how auditory and cognitive processing interact when older adults listen, comprehend, and communicate in realistic situations, to review relevant models and findings, and to suggest how new knowledge about age-related changes in audition and cognition may influence future developments in hearing aid fitting and audiologic rehabilitation.
Affiliation(s)
- M Kathleen Pichora-Fuller
- Department of Psychology, University of Toronto, 3359 Mississauga Road, Mississauga, Ontario, Canada L5L 1C6.
26
Kidd G, Arbogast TL, Mason CR, Gallun FJ. The advantage of knowing where to listen. J Acoust Soc Am 2005; 118:3804-3815. PMID: 16419825; DOI: 10.1121/1.2109187.
Abstract
This study examined the role of focused attention along the spatial (azimuthal) dimension in a highly uncertain multitalker listening situation. The task of the listener was to identify key words from a target talker in the presence of two other talkers simultaneously uttering similar sentences. When the listener had no a priori knowledge about target location, or which of the three sentences was the target sentence, performance was relatively poor, near the value expected simply from choosing to focus attention on only one of the three locations. When the target sentence was cued before the trial, but location was uncertain, performance improved significantly relative to the uncued case. When spatial location information was provided before the trial, performance improved significantly for both cued and uncued conditions. If the location of the target was certain, proportion correct identification performance was higher than 0.9, independent of whether the target was cued beforehand. In contrast to studies in which known versus unknown spatial locations were compared for relatively simple stimuli and tasks, the results of the current experiments suggest that the focus of attention along the spatial dimension can play a very significant role in solving the "cocktail party" problem.
Affiliation(s)
- Gerald Kidd
- Department of Speech, Language and Hearing Sciences and Hearing Research Center, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA.
27
Hammond GR, Seth Y, Ison JR. Concurrent measurement of the detectability of tone bursts and their effect on the excitability of the human blink reflex using a probe-signal method. Hear Res 2005; 202:28-34. PMID: 15811696; DOI: 10.1016/j.heares.2004.07.018.
Abstract
The probe-signal method has shown that auditory signals that are either presented more often in a series of trials or that are immediately preceded by cues of the same frequency on a single trial are detected more readily than signals of other frequencies. The frequency range in which detection is favored defines an attentional band, which is thought to result from an effective attenuation of deviant frequencies in the cochlea, possibly by activation of the olivocochlear bundle. In a 2IFC procedure in which the first observation interval was preceded by a 1300-Hz cue, subjects detected cued probe tones (at 1300 Hz) but not uncued probe tones (at 1000 Hz or 1600 Hz) at better than chance levels. Concurrent elicitation of a blink reflex by presentation of an air puff in the first observation interval on a random half of the trials showed that cued probes, but not uncued probes, inhibited the size of the blink reflex. These data show that uncued probes do not enter into the low-level sensory processing in the brainstem which is responsible for reflex modification. This finding is consistent with the view that stimuli whose frequency falls outside an attentional band are excluded at the auditory periphery.
Affiliation(s)
- Geoffrey R Hammond
- School of Psychology, The University of Western Australia, Crawley, Australia.
28
Abstract
Everyday experience tells us that some types of auditory sensory information are retained for long periods of time. For example, we are able to recognize friends by their voice alone or identify the source of familiar noises even years after we last heard the sounds. It is thus somewhat surprising that the results of most studies of auditory sensory memory show that acoustic details, such as the pitch of a tone, fade from memory in ca. 10-15 s. One should, therefore, ask (1) what types of acoustic information can be retained for a longer term, (2) what circumstances allow or help the formation of durable memory records for acoustic details, and (3) how such memory records can be accessed. The present review discusses the results of experiments that used a model of auditory recognition, the auditory memory reactivation paradigm. Results obtained with this paradigm suggest that the brain stores features of individual sounds embedded within representations of acoustic regularities that have been detected for the sound patterns and sequences in which the sounds appeared. Thus, sounds closely linked with their auditory context are more likely to be remembered. The representations of acoustic regularities are automatically activated by matching sounds, enabling object recognition.
Affiliation(s)
- István Winkler
- Institute for Psychology, Hungarian Academy of Sciences, Hungary.
29
Freyman RL, Balakrishnan U, Helfer KS. Effect of number of masking talkers and auditory priming on informational masking in speech recognition. J Acoust Soc Am 2004; 115:2246-2256. PMID: 15139635; DOI: 10.1121/1.1689343.
Abstract
Three experiments investigated factors that influence the creation of and release from informational masking in speech recognition. The target stimuli were nonsense sentences spoken by a female talker. In experiment 1 the masker was a mixture of three, four, six, or ten female talkers, all reciting similar nonsense sentences. Listeners' recognition performance was measured with both target and masker presented from a front loudspeaker (F-F) or with a masker presented from two loudspeakers, with the right leading the front by 4 ms (F-RF). In the latter condition the target and masker appear to be from different locations. This aids recognition performance for one- and two-talker maskers, but not for noise. As the number of masking talkers increased to ten, the improvement in the F-RF condition diminished, but did not disappear. The second experiment investigated whether hearing a preview (prime) of the target sentence before it was presented in masking improved recognition for the last key word, which was not included in the prime. Marked improvements occurred only for the F-F condition with two-talker masking, not for continuous noise or F-RF two-talker masking. The third experiment found that the benefit of priming in the F-F condition was maintained if the prime sentence was spoken by a different talker or even if it was printed and read silently. These results suggest that informational masking can be overcome by factors that improve listeners' auditory attention toward the target.
Affiliation(s)
- Richard L Freyman
- Department of Communication Disorders, University of Massachusetts, Amherst, 715 N. Pleasant Street, Room 6 Arnold House, Amherst, Massachusetts 01003, USA.
30
Kidd G, Mason CR, Arbogast TL, Brungart DS, Simpson BD. Informational masking caused by contralateral stimulation. J Acoust Soc Am 2003; 113:1594-1603. PMID: 12656394; DOI: 10.1121/1.1547440.
Abstract
Although informational masking is thought to reflect central mechanisms, the effects are generally much stronger when the target and masker are presented to the same ear than when they are presented to different ears. However, the results of a recent study by Brungart and Simpson [J. Acoust. Soc. Am. 112, 2985-2995 (2002)] indicated that a speech masker that is presented contralateral to a speech signal can produce substantial amounts of informational masking when a second speech masker is played simultaneously in the same ear as the signal. In this study, we conducted a series of experiments that paralleled those of Brungart and Simpson but used a pure-tone signal and multitone informational maskers in a detection task. Both the signal and the maskers were played as sequences of short bursts in each observation interval. The maskers were arranged in two types of spectrotemporal patterns. One type of pattern, called "multiple-bursts same" (MBS), has previously been shown to produce very large amounts of informational masking while the other type of pattern, called "multiple-bursts different" (MBD), has been shown to produce very small amounts of informational masking. Several conditions of ipsilateral, contralateral, and combined presentation of these maskers were tested. The results showed that presentation of the MBS masker in the contralateral ear produced a substantial amount of informational masking when the MBD masker was simultaneously presented to the ipsilateral ear. The results supported the earlier findings of Brungart and Simpson indicating that listeners are unable to selectively focus their attention on a single ear in some complex dichotic listening conditions. These results suggest that this contralateral masking effect is not restricted to speech and may reflect more general limitations on processing capacity. Further, it was concluded that the magnitude of the contralateral masking effect was related both to the informational masking value of the contralateral masker and the complexity of the stimulus and/or task in the ear in which the signal was presented.
Affiliation(s)
- Gerald Kidd
- Hearing Research Center, Sargent College, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA.
31
Saberi K, Tirtabudi P, Petrosyan A, Perrott DR, Strybel TZ. Concurrent motion detection based on dynamic changes in interaural delay. Hear Res 2002; 174:149-157. PMID: 12433406; DOI: 10.1016/s0378-5955(02)00652-4.
Abstract
The ability to detect a dynamic change in the interaural delay of a pure tone in the presence of a distracter tone of a different frequency was investigated in four conditions: (1) a control condition in which no distracter tone was present, (2) the distracter tone was stationary (fixed interaural delay), (3) the distracter had an interaural delay that changed in the same direction as that of the target tone, i.e., concurrent auditory motion in the same direction, and (4) the distracter had an interaural delay that changed in a direction opposite to that of the target tone, i.e., concurrent auditory motion in opposite directions. In a cued single-interval two-alternative forced-choice design, the observer had to determine if the target tone had a constant or dynamic interaural delay. The target was a 500-Hz tone and the distracter was a tone with a frequency of 300, 510, 550, 600, 800, or 1000 Hz. Detection was also examined for a range of stimulus durations, rates of change in interaural delay (i.e., velocity), and extent of change in interaural time difference (i.e., 'distance'). Results showed that the best performance (highest d') was associated with the no-distracter condition, followed by the stationary-distracter, opposite-direction, and same-direction conditions, respectively. Detection improved with increasing frequency difference between distracter and target tones, but was nonetheless lower than that associated with the no-distracter condition, even when the distracter frequency was several critical bands removed from the target frequency.
Affiliation(s)
- Kourosh Saberi
- Department of Cognitive Sciences, University of California, Irvine, CA 92697, USA.
32
Kidd G, Mason CR, Arbogast TL. Similarity, uncertainty, and masking in the identification of nonspeech auditory patterns. J Acoust Soc Am 2002; 111:1367-1376. PMID: 11931314; DOI: 10.1121/1.1448342.
Abstract
This study examined whether increasing the similarity between informational maskers and signals would increase the amount of masking obtained in a nonspeech pattern identification task. The signals were contiguous sequences of pure-tone bursts arranged in six narrow-band spectro-temporal patterns. The informational maskers were sequences of multitone bursts played synchronously with the signal tones. The listener's task was to identify the patterns in a 1-interval, 6-alternative forced-choice procedure. Three types of multitone maskers were generated according to different randomization rules. For the least signal-like informational masker, the components in each multitone burst were chosen at random within the frequency range of 200-6500 Hz, excluding a "protected region" around the signal frequencies. For the intermediate masker, the frequency components in the first burst were chosen quasirandomly, but the components in successive bursts were constrained to fall in narrow frequency bands around the frequencies of the components in the initial burst. Within the narrow bands the frequencies were randomized. This masker was considered to be more similar to the signal patterns because it consisted of a set of narrow-band sequences, any one of which might be mistaken for a signal pattern. The most signal-like masker was similar to the intermediate masker in that it consisted of a set of synchronously played narrow-band sequences, but the variation in frequency within each sequence was sinusoidal, completing roughly one period in a sequence. This masker consisted of discernible patterns, but not patterns that were part of the set of signals. In addition, masking produced by Gaussian noise bursts, thought to produce primarily peripherally based "energetic masking," was measured and compared to the informational masking results. For the three informational maskers, more masking was produced by the maskers composed of narrow-band sequences than for the masker in which the frequencies were not constrained to narrow bands. Also, the slopes of the performance-level functions for the three informational maskers were much shallower than for the Gaussian noise masker or for no masker. The findings provided qualified support for the hypothesis that increasing the similarity between signals and maskers, or parts of the maskers, causes greater informational masking. However, it is also possible that the greater masking was a consequence of increasing the number of perceptual "streams" that had to be evaluated by the listener.
Affiliation(s)
- Gerald Kidd
- Department of Communication Disorders and Hearing Research Center, Boston University, Massachusetts 02215, USA
33