1. Holmes E, Zeidman P, Friston KJ, Griffiths TD. Difficulties with Speech-in-Noise Perception Related to Fundamental Grouping Processes in Auditory Cortex. Cereb Cortex 2020;31:1582-1596. PMID: 33136138; PMCID: PMC7869094; DOI: 10.1093/cercor/bhaa311.
Abstract
In our everyday lives, we are often required to follow a conversation when background noise is present (“speech-in-noise” [SPIN] perception). SPIN perception varies widely—and people who are worse at SPIN perception are also worse at fundamental auditory grouping, as assessed by figure-ground tasks. Here, we examined the cortical processes that link difficulties with SPIN perception to difficulties with figure-ground perception using functional magnetic resonance imaging. We found strong evidence that the earliest stages of the auditory cortical hierarchy (left core and belt areas) are similarly disinhibited when SPIN and figure-ground tasks are more difficult (i.e., at target-to-masker ratios corresponding to 60% rather than 90% performance)—consistent with increased cortical gain at lower levels of the auditory hierarchy. Overall, our results reveal a common neural substrate for these basic (figure-ground) and naturally relevant (SPIN) tasks—which provides a common computational basis for the link between SPIN perception and fundamental auditory grouping.
Affiliations
- Emma Holmes: Wellcome Centre for Human Neuroimaging, UCL Queen Square Institute of Neurology, UCL, London WC1N 3AR, UK
- Peter Zeidman: Wellcome Centre for Human Neuroimaging, UCL Queen Square Institute of Neurology, UCL, London WC1N 3AR, UK
- Karl J Friston: Wellcome Centre for Human Neuroimaging, UCL Queen Square Institute of Neurology, UCL, London WC1N 3AR, UK
- Timothy D Griffiths: Wellcome Centre for Human Neuroimaging, UCL Queen Square Institute of Neurology, UCL, London WC1N 3AR, UK; Biosciences Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne NE2 4HH, UK
2. Teki S, Barascud N, Picard S, Payne C, Griffiths TD, Chait M. Neural Correlates of Auditory Figure-Ground Segregation Based on Temporal Coherence. Cereb Cortex 2016;26:3669-3680. PMID: 27325682; PMCID: PMC5004755; DOI: 10.1093/cercor/bhw173.
Abstract
To make sense of natural acoustic environments, listeners must parse complex mixtures of sounds that vary in frequency, space, and time. Emerging work suggests that, in addition to the well-studied spectral cues for segregation, sensitivity to temporal coherence (the coincidence of sound elements in and across time) is also critical for the perceptual organization of acoustic scenes. Here, we examine pre-attentive, stimulus-driven neural processes underlying auditory figure-ground segregation using stimuli that capture the challenges of listening in complex scenes where segregation cannot be achieved based on spectral cues alone. Signals ("stochastic figure-ground": SFG) comprised a sequence of brief broadband chords containing random pure-tone components that vary from one chord to the next. Occasional tone repetitions across chords are perceived as "figures" popping out of a stochastic "ground." Magnetoencephalography (MEG) measurement in naïve, distracted, human subjects revealed robust evoked responses, commencing about 150 ms after figure onset, that reflect the emergence of the "figure" from the randomly varying "ground." Neural sources underlying this bottom-up driven figure-ground segregation were localized to the planum temporale and the intraparietal sulcus, demonstrating that the latter area, outside the "classic" auditory system, is also involved in the early stages of auditory scene analysis.
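The SFG construction is straightforward to sketch in code. Below is a minimal, illustrative Python synthesis; the sample rate, frequency pool, chord timing, and component counts are assumptions chosen for demonstration rather than the exact values used in the study.

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 16000                                  # sample rate (Hz); illustrative
chord_dur = 0.05                            # 50-ms chords
n_chords, n_ground, n_figure = 40, 10, 4    # illustrative counts
pool = np.geomspace(180, 7000, 129)         # log-spaced candidate frequencies

fig_freqs = rng.choice(pool, n_figure, replace=False)  # repeat across chords
t = np.arange(int(fs * chord_dur)) / fs
chords = []
for k in range(n_chords):
    freqs = rng.choice(pool, n_ground, replace=False)  # redrawn every chord
    if 10 <= k < 30:                                   # "figure" window
        freqs = np.concatenate([freqs, fig_freqs])
    chords.append(sum(np.sin(2 * np.pi * f * t) for f in freqs))
audio = np.concatenate(chords)
audio /= np.abs(audio).max()                # normalize to +/-1
```

Because the figure components repeat across chords while the ground components are redrawn each time, the figure is temporally coherent and the ground is not; that coherence is the cue the evoked responses are thought to track.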
Affiliations
- Sundeep Teki: Wellcome Trust Centre for Neuroimaging, University College London, London WC1N 3BG, UK; Auditory Cognition Group, Institute of Neuroscience, Newcastle University, Newcastle upon Tyne NE2 4HH, UK; current address: Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford OX1 3QX, UK
- Nicolas Barascud: Wellcome Trust Centre for Neuroimaging, University College London, London WC1N 3BG, UK; Ear Institute, University College London, London WC1X 8EE, UK
- Samuel Picard: Ear Institute, University College London, London WC1X 8EE, UK
- Timothy D. Griffiths: Wellcome Trust Centre for Neuroimaging, University College London, London WC1N 3BG, UK; Auditory Cognition Group, Institute of Neuroscience, Newcastle University, Newcastle upon Tyne NE2 4HH, UK
- Maria Chait: Ear Institute, University College London, London WC1X 8EE, UK
3. Teki S, Kumar S, Griffiths TD. Large-Scale Analysis of Auditory Segregation Behavior Crowdsourced via a Smartphone App. PLoS One 2016;11:e0153916. PMID: 27096165; PMCID: PMC4838209; DOI: 10.1371/journal.pone.0153916.
Abstract
The human auditory system is adept at detecting sound sources of interest from a complex mixture of several other simultaneous sounds. The ability to selectively attend to the speech of one speaker whilst ignoring other speakers and background noise is of vital biological significance; the capacity to make sense of complex ‘auditory scenes’ is significantly impaired in aging populations as well as those with hearing loss. We investigated this problem by designing a synthetic signal, termed the ‘stochastic figure-ground’ stimulus, that captures essential aspects of complex sounds in the natural environment. Previously, we showed that under controlled laboratory conditions, young listeners sampled from the university subject pool (n = 10) performed very well in detecting targets embedded in the stochastic figure-ground signal. Here, we presented a modified version of this cocktail party paradigm as a ‘game’ featured in a smartphone app (The Great Brain Experiment) and obtained data from a large population with diverse demographics (n = 5148). Despite differences in paradigms and experimental settings, the observed target-detection performance by users of the app was robust and consistent with our previous results from the psychophysical study. Our results highlight the potential of smartphone apps for capturing robust large-scale auditory behavioral data from normal healthy volunteers, an approach that can also be extended to study auditory deficits in clinical populations with hearing impairments and central auditory disorders.
Affiliations
- Sundeep Teki: Wellcome Trust Centre for Neuroimaging, University College London, London, UK; Institute of Neuroscience, Newcastle University, Newcastle upon Tyne, UK
- Sukhbinder Kumar: Wellcome Trust Centre for Neuroimaging, University College London, London, UK; Institute of Neuroscience, Newcastle University, Newcastle upon Tyne, UK
- Timothy D. Griffiths: Wellcome Trust Centre for Neuroimaging, University College London, London, UK; Institute of Neuroscience, Newcastle University, Newcastle upon Tyne, UK
4. Pham CQ, Bremen P, Shen W, Yang SM, Middlebrooks JC, Zeng FG, Mc Laughlin M. Central Auditory Processing of Temporal and Spectral-Variance Cues in Cochlear Implant Listeners. PLoS One 2015;10:e0132423. PMID: 26176553; PMCID: PMC4503639; DOI: 10.1371/journal.pone.0132423.
Abstract
Cochlear implant (CI) listeners have difficulty understanding speech in complex listening environments. This deficit is thought to be largely due to peripheral encoding problems arising from current spread, which results in wide peripheral filters. In normal hearing (NH) listeners, central processing contributes to segregation of speech from competing sounds. We tested the hypothesis that basic central processing abilities are retained in post-lingually deaf CI listeners, but that processing is hampered by degraded input from the periphery. In eight CI listeners, we measured auditory nerve compound action potentials to characterize peripheral filters. Then, we measured psychophysical detection thresholds in the presence of multi-electrode maskers placed either inside (peripheral masking) or outside (central masking) the peripheral filter. This was intended to distinguish peripheral from central contributions to signal detection. Introduction of temporal asynchrony between the signal and masker improved signal detection in both peripheral and central masking conditions for all CI listeners. Randomly varying components of the masker created spectral-variance cues, which seemed to benefit only two out of eight CI listeners. In contrast, the spectral-variance cues improved signal detection in all five NH listeners who listened to our CI simulation. Together these results indicate that widened peripheral filters significantly hamper central processing of spectral-variance cues, but not of temporal cues, in post-lingually deaf CI listeners. As indicated by two CI listeners in our study, however, post-lingually deaf CI listeners may retain some central processing abilities similar to those of NH listeners.
Affiliations
- Carol Q. Pham: Center for Hearing Research, University of California Irvine, Irvine, California, USA; Department of Anatomy and Neurobiology, University of California Irvine, Irvine, California, USA
- Peter Bremen: Center for Hearing Research, University of California Irvine, Irvine, California, USA; Department of Otolaryngology-Head and Neck Surgery, University of California Irvine, Irvine, California, USA
- Weidong Shen: Institute of Otolaryngology, Chinese PLA General Hospital, Beijing, China
- Shi-Ming Yang: Institute of Otolaryngology, Chinese PLA General Hospital, Beijing, China
- John C. Middlebrooks: Center for Hearing Research; Department of Otolaryngology-Head and Neck Surgery; Department of Neurobiology and Behavior; Department of Biomedical Engineering; Department of Cognitive Sciences, University of California Irvine, Irvine, California, USA
- Fan-Gang Zeng: Center for Hearing Research; Department of Anatomy and Neurobiology; Department of Otolaryngology-Head and Neck Surgery; Department of Biomedical Engineering; Department of Cognitive Sciences, University of California Irvine, Irvine, California, USA
- Myles Mc Laughlin: Center for Hearing Research, University of California Irvine, Irvine, California, USA; Department of Otolaryngology-Head and Neck Surgery, University of California Irvine, Irvine, California, USA
5. Dai H, Buss E. Optimal integration of independent observations from Poisson sources. J Acoust Soc Am 2015;137:EL20-EL25. PMID: 25618094; PMCID: PMC4272378; DOI: 10.1121/1.4903228.
Abstract
The optimal integration of information from independent Poisson sources (such as neurons) was analyzed in the context of a two-interval, forced-choice detection task. When the mean count of the Poisson distribution is above 1, the benefit of integration is closely approximated by the predictions based on the square-root law of the Gaussian model. When the mean count falls far below 1, however, the benefit of integration clearly exceeds the predictions based on the square-root law.
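The contrast between the two regimes is easy to reproduce numerically. For identical Poisson channels the likelihood ratio is monotone in the summed count, so an ideal observer simply sums counts across sources and picks the interval with the larger sum. The Monte Carlo sketch below (rates and source counts are arbitrary illustrative choices) compares that observer with the Gaussian square-root-law prediction derived from single-source performance.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def pc_poisson_2ifc(lam_sig, lam_noise, n_src, n_trials=200_000):
    """Percent correct for an observer that sums counts over n_src
    independent Poisson channels and picks the larger interval
    (ties guessed). A sum of n Poisson(lam) counts is Poisson(n*lam)."""
    s = rng.poisson(lam_sig * n_src, n_trials)
    n = rng.poisson(lam_noise * n_src, n_trials)
    return np.mean((s > n) + 0.5 * (s == n))

def pc_sqrt_law(pc_single, n_src):
    """Gaussian benchmark: d'_N = sqrt(N) * d'_1, with Pc = Phi(d'/sqrt(2))."""
    d1 = np.sqrt(2) * norm.ppf(pc_single)
    return norm.cdf(np.sqrt(n_src) * d1 / np.sqrt(2))

for lam_s, lam_n in [(2.0, 1.0), (0.02, 0.01)]:  # mean counts above vs below 1
    pc1 = pc_poisson_2ifc(lam_s, lam_n, 1)
    for n in (4, 16):
        print(lam_s, n, round(pc_poisson_2ifc(lam_s, lam_n, n), 3),
              round(pc_sqrt_law(pc1, n), 3))
```

With mean counts around 1 or 2 the two columns agree closely; with mean counts far below 1 the Poisson observer's percent correct clearly exceeds the square-root benchmark, which is the pattern the abstract describes.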
Affiliations
- Huanping Dai: Department of Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona 85721, USA
- Emily Buss: Department of Otolaryngology/Head and Neck Surgery, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA
6. Bohlen P, Dylla M, Timms C, Ramachandran R. Detection of modulated tones in modulated noise by non-human primates. J Assoc Res Otolaryngol 2014;15:801-821. PMID: 24899380; DOI: 10.1007/s10162-014-0467-7.
Abstract
In natural environments, many sounds are amplitude-modulated. Amplitude modulation is thought to be a signal that aids auditory object formation. A previous study of the detection of signals in noise found that when tones or noise were amplitude-modulated, the noise was a less effective masker, and detection thresholds for tones in noise were lowered. These results suggest that the detection of modulated signals in modulated noise would be enhanced. This paper describes the results of experiments investigating how detection is modified when both signal and noise are amplitude-modulated. Two monkeys (Macaca mulatta) were trained to detect amplitude-modulated tones in continuous, amplitude-modulated broadband noise. When the phase difference of otherwise similarly amplitude-modulated tones and noise was varied, detection thresholds were highest when the modulations were in phase and lowest when the modulations were anti-phase. When the depth of the modulation of tones or noise was varied, detection thresholds decreased if the modulations were anti-phase. When the modulations were in phase, increasing the depth of tone modulation caused an increase in tone detection thresholds, but increasing the depth of noise modulation did not affect tone detection thresholds. Changing the modulation frequency of the tone or noise caused changes in threshold that saturated at modulation frequencies higher than 20 Hz; thresholds decreased when the tone and noise modulations were in phase and increased when they were anti-phase. The relationship between reaction times and tone level was not modified by manipulations of the temporal variations in the signal or noise. The changes in behavioral threshold were consistent with a model in which the brain subtracts the noise from the signal. These results suggest that the parameters of the modulation of signals and maskers heavily influence detection in very predictable ways. They are consistent with some results in humans and avians and form the baseline for neurophysiological studies of the mechanisms of detection in noise.
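The stimulus manipulation itself is simple to express in code. A minimal sketch follows; the carrier frequency, modulation rate, depths, and duration are illustrative values rather than those used in the experiments.

```python
import numpy as np

fs = 44100                                   # sample rate (Hz)
t = np.arange(int(0.5 * fs)) / fs            # 500-ms stimulus; illustrative

def sam(carrier, fm, depth, phase):
    """Sinusoidally amplitude-modulate a carrier waveform."""
    return (1 + depth * np.sin(2 * np.pi * fm * t + phase)) * carrier

fm = 10.0                                    # modulation rate (Hz)
tone = sam(np.sin(2 * np.pi * 1000 * t), fm, depth=1.0, phase=0.0)
noise = sam(np.random.default_rng(1).standard_normal(t.size), fm, depth=1.0,
            phase=np.pi)                     # np.pi = anti-phase; 0.0 = in phase
stimulus = tone + noise
```

Sweeping the phase argument from 0 to pi moves the masker modulation from in phase (hardest) to anti-phase (easiest), while the depth and fm arguments map onto the other two manipulations described above.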
Affiliations
- Peter Bohlen: Department of Hearing and Speech Sciences, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
7. Kidd G, Mason CR, Streeter T, Thompson ER, Best V, Wakefield GH. Perceiving sequential dependencies in auditory streams. J Acoust Soc Am 2013;134:1215-1231. PMID: 23927120; PMCID: PMC3745531; DOI: 10.1121/1.4812276.
Abstract
This study examined the ability of human listeners to detect the presence, and judge the strength, of a statistical dependency among the elements comprising sequences of sounds. The statistical dependency was imposed by specifying transition matrices that determined the likelihood of occurrence of the sound elements. Markov chains were constructed from these transition matrices, with states that were pure tones or noise bursts varying along the stimulus dimensions of frequency and/or interaural time difference. Listeners reliably detected the presence of a statistical dependency in sequences of sounds varying along these stimulus dimensions. Furthermore, listeners were able to discriminate the relative strength of the dependency in pairs of successive sound sequences. Random variation along an irrelevant stimulus dimension had small but significant adverse effects on performance. A much greater decrement in performance was found when the sound sequences were concurrent. Likelihood ratios were computed from the transition matrices to specify Ideal Observer performance for the experimental conditions. Preliminary modeling efforts were made based on degradations of Ideal Observer performance intended to represent human observer limitations. This experimental approach appears to be useful for examining auditory "stream" formation and maintenance over time based on the predictability of the constituent sound elements.
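The statistical structure being detected can be made concrete with a small sketch. Below, a hypothetical three-state transition matrix generates sequences with a strong first-order dependency, and an observer scores a sequence with the log-likelihood ratio between the dependent and independent models; this mirrors the paper's use of transition-matrix likelihood ratios to define Ideal Observer performance, though the matrix values here are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-state transition matrix over tone frequencies (rows sum to 1).
T_dep = np.array([[0.8, 0.1, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.1, 0.1, 0.8]])   # strong sequential dependency
T_rand = np.full((3, 3), 1 / 3)       # independent elements: no dependency

def sample_chain(T, n):
    """Draw a state sequence from a Markov chain with transition matrix T."""
    seq = [rng.integers(len(T))]
    for _ in range(n - 1):
        seq.append(rng.choice(len(T), p=T[seq[-1]]))
    return seq

def log_likelihood_ratio(seq, T1, T0):
    """Ideal-observer evidence that seq came from T1 rather than T0."""
    return sum(np.log(T1[a, b] / T0[a, b]) for a, b in zip(seq, seq[1:]))

seq = sample_chain(T_dep, 40)
print(log_likelihood_ratio(seq, T_dep, T_rand))  # positive on average
```

Degrading this Ideal Observer (for example, by adding decision noise or confusing neighboring states) is one way to represent the human limitations the authors model.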
Affiliations
- Gerald Kidd: Department of Speech, Language and Hearing Sciences and Hearing Research Center, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA
8. Teki S, Chait M, Kumar S, Shamma S, Griffiths TD. Segregation of complex acoustic scenes based on temporal coherence. eLife 2013;2:e00699. PMID: 23898398; PMCID: PMC3721234; DOI: 10.7554/eLife.00699.
Abstract
In contrast to the complex acoustic environments we encounter every day, most studies of auditory segregation have used relatively simple signals. Here, we synthesized a new stimulus to examine the detection of coherent patterns (‘figures’) from overlapping ‘background’ signals. In a series of experiments, we demonstrate that human listeners are remarkably sensitive to the emergence of such figures and can tolerate a variety of spectral and temporal perturbations. This robust behavior is consistent with the existence of automatic auditory segregation mechanisms that are highly sensitive to correlations across frequency and time. The observed behavior cannot be explained purely on the basis of adaptation-based models used to explain the segregation of deterministic narrowband signals. We show that the present results are consistent with the predictions of a model of auditory perceptual organization based on temporal coherence. Our data thus support a role for temporal coherence as an organizational principle underlying auditory segregation.

eLife digest: Even when seated in the middle of a crowded restaurant, we are still able to distinguish the speech of the person sitting opposite us from the conversations of fellow diners and a host of other background noise. While we generally perform this task almost effortlessly, it is unclear how the brain solves what is in reality a complex information processing problem. In the 1970s, researchers began to address this question using stimuli consisting of simple tones. When subjects are played a sequence of alternating high and low frequency tones, they perceive them as two independent streams of sound. Similar experiments in macaque monkeys reveal that each stream activates a different area of auditory cortex, suggesting that the brain may distinguish acoustic stimuli on the basis of their frequency. However, the simple tones used in laboratory experiments bear little resemblance to the complex sounds we encounter in everyday life. These are often made up of multiple frequencies, and overlap, both in frequency and in time, with other sounds in the environment. Moreover, recent experiments have shown that if a subject hears two tones simultaneously, he or she perceives them as belonging to a single stream of sound even if they have different frequencies: models that assume we distinguish stimuli from noise on the basis of frequency alone struggle to explain this observation. Now, Teki, Chait, et al. have used more complex sounds, in which frequency components of the target stimuli overlap with those of background signals, to obtain new insights into how the brain solves this problem. Subjects were extremely good at discriminating these complex target stimuli from background noise, and computational modelling confirmed that they did so via integration of both frequency and temporal information. The work of Teki, Chait, et al. thus offers the first explanation for our ability to home in on speech and other pertinent sounds, even amidst a sea of background noise.
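A toy version of the temporal-coherence computation makes the model's core idea concrete: frequency channels whose envelopes rise and fall together form a high-correlation block that a grouping stage can pick out. The windowed-correlation code below is a simplified stand-in for the published model, with invented dimensions.

```python
import numpy as np

def coherence_matrix(envelopes, win):
    """Channel-by-channel correlation of envelope segments over the last
    `win` samples; envelopes has shape (channels, time)."""
    seg = envelopes[:, -win:]
    seg = seg - seg.mean(axis=1, keepdims=True)
    seg /= np.linalg.norm(seg, axis=1, keepdims=True) + 1e-12
    return seg @ seg.T

# Toy demo: channels 0-2 share a common envelope (a coherent "figure");
# channels 3-7 fluctuate independently (the "ground").
rng = np.random.default_rng(0)
env = rng.random((8, 200))
env[:3] = rng.random(200) + 0.1 * rng.random((3, 200))
C = coherence_matrix(env, 200)
# C[:3, :3] forms a high-correlation block that a grouping stage can threshold.
```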
Affiliations
- Sundeep Teki: Wellcome Trust Centre for Neuroimaging, University College London, London, UK
9. Leibold LJ. Development of Auditory Scene Analysis and Auditory Attention. In: Human Auditory Development. Springer; 2012. DOI: 10.1007/978-1-4614-1421-6_5.
10. Richards VM, Shub DE, Carreira EM. The role of masker fringes for the detection of coherent tone pips. J Acoust Soc Am 2011;130:883-892. PMID: 21877803; PMCID: PMC3190658; DOI: 10.1121/1.3613701.
Abstract
Three experiments investigated the role of pre- and post-exposure to a masker in a detection task with complex, random, spectro-temporal maskers. In the first experiment, the masker was either continuously presented or pulsed on and off with the signal. For most listeners, thresholds were lower when the masker was continuously presented, despite greater uncertainty about the timing of the signal. In the second experiment, the signal-bearing portion of the masker was preceded and followed by masker "fringes" of different durations. Consistent with the findings of Experiment 1, for some listeners shorter-duration fringes led to higher thresholds than long-duration fringes. In the third experiment, the masker fringe (a) preceded, (b) followed, or (c) both preceded and followed the signal. Relative to the middle-signal condition, a late signal yielded lower thresholds and an early signal yielded higher thresholds. These results indicate that listeners can use features of an ongoing sound to extract an added signal, and that listeners differ in how much pre-exposure matters for efficient signal extraction. However, listeners do not appear to perform this comparison retrospectively after the signal, potentially indicating a form of backward masking.
Affiliations
- Virginia M Richards: Department of Cognitive Sciences, University of California, Irvine, CA 92697-5200, USA
11. Huang R, Richards VM. Estimates of internal templates for the detection of sequential tonal patterns. J Acoust Soc Am 2008;124:3831-3840. PMID: 19206809; PMCID: PMC2654203; DOI: 10.1121/1.2967827.
Abstract
In this experiment, listeners detected sequential tonal patterns embedded in multitone multiburst random maskers. The maskers consisted of eight 30 ms bursts of random-frequency tones. The signal, when present, occupied the central six bursts and was centered at 1000 Hz. The six sequential signal tones formed several spectro-temporal patterns: an equal-frequency pattern, three ascending patterns with frequency ranges spanning 0.5-, 1-, and 2-equivalent rectangular bandwidths (ERBs), and a random pattern with frequencies drawn at random from the range of 925-1075 Hz. The total number of tones in each burst, m, was varied to determine detection threshold. The detectability of the signal pattern declined as the frequency range of the signal pattern increased, and when the signal was random. Relative weights as a function of time and frequency, interpreted as listeners' internal templates, depended systematically on the properties of the signal pattern tested. The templates indicated that when sensitivity was poor, listeners integrated increasingly broad spectro-temporal regions around the signal frequencies, and sometimes integrated energy from the final burst even though the signal tones never occupied the final burst.
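The "relative weights" interpreted here as internal templates are typically recovered by regressing trial-by-trial responses onto the random tone energy in each spectro-temporal cell of the masker. The sketch below runs that logic on simulated data; the simulated observer, grid size, and use of logistic regression are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_trials, n_bursts, n_freqs = 5000, 8, 20          # grid size is illustrative
X = rng.random((n_trials, n_bursts * n_freqs))     # per-cell masker energy
true_w = np.zeros(n_bursts * n_freqs)
true_w[3::n_freqs] = 1.0                           # equal-frequency template
drive = X @ true_w + rng.normal(0, 0.5, n_trials)  # internal noise
y = drive > np.median(drive)                       # simulated yes/no responses

fit = LogisticRegression(max_iter=1000).fit(X, y)
template = fit.coef_.reshape(n_bursts, n_freqs)
# Cells with large fitted weights are those the simulated observer relied on;
# applied to real data, this weight map is read as the internal template.
```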
Affiliations
- Rong Huang: Department of Psychology, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
12. Bee MA, Micheyl C. The cocktail party problem: what is it? How can it be solved? And why should animal behaviorists study it? J Comp Psychol 2008;122:235-251. PMID: 18729652; PMCID: PMC2692487; DOI: 10.1037/0735-7036.122.3.235.
Abstract
Animals often use acoustic signals to communicate in groups or social aggregations in which multiple individuals signal within a receiver's hearing range. Consequently, receivers face challenges related to acoustic interference and auditory masking that are not unlike the human cocktail party problem, which refers to the problem of perceiving speech in noisy social settings. Understanding the sensory solutions to the cocktail party problem has been a goal of research on human hearing and speech communication for several decades. Despite a general interest in acoustic signaling in groups, animal behaviorists have devoted comparatively little attention to understanding how animals solve problems equivalent to the human cocktail party problem. After illustrating how humans and nonhuman animals experience and overcome similar perceptual challenges in cocktail-party-like social environments, this article reviews previous psychophysical and physiological studies of humans and nonhuman animals to describe how the cocktail party problem can be solved. This review also outlines several basic and applied benefits that could result from studies of the cocktail party problem in the context of animal acoustic communication.
Affiliations
- Mark A Bee: Department of Ecology, Evolution, and Behavior, University of Minnesota, St. Paul, MN 55108, USA
14. Huang R, Richards VM. Coherence detection: effects of frequency, frequency uncertainty, and onset/offset delays. J Acoust Soc Am 2006;119:2298-2304. PMID: 16642843; DOI: 10.1121/1.2179730.
Abstract
The detectability of a sequence of equal-frequency (coherent) tonal components embedded in random, multiburst maskers was evaluated. The masker comprised tonal components located in a time-by-frequency spectrogram with eight 30-ms time columns and 29 frequency rows ranging logarithmically from 200 to 5000 Hz. The probability that a tone occurred in any one cell of the spectrogram, p, was the independent variable. The signal and masker components were of equal duration and equal level. Using a yes/no procedure, threshold values of p were estimated for five signal frequencies (220, 445, 1000, 2245, and 4490 Hz) and when the signal frequency was random. Thresholds were worst for the random-frequency signal and best for the fixed 1000-Hz signal. In additional conditions, the value of p was fixed and the signal components were delayed relative to the masker components. A 1-ms delay provided better sensitivity (d' grew from 0.5 to 1) for all but the lowest signal frequency tested. An analysis of no-signal trials revealed that false alarm rates were higher when components falling at the signal frequency were consecutive than when they were distributed across bursts. Thus, coherence rather than total energy at the signal frequency is important for signal detection.
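The independent variable p has a direct generative reading: each cell of the 8-column by 29-row grid receives a masker tone with probability p, and the coherent signal adds a fixed-frequency tone to a run of bursts. A small sketch of the grid follows; the value of p and the central-burst signal placement (which follows the related study in entry 11) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
freqs = np.geomspace(200, 5000, 29)        # 29 log-spaced rows, 200-5000 Hz
p = 0.3                                    # cell-fill probability; illustrative
grid = rng.random((29, 8)) < p             # True = masker tone in that cell

sig_row = np.argmin(np.abs(freqs - 1000))  # row nearest 1000 Hz
grid[sig_row, 1:7] = True                  # coherent signal in six bursts
```

Raising p raises the chance that masker tones form a consecutive run at the signal frequency, which matches the false-alarm pattern reported above and explains why thresholds are expressed as values of p.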
Affiliations
- Rong Huang: Department of Psychology, University of Pennsylvania, 3401 Walnut Street, Suite 302C, Philadelphia, Pennsylvania 19104, USA
15. Richards VM, Tang Z. Estimates of effective frequency selectivity based on the detection of a tone added to complex maskers. J Acoust Soc Am 2006;119:1574-1584. PMID: 16583902; DOI: 10.1121/1.2165001.
Abstract
In Experiment 1, the validity of parameters associated with the roex(p, r) auditory filter shape was examined for three different types of maskers: (a) a noise masker, (b) a random 12-tone masker whose frequencies varied on a burst-by-burst basis [multiple-burst different (MBD)], and (c) a random 12-tone masker whose frequencies were the same across bursts [multiple-burst same (MBS)]. First, the power spectrum model of masking was used to estimate auditory filter shapes for four observers. Second, the resulting auditory filter shapes were used in a computer simulation that provided an estimate of internal noise for each observer. Third, relative weights across frequency were estimated for each observer and each masker type. For the noise masker, these analyses provided predictions and relative weights that were consistent across the three analyses. For the MBD and MBS maskers, there was little consistency; neither the estimated internal noise nor the estimated relative weights reliably supported a single-filter model of detection. In Experiment 2, the time course for the detection of a tone added to an MBD masker was evaluated by estimating relative weights jointly in time and frequency. The relative weights at the signal frequency formed a rough inverse "U" across time.
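For reference, the roex(p, r) filter named here has a standard closed form: W(g) = (1 - r)(1 + pg)e^(-pg) + r, where g is the deviation from center frequency as a proportion of center frequency, p sets the passband width, and r sets the shallow tail. A direct transcription (the example parameter values are invented):

```python
import numpy as np

def roex_pr(f, f0, p, r):
    """roex(p, r) weighting: W(g) = (1 - r)(1 + p*g)*exp(-p*g) + r,
    with normalized frequency deviation g = |f - f0| / f0."""
    g = np.abs(f - f0) / f0
    return (1 - r) * (1 + p * g) * np.exp(-p * g) + r

# Example: weight applied to masker energy 200 Hz above a 1000-Hz signal,
# with hypothetical parameters p = 25, r = 1e-4.
print(roex_pr(1200.0, 1000.0, 25.0, 1e-4))
```

Fitting p and r under the power spectrum model, then checking the fits against simulated internal noise and measured relative weights, is the three-step consistency test the abstract describes.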
Affiliations
- Virginia M Richards: Department of Psychology, University of Pennsylvania, 3401 Walnut Street, Suite 302C, Philadelphia, Pennsylvania 19104, USA
16. Kidd G, Mason CR, Arbogast TL. Similarity, uncertainty, and masking in the identification of nonspeech auditory patterns. J Acoust Soc Am 2002;111:1367-1376. PMID: 11931314; DOI: 10.1121/1.1448342.
Abstract
This study examined whether increasing the similarity between informational maskers and signals would increase the amount of masking obtained in a nonspeech pattern identification task. The signals were contiguous sequences of pure-tone bursts arranged in six narrow-band spectro-temporal patterns. The informational maskers were sequences of multitone bursts played synchronously with the signal tones. The listener's task was to identify the patterns in a one-interval, six-alternative forced-choice procedure. Three types of multitone maskers were generated according to different randomization rules. For the least signal-like informational masker, the components in each multitone burst were chosen at random within the frequency range of 200-6500 Hz, excluding a "protected region" around the signal frequencies. For the intermediate masker, the frequency components in the first burst were chosen quasirandomly, but the components in successive bursts were constrained to fall in narrow frequency bands around the frequencies of the components in the initial burst. Within the narrow bands the frequencies were randomized. This masker was considered more similar to the signal patterns because it consisted of a set of narrow-band sequences, any one of which might be mistaken for a signal pattern. The most signal-like masker also consisted of a set of synchronously played narrow-band sequences, but the variation in frequency within each sequence was sinusoidal, completing roughly one period in a sequence. This masker consisted of discernible patterns, but not patterns that were part of the set of signals. In addition, masking produced by Gaussian noise bursts (thought to produce primarily peripherally based "energetic masking") was measured and compared to the informational masking results. For the three informational maskers, more masking was produced by the maskers comprised of narrow-band sequences than by the masker in which the frequencies were not constrained to narrow bands. Also, the slopes of the performance-level functions for the three informational maskers were much shallower than for the Gaussian noise masker or for no masker. The findings provided qualified support for the hypothesis that increasing the similarity between signals and maskers, or parts of the maskers, causes greater informational masking. However, it is also possible that the greater masking was a consequence of increasing the number of perceptual "streams" that had to be evaluated by the listener.
Affiliations
- Gerald Kidd: Department of Communication Disorders and Hearing Research Center, Boston University, Massachusetts 02215, USA
17. Wright BA, Saberi K. Strategies used to detect auditory signals in small sets of random maskers. J Acoust Soc Am 1999;105:1765-1775. PMID: 10089600; DOI: 10.1121/1.426714.
Abstract
Detection performance for a masked auditory signal of fixed frequency can be substantially degraded if there is uncertainty about the frequency content of the masker. A quasimolecular psychophysical approach was used to examine response strategies in masker-uncertainty conditions and to investigate the influence of uncertainty when the number of different masker samples was limited to ten or fewer. The task of the four listeners was to detect a 1000-Hz signal that was presented simultaneously with one of ten ten-tone masker samples. The masker sample was either fixed throughout a block of two-interval forced-choice trials or was randomized across or within trials. The primary results showed that: (1) When the signal level was low and the masker sample differed between the two intervals of a trial, most listeners based their responses more on the presence of specific masker samples than on the signal. (2) The detrimental effect of masker uncertainty was clearly evident when only four maskers were randomly presented, and grew as the size of the masker set was increased from two to ten. (3) The slopes of psychometric functions measured with the same masker samples differed among the fixed and two random-masker conditions. (4) There were large differences in the influence of masker uncertainty across masker samples and listeners. These data demonstrate the great susceptibility of human listeners to masker uncertainty and the ability of quasimolecular investigations to reveal important aspects of behavior in uncertainty conditions.
Affiliations
- B A Wright: Audiology and Hearing Sciences Program, Northwestern University, Evanston, Illinois 60208-3550, USA
18. Kidd G, Mason CR, Rohtla TL, Deliwala PS. Release from masking due to spatial separation of sources in the identification of nonspeech auditory patterns. J Acoust Soc Am 1998;104:422-431. PMID: 9670534; DOI: 10.1121/1.423246.
Abstract
A nonspeech pattern identification task was used to study the role of spatial separation of sources in auditory masking in multisource listening environments. The six frequency patterns forming the signal set comprised sequences of eight 60-ms tone bursts. Bursts of masking sounds were played synchronously with the signals. The main variables in the study were (1) the spatial separation in the horizontal plane between signals and maskers and (2) the nature of the masking produced by the maskers. Spatial separation of signal and masker ranged from 0 to 180 degrees. The maskers were of two types: (1) a sequence of eight 60-ms bursts of Gaussian noise intended to produce predominantly peripherally based "energetic masking" and (2) a sequence of eight 60-ms bursts of eight-tone complexes intended to produce primarily centrally based "informational masking." The results indicated that identification performance improved with increasing separation of signal and masker. The amount of improvement depended upon the type of masker and the center frequency of the signal patterns. Much larger improvements were found for spatial separation of the signal and informational masker than for the signal and energetic masker. This was particularly apparent when the acoustical advantage of the signal-to-noise ratio in the more favorable of the two ears (the ear nearest the signal) was taken into account. The results were interpreted as evidence for an important role of binaural hearing in reducing sound source or message uncertainty and may contribute toward solving the "cocktail party problem."
Affiliations
- G Kidd: Department of Communication Disorders, Boston University, Massachusetts 02215, USA