1
Williams JR, Störmer VS. Cutting Through the Noise: Auditory Scenes and Their Effects on Visual Object Processing. Psychol Sci 2024:9567976241237737. [PMID: 38889285] [DOI: 10.1177/09567976241237737]
Abstract
Despite the intuitive feeling that our visual experience is coherent and comprehensive, the world is full of ambiguous and indeterminate information. Here we explore how the visual system might take advantage of ambient sounds to resolve this ambiguity. Young adults (ns = 20-30) were tasked with identifying an object slowly fading in through visual noise while a task-irrelevant sound played. Participants required more visual information to identify the object when the accompanying sound was incongruent with it than when it was congruent. Auditory scenes, which are only probabilistically related to specific objects, produced a similar congruency benefit, even for objects that were never themselves heard (e.g., a bench). Notably, these effects emerged in both across-category and within-category visual tasks, underscoring cross-modal integration across multiple levels of perceptual processing. In sum, our study reveals the importance of audiovisual interactions in supporting meaningful perceptual experiences in naturalistic settings.
Affiliation(s)
- Viola S Störmer
- Department of Psychology, University of California, San Diego
- Department of Psychological and Brain Sciences, Dartmouth College
2
Hake R, Bürgel M, Nguyen NK, Greasley A, Müllensiefen D, Siedenburg K. Development of an adaptive test of musical scene analysis abilities for normal-hearing and hearing-impaired listeners. Behav Res Methods 2023. [PMID: 37957432] [DOI: 10.3758/s13428-023-02279-y]
Abstract
Auditory scene analysis (ASA) is the process through which the auditory system makes sense of complex acoustic environments by organising sound mixtures into meaningful events and streams. Although music psychology has acknowledged the fundamental role of ASA in shaping music perception, no efficient test to quantify listeners' ASA abilities in realistic musical scenarios has yet been published. This study presents a new tool for testing ASA abilities in the context of music, suitable for both normal-hearing (NH) and hearing-impaired (HI) individuals: the adaptive Musical Scene Analysis (MSA) test. The test uses a simple 'yes-no' task paradigm to determine whether the sound from a single target instrument is heard in a mixture of popular music. During the online calibration phase, 525 NH and 131 HI listeners were recruited. The level ratio between the target instrument and the mixture, choice of target instrument, and number of instruments in the mixture were found to be important factors affecting item difficulty, whereas the influence of the stereo width (induced by inter-aural level differences) only had a minor effect. Based on a Bayesian logistic mixed-effects model, an adaptive version of the MSA test was developed. In a subsequent validation experiment with 74 listeners (20 HI), MSA scores showed acceptable test-retest reliability and moderate correlations with other music-related tests, pure-tone-average audiograms, age, musical sophistication, and working memory capacities. The MSA test is a user-friendly and efficient open-source tool for evaluating musical ASA abilities and is suitable for profiling the effects of hearing impairment on music perception.
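To make the adaptive logic concrete, here is a minimal Python sketch of how a fitted logistic item-difficulty model of this kind could drive item selection; the coefficient values, function names, and the 75%-correct targeting rule are illustrative assumptions, not the published model.

```python
import numpy as np

def p_correct(ability, level_ratio_db, n_instruments,
              b0=0.0, b_level=0.3, b_n=0.6):
    """Predicted probability of a correct yes-no response.
    Item difficulty grows with more instruments in the mixture and a
    lower target-to-mixture level ratio (coefficients are hypothetical)."""
    difficulty = b0 - b_level * level_ratio_db + b_n * n_instruments
    return 1.0 / (1.0 + np.exp(-(ability - difficulty)))

def next_item(ability, item_pool, target_p=0.75):
    """Adaptive step: pick the pool item whose predicted accuracy for the
    current ability estimate is closest to the targeted success rate."""
    preds = np.array([p_correct(ability, it["level_ratio_db"],
                                it["n_instruments"]) for it in item_pool])
    return item_pool[int(np.argmin(np.abs(preds - target_p)))]
```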
Affiliation(s)
- Robin Hake
- Department of Medical Physics and Acoustics, University of Oldenburg, Oldenburg, Germany.
- Michel Bürgel
- Department of Medical Physics and Acoustics, University of Oldenburg, Oldenburg, Germany
- Ninh K Nguyen
- Department of Medical Physics and Acoustics, University of Oldenburg, Oldenburg, Germany
- Daniel Müllensiefen
- Department of Psychology, Goldsmiths, University of London, London, UK
- Hanover Music Lab, Hochschule für Musik, Theater und Medien, Hannover, Germany
- Kai Siedenburg
- Department of Medical Physics and Acoustics, University of Oldenburg, Oldenburg, Germany
3
Schmuckler MA, Moranis R. Rhythm contour drives musical memory. Atten Percept Psychophys 2023; 85:2502-2514. [PMID: 36991289] [DOI: 10.3758/s13414-023-02700-w]
Abstract
Listeners' use of contour information as a basis for memory of rhythmic patterns was explored in two experiments. Both studies employed a short-term memory paradigm in which listeners heard a standard rhythm, followed by a comparison rhythm, and judged whether the comparison was the same as the standard. Comparison rhythms included exact repetitions of the standard, same contour rhythms in which the relative interval durations of successive notes (but not the absolute durations of the notes themselves) were the same as the standard, and different contour rhythms in which the relative duration intervals of successive notes differed from the standard. Experiment 1 employed metric rhythms, whereas Experiment 2 employed ametric rhythms. D-prime analyses revealed that, in both experiments, listeners showed better discrimination for different contour rhythms relative to same contour rhythms. Paralleling classic work on melodic contour, these findings indicate that the concept of contour is both relevant to one's characterization of the rhythm of musical patterns and influences short-term memory for such patterns.
Affiliation(s)
- Mark A Schmuckler
- Department of Psychology, University of Toronto Scarborough, 1265 Military Trail, Scarborough, ON, M1C 1A4, Canada.
4
Susini P, Wenzel N, Houix O, Ponsot E. Psychophysical characterization of auditory temporal and frequency streaming capacities for listeners with different levels of musical expertise. JASA Express Lett 2023; 3:084402. [PMID: 37566904] [DOI: 10.1121/10.0020546]
Abstract
Temporal and frequency auditory streaming capacities were assessed for non-musician (NM), expert musician (EM), and amateur musician (AM) listeners using a local-global task and an interleaved melody recognition task, respectively. The data replicate differences previously observed between NM and EM listeners and reveal that, while AM listeners exhibit a local-over-global processing change comparable to that of EM listeners, their performance in segregating a melody embedded in a stream remains as poor as that of NM listeners. The observed group partitioning along the temporal-frequency auditory streaming capacity map suggests a sequential, two-step development model of musical learning, whose contributing factors are discussed.
Affiliation(s)
- Patrick Susini
- STMS, IRCAM, Sorbonne Université, CNRS, Ministère de la Culture, 75004 Paris, France
- Nicolas Wenzel
- STMS, IRCAM, Sorbonne Université, CNRS, Ministère de la Culture, 75004 Paris, France
- Olivier Houix
- STMS, IRCAM, Sorbonne Université, CNRS, Ministère de la Culture, 75004 Paris, France
- Emmanuel Ponsot
- STMS, IRCAM, Sorbonne Université, CNRS, Ministère de la Culture, 75004 Paris, France
5
Bürgel M, Picinali L, Siedenburg K. Listening in the Mix: Lead Vocals Robustly Attract Auditory Attention in Popular Music. Front Psychol 2021; 12:769663. [PMID: 35024038] [PMCID: PMC8744650] [DOI: 10.3389/fpsyg.2021.769663]
Abstract
Listeners can attend to and track instruments or singing voices in complex musical mixtures, even though the acoustical energy of sounds from individual instruments may overlap in time and frequency. In popular music, lead vocals are often accompanied by sound mixtures from a variety of instruments, such as drums, bass, keyboards, and guitars. However, little is known about how the perceptual organization of such musical scenes is affected by selective attention, and which acoustic features play the most important role. To investigate these questions, we explored the role of auditory attention in a realistic musical scenario. We conducted three online experiments in which participants detected single cued instruments or voices in multi-track musical mixtures. Stimuli consisted of 2-s multi-track excerpts of popular music. In one condition, the target cue preceded the mixture, allowing listeners to selectively attend to the target. In another condition, the target was presented after the mixture, requiring a more “global” mode of listening. Performance differences between these two conditions were interpreted as effects of selective attention. In Experiment 1, detection performance generally depended on the target’s instrument category, but listeners were more accurate when the target was presented before the mixture rather than after it. Lead vocals were nearly unaffected by this change in presentation order and achieved the highest accuracy of all instruments, suggesting a particular salience of vocal signals in musical mixtures. In Experiment 2, filtering was used to avoid potential spectral masking of target sounds. Although detection accuracy increased for all instruments, a similar pattern of instrument-specific differences between presentation orders was observed. In Experiment 3, adjusting the sound-level differences between the targets reduced the effect of presentation order but did not affect the differences between instruments. While both acoustic manipulations facilitated the detection of targets, vocal signals remained particularly salient, which suggests that the manipulated features did not contribute to vocal salience. These findings demonstrate that lead vocals serve as robust attractors of auditory attention regardless of the manipulation of low-level acoustical cues.
Affiliation(s)
- Michel Bürgel
- Department of Medical Physics and Acoustics, University of Oldenburg, Oldenburg, Germany
- Lorenzo Picinali
- Dyson School of Design Engineering, Imperial College London, London, United Kingdom
- Kai Siedenburg
- Department of Medical Physics and Acoustics, University of Oldenburg, Oldenburg, Germany
6
McAuley JD, Shen Y, Smith T, Kidd GR. Effects of speech-rhythm disruption on selective listening with a single background talker. Atten Percept Psychophys 2021; 83:2229-2240. [PMID: 33782913] [PMCID: PMC10612531] [DOI: 10.3758/s13414-021-02298-x]
Abstract
Recent work by McAuley et al. (Attention, Perception, & Psychophysics, 82, 3222-3233, 2020) using the Coordinate Response Measure (CRM) paradigm with a multitalker background revealed that altering the natural rhythm of target speech amidst background speech worsens target recognition (a target-rhythm effect), while altering background speech rhythm improves target recognition (a background-rhythm effect). Here, we used a single-talker background to examine the role of specific properties of target and background sound patterns on selective listening without the complexity of multiple background stimuli. Experiment 1 manipulated the sex of the background talker, presented with a male target talker, to assess target and background-rhythm effects with and without a strong pitch cue to aid perceptual segregation. Experiment 2 used a vocoded single-talker background to examine target and background-rhythm effects with envelope-based speech rhythms preserved, but without semantic content or temporal fine structure. While a target-rhythm effect was present with all backgrounds, the background-rhythm effect was only observed for the same-sex background condition. Results provide additional support for a selective entrainment hypothesis, while also showing that the background-rhythm effect is not driven by envelope-based speech rhythm alone, and may be reduced or eliminated when pitch or other acoustic differences provide a strong basis for selective listening.
Affiliation(s)
- J Devin McAuley
- Department of Psychology, Michigan State University, East Lansing, MI, 48824, USA
- Yi Shen
- Department of Speech and Hearing Sciences, University of Washington, Seattle, WA, USA
- Toni Smith
- Department of Psychology, Michigan State University, East Lansing, MI, 48824, USA
- Gary R Kidd
- Department of Speech, Language and Hearing Sciences, Indiana University, Bloomington, IN, USA
7
Siedenburg K, Goldmann K, van de Par S. Tracking Musical Voices in Bach's The Art of the Fugue: Timbral Heterogeneity Differentially Affects Younger Normal-Hearing Listeners and Older Hearing-Aid Users. Front Psychol 2021; 12:608684. [PMID: 33935864] [PMCID: PMC8079728] [DOI: 10.3389/fpsyg.2021.608684]
Abstract
Auditory scene analysis is an elementary aspect of music perception, yet little research has scrutinized auditory scene analysis under realistic musical conditions with diverse samples of listeners. This study probed the ability of younger normal-hearing listeners and older hearing-aid users to track individual musical voices or lines in J. S. Bach's The Art of the Fugue. Five-second excerpts with homogeneous or heterogeneous instrumentation of 2–4 musical voices were presented from spatially separated loudspeakers and preceded by a short cue signaling the target voice. Listeners tracked the cued voice and detected whether an amplitude modulation was imposed on the cued voice or a distractor voice. Results indicated superior performance of young normal-hearing listeners compared with older hearing-aid users. Performance was generally better in conditions with fewer voices. For young normal-hearing listeners, there was an interaction between the number of voices and the instrumentation: performance degraded less drastically with an increase in the number of voices for timbrally heterogeneous mixtures compared with homogeneous mixtures. Older hearing-aid users generally showed smaller effects of the number of voices and instrumentation, but no interaction between the two factors. Moreover, tracking performance of older hearing-aid users did not differ depending on whether they wore their hearing aids. These results shed light on the role of timbral differentiation in musical scene analysis and suggest reduced musical scene analysis abilities of older hearing-impaired listeners in a realistic musical scenario.
Affiliation(s)
- Kai Siedenburg
- Department of Medical Physics and Acoustics and Cluster of Excellence Hearing4all, Carl von Ossietzky University of Oldenburg, Oldenburg, Germany
- Kirsten Goldmann
- Department of Medical Physics and Acoustics and Cluster of Excellence Hearing4all, Carl von Ossietzky University of Oldenburg, Oldenburg, Germany
- Steven van de Par
- Department of Medical Physics and Acoustics and Cluster of Excellence Hearing4all, Carl von Ossietzky University of Oldenburg, Oldenburg, Germany
8
Siedenburg K, Röttges S, Wagener KC, Hohmann V. Can You Hear Out the Melody? Testing Musical Scene Perception in Young Normal-Hearing and Older Hearing-Impaired Listeners. Trends Hear 2020; 24:2331216520945826. [PMID: 32895034] [PMCID: PMC7502688] [DOI: 10.1177/2331216520945826]
Abstract
It is well known that hearing loss compromises auditory scene analysis abilities, as is usually manifested in difficulties understanding speech in noise. Remarkably little is known about auditory scene analysis of hearing-impaired (HI) listeners when it comes to musical sounds. Specifically, it is unclear to what extent HI listeners are able to hear out a melody or an instrument from a musical mixture. Here, we tested a group of younger normal-hearing (yNH) and older HI (oHI) listeners with moderate hearing loss on their ability to match short melodies and instruments presented as part of mixtures. Four-tone sequences were used in conjunction with a simple musical accompaniment that acted as a masker (cello/piano dyads or spectrally matched noise). In each trial, a signal-masker mixture was presented, followed by two different versions of the signal alone. Listeners indicated which signal version was part of the mixture. Signal versions differed either in terms of the sequential order of the pitch sequence or in terms of timbre (flute vs. trumpet). Signal-to-masker thresholds were measured by varying the signal presentation level in an adaptive two-down/one-up procedure. Thresholds of oHI listeners were elevated by 10 dB on average compared with those of yNH listeners. In contrast to yNH listeners, oHI listeners did not show evidence of listening in the dips of the masker. Musical training of participants was associated with lower thresholds. These results may indicate detrimental effects of hearing loss on central aspects of musical scene perception.
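For readers unfamiliar with the procedure, the two-down/one-up rule lowers the signal level after two consecutive correct responses and raises it after any error, so the track converges near the 70.7%-correct point of the psychometric function (Levitt, 1971). A minimal Python sketch (the step size, reversal count, and averaging rule below are illustrative choices, not the study's exact settings):

```python
def two_down_one_up(run_trial, start_level=0.0, step=2.0, max_reversals=8):
    """2-down/1-up staircase; converges near the 70.7%-correct level.
    `run_trial(level)` presents one trial at the given signal level
    (e.g., in dB) and returns True if the response was correct."""
    level, n_correct, last_dir, reversals = start_level, 0, None, []
    while len(reversals) < max_reversals:
        if run_trial(level):
            n_correct += 1
            if n_correct == 2:                 # two correct in a row -> harder
                n_correct = 0
                if last_dir == "up":           # direction change = reversal
                    reversals.append(level)
                last_dir = "down"
                level -= step
        else:                                  # any error -> easier
            n_correct = 0
            if last_dir == "down":
                reversals.append(level)
            last_dir = "up"
            level += step
    return sum(reversals) / len(reversals)     # threshold estimate
```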
Affiliation(s)
- Kai Siedenburg
- Department of Medical Physics and Acoustics and Cluster of Excellence Hearing4all, Carl von Ossietzky University of Oldenburg
- Saskia Röttges
- Department of Medical Physics and Acoustics and Cluster of Excellence Hearing4all, Carl von Ossietzky University of Oldenburg
- Volker Hohmann
- Department of Medical Physics and Acoustics and Cluster of Excellence Hearing4all, Carl von Ossietzky University of Oldenburg; Hörzentrum Oldenburg GmbH & Hörtech gGmbH, Oldenburg, Germany
9
Greenlaw KM, Puschmann S, Coffey EBJ. Decoding of Envelope vs. Fundamental Frequency During Complex Auditory Stream Segregation. Neurobiol Lang 2020; 1:268-287. [PMID: 37215227] [PMCID: PMC10158587] [DOI: 10.1162/nol_a_00013]
Abstract
Hearing-in-noise perception is a challenging task that is critical to human function, but how the brain accomplishes it is not well understood. A candidate mechanism proposes that the neural representation of an attended auditory stream is enhanced relative to background sound via a combination of bottom-up and top-down mechanisms. To date, few studies have compared neural representation and its task-related enhancement across frequency bands that carry different auditory information, such as a sound's amplitude envelope (i.e., syllabic rate or rhythm; 1-9 Hz), and the fundamental frequency of periodic stimuli (i.e., pitch; >40 Hz). Furthermore, hearing-in-noise in the real world is frequently both messier and richer than the majority of tasks used in its study. In the present study, we use continuous sound excerpts that simultaneously offer predictive, visual, and spatial cues to help listeners separate the target from four acoustically similar simultaneously presented sound streams. We show that while both lower and higher frequency information about the entire sound stream is represented in the brain's response, the to-be-attended sound stream is strongly enhanced only in the slower, lower frequency sound representations. These results are consistent with the hypothesis that attended sound representations are strengthened progressively at higher level, later processing stages, and that the interaction of multiple brain systems can aid in this process. Our findings contribute to our understanding of auditory stream separation in difficult, naturalistic listening conditions and demonstrate that pitch and envelope information can be decoded from single-channel EEG data.
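As an illustration of the two stimulus representations contrasted in this decoding work, the sketch below (our assumption of a standard Hilbert-envelope pipeline; the cutoff frequencies and function names are illustrative, not the authors' exact analysis) computes a slow amplitude-envelope regressor (1-9 Hz) and a higher-frequency band carrying fundamental-frequency information:

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def bandpass(x, sr, lo, hi, order=2):
    """Zero-phase Butterworth band-pass filter."""
    b, a = butter(order, [lo / (sr / 2), hi / (sr / 2)], btype="band")
    return filtfilt(b, a, x)

def stimulus_features(audio, sr):
    """Two candidate regressors for EEG decoding: the slow amplitude
    envelope (syllabic rate/rhythm, 1-9 Hz) and a band capturing
    fundamental-frequency information (>40 Hz; cutoffs illustrative)."""
    envelope = np.abs(hilbert(audio))            # broadband amplitude envelope
    env_slow = bandpass(envelope, sr, 1.0, 9.0)  # 1-9 Hz envelope band
    f0_band = bandpass(audio, sr, 40.0, 300.0)   # periodicity/pitch band
    return env_slow, f0_band
```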
Affiliation(s)
- Keelin M. Greenlaw
- Department of Psychology, Concordia University, Montreal, QC, Canada
- International Laboratory for Brain, Music and Sound Research (BRAMS)
- The Centre for Research on Brain, Language and Music (CRBLM)
10
Zimmermann J, Ross B, Moscovitch M, Alain C. Neural dynamics supporting auditory long-term memory effects on target detection. Neuroimage 2020; 218:116979. [PMID: 32447014] [DOI: 10.1016/j.neuroimage.2020.116979]
Abstract
Auditory long-term memory has been shown to facilitate signal detection. However, the nature and timing of the cognitive processes supporting such benefits remain equivocal. We measured neuroelectric brain activity while young adults were presented with a contextual memory cue designed to assist with the detection of a faint pure-tone target embedded in an audio clip of an everyday environmental scene (e.g., the soundtrack of a restaurant). During an initial familiarization task, participants heard such audio clips, half of which included a target sound (memory cue trials) at a specific time and location (left or right ear), as well as audio clips without a target (neutral trials). Following a 1-h or 24-h retention interval, the same audio clips were presented, but now all included a target. Participants were asked to press a button as soon as they heard the pure-tone target. Overall, participants were faster and more accurate during memory than neutral cue trials. The auditory contextual memory effects on performance coincided with three temporally and spatially distinct neural modulations, which encompassed changes in the amplitude of event-related potentials as well as changes in theta, alpha, beta, and gamma power. Brain electrical source analyses revealed greater source activity in memory than neutral cue trials in the right superior temporal gyrus and left parietal cortex. Conversely, neutral trials were associated with greater source activity than memory cue trials in the left posterior medial temporal lobe. Target detection was associated with increased negativity (N2) and a late positive (P3b) wave at frontal and parietal sites, respectively. The effect of auditory contextual memory on brain activity preceding target onset showed little lateralization. Together, these results are consistent with contextual memory facilitating retrieval of target-context associations and the deployment and management of auditory attentional resources to the moment when the target occurred. The results also suggest that the auditory cortices, parietal cortex, and medial temporal lobe may be parts of a neural network enabling memory-guided attention during auditory scene analysis.
Affiliation(s)
- Jacqueline Zimmermann
- Rotman Research Institute, Psychology, University of Toronto, Ontario, Canada; Department of Psychology, University of Toronto, Ontario, Canada
- Bernhard Ross
- Rotman Research Institute, Psychology, University of Toronto, Ontario, Canada; Department of Medical Biophysics, University of Toronto, Ontario, Canada; Institute of Medical Sciences, University of Toronto, Ontario, Canada
- Morris Moscovitch
- Rotman Research Institute, Psychology, University of Toronto, Ontario, Canada; Department of Psychology, University of Toronto, Ontario, Canada
- Claude Alain
- Rotman Research Institute, Psychology, University of Toronto, Ontario, Canada; Department of Psychology, University of Toronto, Ontario, Canada; Institute of Medical Sciences, University of Toronto, Ontario, Canada; Faculty of Music, University of Toronto, Ontario, Canada
11
Enhanced auditory disembedding in an interleaved melody recognition test is associated with absolute pitch ability. Sci Rep 2019; 9:7838. [PMID: 31127171] [PMCID: PMC6534562] [DOI: 10.1038/s41598-019-44297-x]
Abstract
Absolute pitch (AP) and autism have recently been associated with each other. Neurocognitive theories of autism could perhaps explain this co-occurrence. This study investigates whether AP musicians show an advantage in an interleaved melody recognition task (IMRT), an auditory version of an embedded figures test often investigated in autism with respect to these theories. A total of N = 59 professional musicians (AP = 27) participated in the study. In each trial a probe melody was followed by an interleaved sequence. Participants had to indicate whether the probe melody was present in the interleaved sequence. Sensitivity index d′ and response bias c were calculated according to signal detection theory. Additionally, a pitch-adjustment test measuring fine-graded differences in absolute pitch proficiency, the Autism-Spectrum Quotient, and a visual embedded figures test were conducted. AP possessors outperformed relative pitch (RP) possessors on the overall IMRT and in the fully interleaved condition. AP proficiency, visual disembedding, and musicality predicted 39.2% of the variance in the IMRT. No correlations were found between the IMRT and autistic traits. Results are in line with a detail-oriented cognitive style and enhanced perceptual functioning in AP musicians similar to that observed in autism.
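For reference, d′ and the response bias c follow directly from the hit and false-alarm rates; a short Python sketch (the log-linear correction for extreme proportions is a common convention we assume here, not necessarily the authors' exact procedure):

```python
from scipy.stats import norm

def dprime_and_c(hits, misses, false_alarms, correct_rejections):
    """Sensitivity d' = z(H) - z(FA) and bias c = -(z(H) + z(FA)) / 2
    from yes/no response counts, with a log-linear correction to
    avoid infinite z-scores at proportions of 0 or 1."""
    h = (hits + 0.5) / (hits + misses + 1.0)
    fa = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z_h, z_fa = norm.ppf(h), norm.ppf(fa)
    return z_h - z_fa, -(z_h + z_fa) / 2.0

# e.g., 40 signal trials and 40 noise trials:
d, c = dprime_and_c(hits=32, misses=8, false_alarms=6, correct_rejections=34)
```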
12
Coffey EBJ, Arseneau-Bruneau I, Zhang X, Zatorre RJ. The Music-In-Noise Task (MINT): A Tool for Dissecting Complex Auditory Perception. Front Neurosci 2019; 13:199. [PMID: 30930734] [PMCID: PMC6427094] [DOI: 10.3389/fnins.2019.00199]
Abstract
The ability to segregate target sounds in noisy backgrounds is relevant both to neuroscience and to clinical applications. Recent research suggests that hearing-in-noise (HIN) problems are solved using combinations of sub-skills that are applied according to task demand and information availability. While evidence is accumulating for a musician advantage in HIN, the exact nature of the reported training effect is not fully understood. Existing HIN tests focus on tasks requiring understanding of speech in the presence of competing sound. Because visual, spatial and predictive cues are not systematically considered in these tasks, few tools exist to investigate the most relevant components of cognitive processes involved in stream segregation. We present the Music-In-Noise Task (MINT) as a flexible tool to expand HIN measures beyond speech perception, and for addressing research questions pertaining to the relative contributions of HIN sub-skills, inter-individual differences in their use, and their neural correlates. The MINT uses a match-mismatch trial design: in four conditions (Baseline, Rhythm, Spatial, and Visual) subjects first hear a short instrumental musical excerpt embedded in an informational masker of "multi-music" noise, followed by either a matching or scrambled repetition of the target musical excerpt presented in silence; the four conditions differ according to the presence or absence of additional cues. In a fifth condition (Prediction), subjects hear the excerpt in silence as a target first, which helps to anticipate incoming information when the target is embedded in masking sound. Data from samples of young adults show that the MINT has good reliability and internal consistency, and demonstrate selective benefits of musicianship in the Prediction, Rhythm, and Visual subtasks. We also report a performance benefit of multilingualism that is separable from that of musicianship. Average MINT scores were correlated with scores on a sentence-in-noise perception task, but only accounted for a relatively small percentage of the variance, indicating that the MINT is sensitive to additional factors and can provide a complement and extension of speech-based tests for studying stream segregation. A customizable version of the MINT is made available for use and extension by the scientific community.
Affiliation(s)
- Emily B. J. Coffey
- Department of Psychology, Concordia University, Montreal, QC, Canada
- Laboratory for Brain, Music and Sound Research (BRAMS), Montreal, QC, Canada
- Centre for Research on Brain, Language and Music (CRBLM), Montreal, QC, Canada
- Centre for Interdisciplinary Research in Music Media and Technology (CIRMMT), Montreal, QC, Canada
- Isabelle Arseneau-Bruneau
- Laboratory for Brain, Music and Sound Research (BRAMS), Montreal, QC, Canada
- Centre for Research on Brain, Language and Music (CRBLM), Montreal, QC, Canada
- Centre for Interdisciplinary Research in Music Media and Technology (CIRMMT), Montreal, QC, Canada
- Montreal Neurological Institute, McGill University, Montreal, QC, Canada
- Xiaochen Zhang
- Department of Biomedical Engineering, School of Medicine, Tsinghua University, Beijing, China
- Robert J. Zatorre
- Laboratory for Brain, Music and Sound Research (BRAMS), Montreal, QC, Canada
- Centre for Research on Brain, Language and Music (CRBLM), Montreal, QC, Canada
- Centre for Interdisciplinary Research in Music Media and Technology (CIRMMT), Montreal, QC, Canada
- Montreal Neurological Institute, McGill University, Montreal, QC, Canada
13
Wang D, Chen J. Supervised Speech Separation Based on Deep Learning: An Overview. IEEE/ACM Trans Audio Speech Lang Process 2018; 26:1702-1726. [PMID: 31223631] [PMCID: PMC6586438] [DOI: 10.1109/taslp.2018.2842159]
Abstract
Speech separation is the task of separating target speech from background interference. Traditionally, speech separation is studied as a signal processing problem. A more recent approach formulates speech separation as a supervised learning problem, where the discriminative patterns of speech, speakers, and background noise are learned from training data. Over the past decade, many supervised separation algorithms have been put forward. In particular, the recent introduction of deep learning to supervised speech separation has dramatically accelerated progress and boosted separation performance. This paper provides a comprehensive overview of the research on deep learning based supervised speech separation in the last several years. We first introduce the background of speech separation and the formulation of supervised separation. Then, we discuss three main components of supervised separation: learning machines, training targets, and acoustic features. Much of the overview is on separation algorithms where we review monaural methods, including speech enhancement (speech-nonspeech separation), speaker separation (multitalker separation), and speech dereverberation, as well as multimicrophone techniques. The important issue of generalization, unique to supervised learning, is discussed. This overview provides a historical perspective on how advances are made. In addition, we discuss a number of conceptual issues, including what constitutes the target source.
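As one concrete example of the training targets discussed in this overview, the ideal ratio mask can be computed per time-frequency unit from premixed speech and noise; a minimal Python sketch (the STFT settings and `beta` exponent below are illustrative choices):

```python
import numpy as np
from scipy.signal import stft

def ideal_ratio_mask(speech, noise, sr, beta=0.5):
    """Ideal ratio mask, a common supervised-separation training target:
    IRM = (S^2 / (S^2 + N^2))^beta per time-frequency unit, where S and N
    are the magnitudes of premixed speech and noise. Values lie in [0, 1]."""
    _, _, S = stft(speech, sr, nperseg=512)
    _, _, N = stft(noise, sr, nperseg=512)
    ps, pn = np.abs(S) ** 2, np.abs(N) ** 2
    return (ps / (ps + pn + 1e-12)) ** beta  # epsilon avoids division by zero
```

A network trained to predict this mask from noisy-mixture features can then recover an estimate of the clean speech by applying the predicted mask to the mixture spectrogram.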
Affiliation(s)
- DeLiang Wang
- Department of Computer Science and Engineering and the Center for Cognitive and Brain Sciences, The Ohio State University, Columbus, OH 43210 USA, and also with the Center of Intelligent Acoustics and Immersive Communications, Northwestern Polytechnical University, Xi'an 710072, China
- Jitong Chen
- Department of Computer Science and Engineering, The Ohio State University, Columbus, OH 43210 USA. He is now with Silicon Valley AI Lab, Baidu Research, Sunnyvale, CA 94089 USA
14
Abstract
The cocktail party problem requires listeners to infer individual sound sources from mixtures of sound. The problem can be solved only by leveraging regularities in natural sound sources, but little is known about how such regularities are internalized. We explored whether listeners learn source "schemas"-the abstract structure shared by different occurrences of the same type of sound source-and use them to infer sources from mixtures. We measured the ability of listeners to segregate mixtures of time-varying sources. In each experiment a subset of trials contained schema-based sources generated from a common template by transformations (transposition and time dilation) that introduced acoustic variation but preserved abstract structure. Across several tasks and classes of sound sources, schema-based sources consistently aided source separation, in some cases producing rapid improvements in performance over the first few exposures to a schema. Learning persisted across blocks that did not contain the learned schema, and listeners were able to learn and use multiple schemas simultaneously. No learning was evident when schemas were presented in the task-irrelevant (i.e., distractor) source. However, learning from task-relevant stimuli showed signs of being implicit, in that listeners were no more likely to report that sources recurred in experiments containing schema-based sources than in control experiments containing no schema-based sources. The results implicate a mechanism for rapidly internalizing abstract sound structure, facilitating accurate perceptual organization of sound sources that recur in the environment.
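The two transformations named above are simple to state formally: transposition scales every frequency by a constant ratio, and time dilation scales every duration by a constant factor, leaving the source's abstract structure intact. A toy Python sketch (the (frequency, duration) representation is our simplification of the time-varying sources used in the study):

```python
def transform_source(notes, semitones=0.0, dilation=1.0):
    """Apply the two schema-preserving transformations: transposition
    (frequency scaling by 2**(semitones/12)) and time dilation (duration
    scaling). `notes` is a list of (frequency_hz, duration_s) pairs."""
    ratio = 2.0 ** (semitones / 12.0)
    return [(f * ratio, d * dilation) for f, d in notes]

# Same abstract structure, different surface acoustics:
template = [(440.0, 0.2), (660.0, 0.1), (550.0, 0.3)]
variant = transform_source(template, semitones=3, dilation=1.25)
```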
15
Disbergen NR, Valente G, Formisano E, Zatorre RJ. Assessing Top-Down and Bottom-Up Contributions to Auditory Stream Segregation and Integration With Polyphonic Music. Front Neurosci 2018; 12:121. [PMID: 29563861] [PMCID: PMC5845899] [DOI: 10.3389/fnins.2018.00121]
Abstract
Polyphonic music listening well exemplifies processes typically involved in daily auditory scene analysis situations, relying on an interactive interplay between bottom-up and top-down processes. Most studies investigating scene analysis have used elementary auditory scenes; however, real-world scene analysis is far more complex. In particular, music, contrary to most other natural auditory scenes, can be perceived by either integrating or, under attentive control, segregating sound streams, often carried by different instruments. One of the prominent bottom-up cues contributing to multi-instrument music perception is the timbre difference between instruments. In this work, we introduce and validate a novel paradigm designed to investigate, within naturalistic musical auditory scenes, attentive modulation as well as its interaction with bottom-up processes. Two psychophysical experiments are described, employing custom-composed two-voice polyphonic music pieces within a framework implementing a behavioral performance metric to validate listener instructions requiring either integration or segregation of scene elements. In Experiment 1, the listeners' locus of attention was switched between individual instruments or the aggregate (i.e., both instruments together), via a task requiring the detection of temporal modulations (i.e., triplets) incorporated within or across instruments. Subjects responded post-stimulus whether triplets were present in the to-be-attended instrument(s). Experiment 2 introduced the bottom-up manipulation by adding a three-level morphing of instrument timbre distance to the attentional framework. The task was designed to be used within neuroimaging paradigms; Experiment 2 was additionally validated behaviorally in the functional Magnetic Resonance Imaging (fMRI) environment. Experiment 1 subjects (N = 29, non-musicians) completed the task at high levels of accuracy, showing no differences between any of the experimental conditions. Nineteen listeners also participated in Experiment 2, showing a main effect of instrument timbre distance, even though within-attention-condition timbre-distance contrasts did not demonstrate a timbre effect. Correlation of overall scores with morph-distance effects, computed by subtracting the largest from the smallest timbre-distance scores, showed an influence of general task difficulty on the timbre-distance effect. Comparison of laboratory and fMRI data showed that scanner noise had no adverse effect on task performance. These experimental paradigms enable the study of both bottom-up and top-down contributions to auditory stream segregation and integration within psychophysical and neuroimaging experiments.
Affiliation(s)
- Niels R. Disbergen
- Department of Cognitive Neuroscience, Maastricht University, Maastricht, Netherlands
- Maastricht Brain Imaging Center (MBIC), Maastricht, Netherlands
- Giancarlo Valente
- Department of Cognitive Neuroscience, Maastricht University, Maastricht, Netherlands
- Maastricht Brain Imaging Center (MBIC), Maastricht, Netherlands
- Elia Formisano
- Department of Cognitive Neuroscience, Maastricht University, Maastricht, Netherlands
- Maastricht Brain Imaging Center (MBIC), Maastricht, Netherlands
- Robert J. Zatorre
- Cognitive Neuroscience Unit, Montreal Neurological Institute, McGill University, Montreal, QC, Canada
- International Laboratory for Brain Music and Sound Research (BRAMS), Montreal, QC, Canada
16
Snyder JS, Elhilali M. Recent advances in exploring the neural underpinnings of auditory scene perception. Ann N Y Acad Sci 2017; 1396:39-55. [PMID: 28199022] [PMCID: PMC5446279] [DOI: 10.1111/nyas.13317]
Abstract
Studies of auditory scene analysis have traditionally relied on paradigms using artificial sounds-and conventional behavioral techniques-to elucidate how we perceptually segregate auditory objects or streams from each other. In the past few decades, however, there has been growing interest in uncovering the neural underpinnings of auditory segregation using human and animal neuroscience techniques, as well as computational modeling. This largely reflects the growth in the fields of cognitive neuroscience and computational neuroscience and has led to new theories of how the auditory system segregates sounds in complex arrays. The current review focuses on neural and computational studies of auditory scene perception published in the last few years. Following the progress that has been made in these studies, we describe (1) theoretical advances in our understanding of the most well-studied aspects of auditory scene perception, namely segregation of sequential patterns of sounds and concurrently presented sounds; (2) the diversification of topics and paradigms that have been investigated; and (3) how new neuroscience techniques (including invasive neurophysiology in awake humans, genotyping, and brain stimulation) have been used in this field.
Affiliation(s)
- Joel S. Snyder
- Department of Psychology, University of Nevada, Las Vegas, Las Vegas, Nevada
- Mounya Elhilali
- Department of Electrical and Computer Engineering, The Johns Hopkins University, Baltimore, Maryland
17
Farris HE, Ryan MJ. Schema vs. primitive perceptual grouping: the relative weighting of sequential vs. spatial cues during an auditory grouping task in frogs. J Comp Physiol A Neuroethol Sens Neural Behav Physiol 2017; 203:175-182. [PMID: 28197725] [PMCID: PMC10084916] [DOI: 10.1007/s00359-017-1149-9]
Abstract
Perceptually, grouping sounds based on their sources is critical for communication. This is especially true in túngara frog breeding aggregations, where multiple males produce overlapping calls that consist of an FM 'whine' followed by harmonic bursts called 'chucks'. Phonotactic females use at least two cues to group whines and chucks: whine-chuck spatial separation and sequence. Spatial separation is a primitive cue, whereas sequence is schema-based, as chuck production is morphologically constrained to follow whines, meaning that males cannot produce the components simultaneously. When one cue is available, females perceptually group whines and chucks using relative comparisons: components with the smallest spatial separation or those closest to the natural sequence are more likely grouped. By simultaneously varying the temporal sequence and spatial separation of a single whine and two chucks, this study measured between-cue perceptual weighting during a specific grouping task. Results show that whine-chuck spatial separation is a stronger grouping cue than temporal sequence, as grouping is more likely for stimuli with smaller spatial separation and non-natural sequence than those with larger spatial separation and natural sequence. Compared to the schema-based whine-chuck sequence, we propose that spatial cues have less variance, potentially explaining their preferred use when grouping during directional behavioral responses.
Affiliation(s)
- Hamilton E Farris
- Neuroscience Center, Louisiana State University Health Sciences Center, New Orleans, LA, 70112, USA; Department of Cell Biology and Anatomy, Louisiana State University Health Sciences Center, New Orleans, LA, 70112, USA; Department of Otorhinolaryngology, Louisiana State University Health Sciences Center, New Orleans, LA, 70112, USA
- Michael J Ryan
- Department of Integrative Biology, University of Texas, 1 University Station C0930, Austin, TX, 78712, USA; Smithsonian Tropical Research Institute, Balboa, Panama
18
Speech-in-noise perception in musicians: A review. Hear Res 2017; 352:49-69. [PMID: 28213134] [DOI: 10.1016/j.heares.2017.02.006]
Abstract
The ability to understand speech in the presence of competing sound sources is an important neuroscience question in terms of how the nervous system solves this computational problem. It is also a critical clinical problem that disproportionately affects the elderly, children with language-related learning disorders, and those with hearing loss. Recent evidence that musicians have an advantage on this multifaceted skill has led to the suggestion that musical training might be used to improve or delay the decline of speech-in-noise (SIN) function. However, enhancements have not been universally reported, nor have the relative contributions of bottom-up versus top-down processes, and their relation to preexisting factors, been disentangled. This information would be helpful for establishing whether there is a real effect of experience, what exactly its nature is, and how future training-based interventions might target the most relevant components of cognitive processing. These questions are complicated by important differences in study design and uneven coverage of neuroimaging modalities. In this review, we aim to systematize recent results from studies that have specifically looked at musician-related differences in SIN according to their study design properties, to summarize the findings, and to identify knowledge gaps for future work.
19
Pelofi C, de Gardelle V, Egré P, Pressnitzer D. Interindividual variability in auditory scene analysis revealed by confidence judgements. Philos Trans R Soc Lond B Biol Sci 2017; 372:20160107. [PMID: 28044018] [DOI: 10.1098/rstb.2016.0107]
Abstract
Because musicians are trained to discern sounds within complex acoustic scenes, such as an orchestra playing, it has been hypothesized that musicianship improves general auditory scene analysis abilities. Here, we compared musicians and non-musicians in a behavioural paradigm using ambiguous stimuli, combining performance, reaction times and confidence measures. We used 'Shepard tones', for which listeners may report either an upward or a downward pitch shift for the same ambiguous tone pair. Musicians and non-musicians performed similarly on the pitch-shift direction task. In particular, both groups were at chance for the ambiguous case. However, groups differed in their reaction times and judgements of confidence. Musicians responded to the ambiguous case with long reaction times and low confidence, whereas non-musicians responded with fast reaction times and maximal confidence. In a subsequent experiment, non-musicians displayed reduced confidence for the ambiguous case when pure-tone components of the Shepard complex were made easier to discern. The results suggest an effect of musical training on scene analysis: we speculate that musicians were more likely to discern components within complex auditory scenes, perhaps because of enhanced attentional resolution, and thus discovered the ambiguity. For untrained listeners, stimulus ambiguity was not available to perceptual awareness. This article is part of the themed issue 'Auditory and visual scene analysis'.
Affiliation(s)
- C Pelofi
- Laboratoire des systèmes perceptifs, CNRS UMR 8248, École normale supérieure - PSL Research University, 75005 Paris, France; Institut d'étude de la cognition, École normale supérieure - PSL Research University, 75005 Paris, France
- V de Gardelle
- Paris School of Economics & CNRS, École normale supérieure - PSL Research University, 75005 Paris, France
- P Egré
- Institut Jean Nicod, CNRS UMR 8129, École normale supérieure - PSL Research University, 75005 Paris, France; Institut d'étude de la cognition, École normale supérieure - PSL Research University, 75005 Paris, France
- D Pressnitzer
- Laboratoire des systèmes perceptifs, CNRS UMR 8248, École normale supérieure - PSL Research University, 75005 Paris, France; Institut d'étude de la cognition, École normale supérieure - PSL Research University, 75005 Paris, France
20
Bouvet L, Mottron L, Valdois S, Donnadieu S. Auditory Stream Segregation in Autism Spectrum Disorder: Benefits and Downsides of Superior Perceptual Processes. J Autism Dev Disord 2016; 46:1553-1561. [PMID: 24281422] [DOI: 10.1007/s10803-013-2003-8]
Abstract
Auditory stream segregation allows us to organize our sound environment by focusing on specific information and ignoring what is unimportant. One previous study reported difficulty in stream segregation ability in children with Asperger syndrome. In order to investigate this question further, we used an interleaved melody recognition task with children with autism spectrum disorder (ASD). In this task, a probe melody is followed by a mixed sequence, made up of a target melody interleaved with a distractor melody. These two melodies have either the same [0 semitone (ST)] or a different mean frequency (6, 12 or 24 ST separation conditions). Children have to identify whether the probe melody is present in the mixed sequence. Children with ASD performed better than typical children when melodies were completely embedded. Conversely, they were impaired in the ST separation conditions. Our results confirm the difficulty of children with ASD in using a frequency cue to organize auditory perceptual information. However, superior performance in the completely embedded condition may result from superior perceptual processes in autism. We propose that this atypical pattern of results might reflect the expression of a single cognitive feature in autism.
Affiliation(s)
- Lucie Bouvet
- Laboratoire de Neurosciences Fonctionnelles et Pathologiques, Département de psychologie, Université Lille 3, BP 60 149, 59653, Villeneuve d'Ascq Cedex, France; Laboratoire de Psychologie et Neurocognition (UMR CNRS 5105), Grenoble, France
- Laurent Mottron
- Clinique spécialisée de l'autisme, Hôpital Rivière-des-Prairies, CETEDUM, Université de Montréal, Montréal, Canada
- Sylviane Valdois
- Laboratoire de Psychologie et Neurocognition (UMR CNRS 5105), Grenoble, France; Centre National de la Recherche Scientifique, Paris, France
- Sophie Donnadieu
- Laboratoire de Psychologie et Neurocognition (UMR CNRS 5105), Grenoble, France; Université de Savoie, Chambéry, France
21
Szabó BT, Denham SL, Winkler I. Computational Models of Auditory Scene Analysis: A Review. Front Neurosci 2016; 10:524. [PMID: 27895552] [PMCID: PMC5108797] [DOI: 10.3389/fnins.2016.00524]
Abstract
Auditory scene analysis (ASA) refers to the process(es) of parsing the complex acoustic input into auditory perceptual objects representing either physical sources or temporal sound patterns, such as melodies, which contributed to the sound waves reaching the ears. A number of new computational models accounting for some of the perceptual phenomena of ASA have been published recently. Here we provide a theoretically motivated review of these computational models, aiming to relate their guiding principles to the central issues of the theoretical framework of ASA. Specifically, we ask how they achieve the grouping and separation of sound elements and whether they implement some form of competition between alternative interpretations of the sound input. We consider the extent to which they include predictive processes, as important current theories suggest that perception is inherently predictive, and also how they have been evaluated. We conclude that current computational models of ASA are fragmentary in the sense that rather than providing general competing interpretations of ASA, they focus on assessing the utility of specific processes (or algorithms) for finding the causes of the complex acoustic signal. This leaves open the possibility of integrating complementary aspects of the models into a more comprehensive theory of ASA.
Affiliation(s)
- Beáta T Szabó
- Faculty of Information Technology and Bionics, Pázmány Péter Catholic University, Budapest, Hungary; Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary
- Susan L Denham
- School of Psychology, University of Plymouth, Plymouth, UK
- István Winkler
- Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary
22
Brosowsky NP, Mondor TA. Multistable perception of ambiguous melodies and the role of musical expertise. J Acoust Soc Am 2016; 140:866. [PMID: 27586718] [DOI: 10.1121/1.4960450]
Abstract
Whereas visual demonstrations of multistability are ubiquitous, there are few auditory examples. The purpose of the current study was to determine whether simultaneously presented melodies, such as those underlying the scale illusion [Deutsch (1975). J. Acoust. Soc. Am. 57(5), 1156-1160], can elicit multiple mutually exclusive percepts, and whether reported perceptions are mediated by musical expertise. Participants listened to target melodies and reported whether the target was embedded in subsequent test melodies. Target sequences were created such that they would only be heard if the listener interpreted the test melody according to various perceptual cues. Critically, and in contrast with previous examinations of the scale illusion, an objective measure of target detection was obtained by including target-absent test melodies. Listeners could reliably identify target sequences from different perceptual organizations when presented with the same test melody on different trials, demonstrating an ability to alternate between mutually exclusive percepts of an unchanged stimulus. However, only perceptual organizations consistent with frequency and spatial cues were available, and musical expertise mediated target detection, limiting the organizations available to non-musicians. The current study provides the first known demonstration of auditory multistability using simultaneously presented melodies and provides a unique experimental method for measuring auditory perceptual competition.
Affiliation(s)
- Nicholaus P Brosowsky
- Department of Psychology, The Graduate Center of the City University of New York, 365 5th Avenue, New York, New York 10016, USA
- Todd A Mondor
- University of Manitoba, Winnipeg, Manitoba, R3T 2N2, Canada
23
Zimmermann JF, Moscovitch M, Alain C. Attending to auditory memory. Brain Res 2015; 1640:208-221. [PMID: 26638836] [DOI: 10.1016/j.brainres.2015.11.032]
Abstract
Attention to memory describes the process of attending to memory traces when the object is no longer present. It has been studied primarily for representations of visual stimuli with only few studies examining attention to sound object representations in short-term memory. Here, we review the interplay of attention and auditory memory with an emphasis on 1) attending to auditory memory in the absence of related external stimuli (i.e., reflective attention) and 2) effects of existing memory on guiding attention. Attention to auditory memory is discussed in the context of change deafness, and we argue that failures to detect changes in our auditory environments are most likely the result of a faulty comparison system of incoming and stored information. Also, objects are the primary building blocks of auditory attention, but attention can also be directed to individual features (e.g., pitch). We review short-term and long-term memory guided modulation of attention based on characteristic features, location, and/or semantic properties of auditory objects, and propose that auditory attention to memory pathways emerge after sensory memory. A neural model for auditory attention to memory is developed, which comprises two separate pathways in the parietal cortex, one involved in attention to higher-order features and the other involved in attention to sensory information. This article is part of a Special Issue entitled SI: Auditory working memory.
Collapse
Affiliation(s)
- Jacqueline F Zimmermann
- University of Toronto, Department of Psychology, Sidney Smith Hall, 100 St. George Street, Toronto, Ontario, Canada M5S 3G3; Rotman Research Institute, Baycrest Hospital, 3560 Bathurst Street, Toronto, Ontario, Canada M6A 2E1.
| | - Morris Moscovitch
- University of Toronto, Department of Psychology, Sidney Smith Hall, 100 St. George Street, Toronto, Ontario, Canada M5S 3G3; Rotman Research Institute, Baycrest Hospital, 3560 Bathurst Street, Toronto, Ontario, Canada M6A 2E1
| | - Claude Alain
- University of Toronto, Department of Psychology, Sidney Smith Hall, 100 St. George Street, Toronto, Ontario, Canada M5S 3G3; Rotman Research Institute, Baycrest Hospital, 3560 Bathurst Street, Toronto, Ontario, Canada M6A 2E1; Institute of Medical Sciences, University of Toronto, Toronto, Ontario, Canada
| |
Collapse
|
24
|
Masutomi K, Barascud N, Kashino M, McDermott JH, Chait M. Sound segregation via embedded repetition is robust to inattention. J Exp Psychol Hum Percept Perform 2015; 42:386-400. [PMID: 26480248 PMCID: PMC4763252 DOI: 10.1037/xhp0000147] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
The segregation of sound sources from the mixture of sounds that enters the ear is a core capacity of human hearing, but the extent to which this process depends on attention remains unclear. This study investigated the effect of attention on the ability to segregate sounds via repetition. We utilized a dual-task design in which stimuli to be segregated were presented along with stimuli for a "decoy" task that required continuous monitoring. The task used to assess segregation presented a target sound 10 times in a row, each time concurrent with a different distractor sound. McDermott, Wrobleski, and Oxenham (2011) demonstrated that such repetition causes the target sound to be segregated from the distractors. Segregation was queried by asking listeners whether a subsequent probe sound was identical to the target. A control task presented similar stimuli but probed discrimination without engaging segregation processes. We present results from three different decoy tasks: a visual multiple-object-tracking task, a rapid serial visual presentation (RSVP) digit-encoding task, and a demanding auditory monitoring task. Load was manipulated by using high- and low-demand versions of each decoy task. The data provide converging evidence of a small, nonspecific effect of attention, in that it affected the segregation and control tasks to a similar extent. In all cases, segregation performance remained high despite the presence of a concurrent, objectively demanding decoy task. The results suggest that repetition-based segregation is robust to inattention.
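To make the repetition manipulation concrete, here is a minimal sketch, under assumed synthesis parameters, of a sequence in which one fixed target waveform recurs ten times, each time summed with a freshly drawn distractor; it illustrates the paradigm, not the authors' actual stimulus code.

```python
# Sketch of a repetition-based segregation stimulus: a fixed target
# sound presented 10 times, each time mixed with a new random distractor.
# Duration, sample rate, and the noise-band synthesis are assumptions.
import numpy as np

SR = 16000          # sample rate (Hz)
DUR = 0.3           # duration of each sound (s)
N = int(SR * DUR)

rng = np.random.default_rng(0)

def random_sound():
    """A crude 'novel sound': noise shaped by a random spectral envelope."""
    spectrum = np.fft.rfft(rng.standard_normal(N))
    envelope = np.abs(rng.standard_normal(spectrum.size))
    return np.fft.irfft(spectrum * envelope, n=N)

target = random_sound()
mixtures = [target + random_sound() for _ in range(10)]  # fresh distractor each time
sequence = np.concatenate(mixtures)
sequence /= np.abs(sequence).max()   # normalize for playback
```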
Collapse
Affiliation(s)
- Keiko Masutomi
- Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology
| | | | - Makio Kashino
- Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology
| | - Josh H McDermott
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology
| | | |
Collapse
|
25
|
Liu AS, Tsunada J, Gold JI, Cohen YE. Temporal Integration of Auditory Information Is Invariant to Temporal Grouping Cues. eNeuro 2015; 2:ENEURO.0077-14.2015. [PMID: 26464975 PMCID: PMC4596088 DOI: 10.1523/eneuro.0077-14.2015] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/28/2014] [Revised: 03/01/2015] [Accepted: 03/30/2015] [Indexed: 11/29/2022] Open
Abstract
Auditory perception depends on the temporal structure of incoming acoustic stimuli. Here, we examined whether a temporal manipulation that affects perceptual grouping also affects the time dependence of decisions regarding those stimuli. We designed a novel discrimination task that required human listeners to decide whether a sequence of tone bursts was increasing or decreasing in frequency. We manipulated temporal perceptual-grouping cues by changing the time interval between the tone bursts, which led listeners to hear the sequences as a single sound at short intervals or as discrete sounds at longer intervals. Despite these strong perceptual differences, the manipulation did not affect the efficiency with which auditory information was integrated over time to form a decision. Instead, the grouping manipulation affected subjects' speed-accuracy trade-offs. These results indicate that the temporal dynamics of evidence accumulation for auditory perceptual decisions can be invariant to manipulations that affect the perceptual grouping of the evidence.
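The following sketch illustrates the kind of stimulus described: tone bursts stepping in frequency, with the inter-burst gap controlling whether the sequence tends to be heard as one sound or several. All numeric values are assumptions.

```python
# Sketch: a sequence of tone bursts stepping up (or down) in frequency,
# with the inter-burst interval controlling perceptual grouping.
# Burst duration, step size, and gap values are illustrative assumptions.
import numpy as np

SR = 16000

def tone_burst_sequence(start_hz=500.0, step=1.06, n_bursts=8,
                        burst_s=0.05, gap_s=0.02, ascending=True):
    out = []
    f = start_hz
    for _ in range(n_bursts):
        t = np.arange(int(SR * burst_s)) / SR
        out.append(np.sin(2 * np.pi * f * t))
        out.append(np.zeros(int(SR * gap_s)))   # silent gap between bursts
        f = f * step if ascending else f / step
    return np.concatenate(out)

grouped = tone_burst_sequence(gap_s=0.02)   # short gaps: heard as one sound
discrete = tone_burst_sequence(gap_s=0.30)  # long gaps: heard as separate tones
```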
Collapse
Affiliation(s)
| | - Joji Tsunada
- Department of Otorhinolaryngology, Perelman School of Medicine
| | - Joshua I. Gold
- Department of Neuroscience, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104
| | - Yale E. Cohen
- Department of Otorhinolaryngology, Perelman School of Medicine
- Department of Neuroscience, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104
| |
Collapse
|
26
|
Bendixen A. Predictability effects in auditory scene analysis: a review. Front Neurosci 2014; 8:60. [PMID: 24744695 PMCID: PMC3978260 DOI: 10.3389/fnins.2014.00060] [Citation(s) in RCA: 73] [Impact Index Per Article: 7.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/15/2014] [Accepted: 03/14/2014] [Indexed: 12/02/2022] Open
Abstract
Many sound sources emit signals in a predictable manner. The idea that predictability can be exploited to support the segregation of one source's signal emissions from the overlapping signals of other sources has been expressed for a long time. Yet experimental evidence for a strong role of predictability within auditory scene analysis (ASA) has been scarce. Recently, there has been an upsurge in experimental and theoretical work on this topic resulting from fundamental changes in our perspective on how the brain extracts predictability from series of sensory events. Based on effortless predictive processing in the auditory system, it becomes more plausible that predictability would be available as a cue for sound source decomposition. In the present contribution, empirical evidence for such a role of predictability in ASA will be reviewed. It will be shown that predictability affects ASA both when it is present in the sound source of interest (perceptual foreground) and when it is present in other sound sources that the listener wishes to ignore (perceptual background). First evidence pointing toward age-related impairments in the latter capacity will be addressed. Moreover, it will be illustrated how effects of predictability can be shown by means of objective listening tests as well as by subjective report procedures, with the latter approach typically exploiting the multi-stable nature of auditory perception. Critical aspects of study design will be delineated to ensure that predictability effects can be unambiguously interpreted. Possible mechanisms for a functional role of predictability within ASA will be discussed, and an analogy with the old-plus-new heuristic for grouping simultaneous acoustic signals will be suggested.
Collapse
Affiliation(s)
- Alexandra Bendixen
- Auditory Psychophysiology Lab, Department of Psychology, Cluster of Excellence "Hearing4all," European Medical School, Carl von Ossietzky University of Oldenburg, Oldenburg, Germany
| |
Collapse
|
27
|
Szalárdy O, Bendixen A, Böhm TM, Davies LA, Denham SL, Winkler I. The effects of rhythm and melody on auditory stream segregation. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2014; 135:1392-1405. [PMID: 24606277 DOI: 10.1121/1.4865196] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/03/2023]
Abstract
While many studies have assessed the efficacy of similarity-based cues for auditory stream segregation, much less is known about whether and how the larger-scale structure of sound sequences supports stream formation and the choice of sound organization. Two experiments investigated the effects of musical melody and rhythm on the segregation of two interleaved tone sequences. The two sets of tones fully overlapped in pitch range but differed from each other in interaural time and intensity. Unbeknownst to the listener, each of the interleaved sequences was created from the notes of a different song. In different experimental conditions, the notes and/or their timing could either follow those of the songs or be scrambled or, in the case of timing, set to be isochronous. Listeners were asked to continuously report whether they heard a single coherent sequence (integrated) or two concurrent streams (segregated). Although temporal overlap between tones from the two streams proved to be the strongest cue for stream segregation, significant effects of tonality and familiarity with the songs were also observed. These results suggest that regular temporal patterns are utilized as cues in auditory stream segregation and that long-term memory is involved in this process.
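As a rough illustration of these conditions (with a made-up note list, not a stimulus from the study), the sketch below shows how a song's notes can keep or lose their original order and timing.

```python
# Sketch of the melody/rhythm manipulations: a song's notes can keep or
# lose their original order and timing. The note list is a hypothetical
# stand-in, not a stimulus from the study.
import random

notes = [60, 62, 64, 65, 67, 65, 64, 62]                     # MIDI pitches (made up)
durations = [0.25, 0.25, 0.5, 0.25, 0.25, 0.5, 0.25, 0.75]   # seconds (made up)

def make_condition(notes, durations, scramble_notes=False, isochronous=False):
    n = list(notes)
    d = list(durations)
    if scramble_notes:
        random.shuffle(n)                                    # destroys the melody
    if isochronous:
        d = [sum(durations) / len(durations)] * len(durations)  # flattens the rhythm
    return list(zip(n, d))

original = make_condition(notes, durations)
scrambled_isochronous = make_condition(notes, durations,
                                       scramble_notes=True, isochronous=True)
```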
Collapse
Affiliation(s)
- Orsolya Szalárdy
- Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, P.O. Box 286, H-1519 Budapest, Hungary
| | - Alexandra Bendixen
- Auditory Psychophysiology Lab, Department of Psychology, Cluster of Excellence "Hearing4all," European Medical School, Carl von Ossietzky University of Oldenburg, Ammerländer Heerstrasse 114-118, D-26129 Oldenburg, Germany
| | - Tamás M Böhm
- Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, P.O. Box 286, H-1519 Budapest, Hungary
| | - Lucy A Davies
- Cognition Institute and School of Psychology, University of Plymouth, Drake Circus, Plymouth PL4 8AA, United Kingdom
| | - Susan L Denham
- Cognition Institute and School of Psychology, University of Plymouth, Drake Circus, Plymouth PL4 8AA, United Kingdom
| | - István Winkler
- Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, P.O. Box 286, H-1519 Budapest, Hungary
| |
Collapse
|
28
|
[Influence of multimodal interaction on auditory object formation]. HNO 2012; 61:202-10. [PMID: 23241857 DOI: 10.1007/s00106-012-2524-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/27/2022]
Abstract
BACKGROUND The task of assigning concurrent sounds to different auditory objects is known to depend on temporal and spectral cues. When tones of high and low frequency are presented in alternation, they can be perceived as a single (integrated) melody or as two parallel (segregated) melodic lines, depending on the presentation rate and the frequency distance between the sounds. At intermediate distances or stimulation rates, the percept is ambiguous and alternates between segregated and integrated. This work studied whether an ambiguous sound organization could be shifted towards a robust integrated or segregated percept by synchronously presented visual cues. METHODS Two interleaved sets of sounds, one high-frequency and one low-frequency, were presented with concurrent visual stimuli synchronized either to a within-set frequency pattern or to the across-set intensity pattern. Elicitation of the mismatch negativity (MMN) component of event-related brain potentials served as an index of the segregated organization; no task was performed with the sounds. RESULTS MMN was elicited only when the visual pattern promoted segregation of the sounds. Spatial analysis of the distribution of electromagnetic potentials identified four separate neuronal sources underlying the MMN response: one pair located bilaterally in temporal cortical structures and another pair in occipital areas, representing the auditory and visual origins of the MMN response evoked by the inverted triplets used in this study. The results thus demonstrate cross-modal effects of visual information on auditory object perception.
Collapse
|
29
|
Moore BCJ, Gockel HE. Properties of auditory stream formation. Philos Trans R Soc Lond B Biol Sci 2012;367:919-31.
Abstract
A sequence of sounds may be heard as coming from a single source (called fusion or coherence) or from two or more sources (called fission or stream segregation). Each perceived source is called a 'stream'. When the differences between successive sounds are very large, fission nearly always occurs, whereas when the differences are very small, fusion nearly always occurs. When the differences are intermediate in size, the percept often 'flips' between one stream and multiple streams, a property called 'bistability'. The flips do not generally occur regularly in time. The tendency to hear two streams builds up over time, but can be partially or completely reset by a sudden change in the properties of the sequence or by switches in attention. Stream formation depends partly on the extent to which successive sounds excite different 'channels' in the peripheral auditory system. However, other factors can play a strong role; multiple streams may be heard when successive sounds are presented to the same ear and have essentially identical excitation patterns in the cochlea. Differences between successive sounds in temporal envelope, fundamental frequency, phase spectrum and lateralization can all induce a percept of multiple streams. Regularities in the temporal pattern of elements within a stream can help in stabilizing that stream.
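A minimal sketch of the classic ABA_ sequence often used to study these properties, assuming an arbitrary base frequency, rate, and tone duration; larger A-B separations favor hearing two streams.

```python
# Sketch of the classic ABA_ streaming stimulus: increasing the A-B
# frequency separation promotes hearing two streams. All values are
# illustrative assumptions.
import numpy as np

SR = 16000

def tone(freq_hz, dur_s=0.1):
    t = np.arange(int(SR * dur_s)) / SR
    return np.sin(2 * np.pi * freq_hz * t)

def aba_sequence(f_a=500.0, semitone_sep=6, n_triplets=10, dur_s=0.1):
    f_b = f_a * 2 ** (semitone_sep / 12)     # B tone, semitone_sep above A
    silence = np.zeros(int(SR * dur_s))
    triplet = np.concatenate([tone(f_a, dur_s), tone(f_b, dur_s),
                              tone(f_a, dur_s), silence])  # A B A _
    return np.tile(triplet, n_triplets)

ambiguous = aba_sequence(semitone_sep=6)    # tends to flip between percepts
segregated = aba_sequence(semitone_sep=12)  # large separation: two streams
```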
Collapse
Affiliation(s)
- Brian C J Moore
- Department of Experimental Psychology, University of Cambridge, Downing Street, Cambridge CB2 3EB, UK.
| | | |
Collapse
|
30
|
Snyder JS, Gregg MK, Weintraub DM, Alain C. Attention, awareness, and the perception of auditory scenes. Front Psychol 2012; 3:15. [PMID: 22347201 PMCID: PMC3273855 DOI: 10.3389/fpsyg.2012.00015] [Citation(s) in RCA: 78] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/14/2011] [Accepted: 01/11/2012] [Indexed: 11/25/2022] Open
Abstract
Auditory perception and cognition entail both low-level and high-level processes, which likely interact with each other to create our rich conscious experience of soundscapes. Recent research that we review has revealed numerous influences of high-level factors, such as attention, intention, and prior experience, on conscious auditory perception. Recent studies have also shown that auditory scene analysis tasks can exhibit multistability in a manner very similar to ambiguous visual stimuli, presenting a unique opportunity to study the neural correlates of auditory awareness and the extent to which mechanisms of perception are shared across sensory modalities. Research has also led to a growing number of techniques through which auditory perception can be manipulated and even completely suppressed. Such findings have important consequences for our understanding of the mechanisms of perception and should allow scientists to precisely distinguish among different higher-level influences.
Collapse
Affiliation(s)
- Joel S. Snyder
- Department of Psychology, University of Nevada Las Vegas, Las Vegas, NV, USA
| | - Melissa K. Gregg
- Department of Psychology, University of Nevada Las Vegas, Las Vegas, NV, USA
| | - David M. Weintraub
- Department of Psychology, University of Nevada Las Vegas, Las Vegas, NV, USA
| | - Claude Alain
- The Rotman Research Institute, Baycrest Centre for Geriatric Care, Toronto, ON, Canada
| |
Collapse
|
31
|
Pressnitzer D, Suied C, Shamma SA. Auditory scene analysis: the sweet music of ambiguity. Front Hum Neurosci 2011; 5:158. [PMID: 22174701 PMCID: PMC3237025 DOI: 10.3389/fnhum.2011.00158] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/14/2011] [Accepted: 11/16/2011] [Indexed: 12/02/2022] Open
Abstract
In this review paper aimed at the non-specialist, we explore the use that neuroscientists and musicians have made of perceptual illusions based on ambiguity. The pivotal issue is auditory scene analysis (ASA), or what enables us to make sense of complex acoustic mixtures in order to follow, for instance, a single melody in the midst of an orchestra. In general, ASA uncovers the most likely physical causes that account for the waveform collected at the ears. However, the acoustical problem is ill-posed and must be solved from noisy sensory input. Recently, the neural mechanisms implicated in the transformation of ambiguous sensory information into coherent auditory scenes have been investigated using so-called bistability illusions (where an unchanging ambiguous stimulus evokes a succession of distinct percepts in the mind of the listener). After reviewing some of those studies, we turn to music, which arguably provides some of the most complex acoustic scenes that a human listener will ever encounter. Interestingly, musicians will not always aim at making each physical source intelligible, but will rather express one or more melodic lines with a small or large number of instruments. By means of a few musical illustrations and a computational model inspired by neurophysiological principles, we suggest that this relies on a detailed (if perhaps implicit) knowledge of the rules of ASA and of its inherent ambiguity. We then put forward the opinion that some degree of perceptual ambiguity may participate in our appreciation of music.
Collapse
Affiliation(s)
- Daniel Pressnitzer
- Centre National de la Recherche Scientifique and Université Paris Descartes, UMR 8158, Paris, France
| | | | | |
Collapse
|
32
|
Lotto A, Holt L. Psychology of auditory perception. WILEY INTERDISCIPLINARY REVIEWS. COGNITIVE SCIENCE 2011; 2:479-489. [PMID: 26302301 DOI: 10.1002/wcs.123] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]
Abstract
Audition is often treated as a 'secondary' sensory system behind vision in the study of cognitive science. In this review, we focus on three seemingly simple perceptual tasks to demonstrate the complexity of the perceptual-cognitive processing involved in everyday audition. After providing a short overview of the characteristics of sound and their neural encoding, we describe the perceptual task of segregating multiple sound events that are mixed together in the signal reaching the ears. We then discuss the ability to localize the sound source in the environment. Finally, we provide some data and theory on how listeners categorize complex sounds, such as speech. In particular, we present research on how listeners weigh multiple acoustic cues in making a categorization decision. One conclusion of this review is that it is time for auditory cognitive science to be developed to match what has been done in vision, so that we can better understand how humans communicate with speech and music.
Collapse
Affiliation(s)
- Andrew Lotto
- Department of Speech, Language, and Hearing Sciences, University of Arizona, Tucson, AZ, USA
| | - Lori Holt
- Department of Psychology, Carnegie Mellon University, Pittsburgh, PA, USA
| |
Collapse
|
33
|
Andreou LV, Kashino M, Chait M. The role of temporal regularity in auditory segregation. Hear Res 2011; 280:228-35. [PMID: 21683778 DOI: 10.1016/j.heares.2011.06.001] [Citation(s) in RCA: 72] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 01/11/2011] [Revised: 05/25/2011] [Accepted: 06/01/2011] [Indexed: 11/17/2022]
Abstract
The idea that predictive modelling and the extraction of regularities play a pivotal role in auditory segregation has recently attracted considerable attention. The present study investigated the effect of one basic form of regularity, rhythmic regularity, on auditory stream segregation. We departed from the classic streaming paradigm and developed a new stimulus, Rand-AB, consisting of two concurrently presented, temporally uncorrelated tone sequences (with frequencies A and B). To evaluate segregation, we used an objective measure of the extent to which listeners are able to selectively attend to one of the sequences in the presence of the other. Performance was quantified on a difficult pattern-detection task which involved detecting a rarely occurring pattern of amplitude modulation applied to three consecutive A or B tones. In all cases the attended sequence was temporally irregular, with a random inter-tone interval (ITI) between 100 and 400 ms, and the regularity status of the competing sequence was set to one of four conditions: (1) random ITI between 100 and 400 ms; (2) isochronous with ITI = 400 ms; (3) isochronous with ITI = 250 ms (equal to the mean rate of the attended sequence); (4) isochronous with ITI = 100 ms. For a frequency separation of 2 (but not 4) semitones we observed improved performance in conditions (3) and (4) relative to (1), suggesting that stream segregation is facilitated when the distractor sequence is temporally regular, but that the effect of temporal regularity as a cue for segregation is limited to relatively fast rates and to situations where frequency separation is insufficient for segregation. These findings provide new evidence to support models of streaming that involve segregation based on the formation of predictive models.
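The four timing conditions can be summarized in a small sketch that generates tone-onset times from random or isochronous ITIs; the sequence length and random seed are assumptions.

```python
# Sketch of the four distractor-timing conditions: onset times are built
# from inter-tone intervals (ITIs) that are either random or isochronous.
# Sequence length and seed are arbitrary assumptions.
import numpy as np

rng = np.random.default_rng(1)

def onset_times(n_tones, iti):
    """iti: a fixed value in seconds, or ('random', lo, hi)."""
    if isinstance(iti, tuple):
        _, lo, hi = iti
        itis = rng.uniform(lo, hi, n_tones)
    else:
        itis = np.full(n_tones, iti)
    return np.cumsum(itis)

attended = onset_times(40, ('random', 0.1, 0.4))   # attended stream: always irregular
conditions = {
    'random':    onset_times(40, ('random', 0.1, 0.4)),
    'iso_400ms': onset_times(40, 0.400),
    'iso_250ms': onset_times(40, 0.250),           # matches attended mean rate
    'iso_100ms': onset_times(40, 0.100),
}
```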
Collapse
|
34
|
Russell AM, Boakes RA. Identification of confusable odours including wines: Appropriate labels enhance performance. Food Qual Prefer 2011. [DOI: 10.1016/j.foodqual.2010.11.007] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
|
35
|
Ma L, Micheyl C, Yin P, Oxenham AJ, Shamma SA. Behavioral measures of auditory streaming in ferrets (Mustela putorius). ACTA ACUST UNITED AC 2011; 124:317-30. [PMID: 20695663 DOI: 10.1037/a0018273] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
An important aspect of the analysis of auditory "scenes" relates to the perceptual organization of sound sequences into auditory "streams." In this study, we adapted two auditory perception tasks, used in recent human psychophysical studies, to obtain behavioral measures of auditory streaming in ferrets (Mustela putorius). One task involved the detection of shifts in the frequency of tones within an alternating tone sequence. The other task involved the detection of a stream of regularly repeating target tones embedded within a randomly varying multitone background. In both tasks, performance was measured as a function of various stimulus parameters, which previous psychophysical studies in humans have shown to influence auditory streaming. Ferret performance in the two tasks was found to vary as a function of these parameters in a way that is qualitatively consistent with the human data. These results suggest that auditory streaming occurs in ferrets, and that the two tasks described here may provide a valuable tool in future behavioral and neurophysiological studies of the phenomenon.
Collapse
Affiliation(s)
- Ling Ma
- Neural Systems Laboratory, Department of Bioengineering, University of Maryland, College Park, MD 20742, USA.
| | | | | | | | | |
Collapse
|
36
|
Micheyl C, Oxenham AJ. Objective and subjective psychophysical measures of auditory stream integration and segregation. J Assoc Res Otolaryngol 2010; 11:709-24. [PMID: 20658165 DOI: 10.1007/s10162-010-0227-2] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/20/2009] [Accepted: 06/30/2010] [Indexed: 10/19/2022] Open
Abstract
The perceptual organization of sound sequences into auditory streams involves the integration of sounds into one stream and the segregation of sounds into separate streams. "Objective" psychophysical measures of auditory streaming can be obtained using behavioral tasks where performance is facilitated by segregation and hampered by integration, or vice versa. Traditionally, these two types of tasks have been tested in separate studies involving different listeners, procedures, and stimuli. Here, we tested subjects in two complementary temporal-gap discrimination tasks involving similar stimuli and procedures. One task was designed so that performance in it would be facilitated by perceptual integration; the other, so that performance would be facilitated by perceptual segregation. Thresholds were measured in both tasks under a wide range of conditions produced by varying three stimulus parameters known to influence stream formation: frequency separation, tone-presentation rate, and sequence length. In addition to these performance-based measures, subjective judgments of perceived segregation were collected in the same listeners under corresponding stimulus conditions. The patterns of results obtained in the two temporal-discrimination tasks, and the relationships between thresholds and perceived-segregation judgments, were mostly consistent with the hypothesis that stream segregation helped performance in one task and impaired performance in the other task. The tasks and stimuli described here may prove useful in future behavioral or neurophysiological experiments, which seek to manipulate and measure neural correlates of auditory streaming while minimizing differences between the physical stimuli.
Collapse
|
37
|
Devergie A, Grimault N, Tillmann B, Berthommier F. Effect of rhythmic attention on the segregation of interleaved melodies. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2010; 128:EL1-EL7. [PMID: 20649182 DOI: 10.1121/1.3436498] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/29/2023]
Abstract
As previously suggested, attention may increase segregation via sensory enhancement and suppression mechanisms. To test this hypothesis, we used an interleaved-melody paradigm with two rhythm conditions applied to familiar target melodies and unfamiliar distractor melodies sharing pitch and timbre properties. When the rhythms of both the target and the distractor were irregular, target melodies were identified above chance level. A sensory enhancement mechanism guided by listeners' knowledge may have helped to extract targets from the interleaved sequence. When the distractor was rhythmically regular, performance increased, suggesting that the distractor may have been suppressed by a sensory suppression mechanism.
Collapse
Affiliation(s)
- Aymeric Devergie
- Laboratoire de Neurosciences Sensorielles, Comportement et Cognition, CNRS UMR 5020, Université Lyon 1, 69366 Lyon Cedex 07, France.
| | | | | | | |
Collapse
|
38
|
Cooper HR, Roberts B. Auditory stream segregation in cochlear implant listeners: measures based on temporal discrimination and interleaved melody recognition. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2009; 126:1975-1987. [PMID: 19813809 DOI: 10.1121/1.3203210] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/28/2023]
Abstract
The evidence that cochlear implant listeners routinely experience stream segregation is limited and equivocal. Streaming in these listeners was explored using tone sequences matched to the center frequencies of the implant's 22 electrodes. Experiment 1 measured temporal discrimination for short (ABA triplet) and longer (12 AB cycles) sequences (tone/silence durations = 60/40 ms). Tone A stimulated electrode 11; tone B stimulated one of 14 electrodes. On each trial, one sequence remained isochronous, and tone B was delayed in the other; listeners had to identify the anisochronous interval. The delay was introduced in the second half of the longer sequences. Prior build-up of streaming should cause thresholds to rise more steeply with increasing electrode separation, but no interaction with sequence length was found. Experiment 2 required listeners to identify which of two target sequences was present when interleaved with distractors (tone/silence durations = 120/80 ms). Accuracy was high for isolated targets, but most listeners performed near chance when loudness-matched distractors were added, even when remote from the target. Only a substantial reduction in distractor level improved performance, and this effect did not interact with target-distractor separation. These results indicate that implantees often do not achieve stream segregation, even in relatively unchallenging tasks.
Collapse
Affiliation(s)
- Huw R Cooper
- Psychology, School of Life and Health Sciences, Aston University, Birmingham B4 7ET, United Kingdom
| | | |
Collapse
|
39
|
Zendel BR, Alain C. Concurrent sound segregation is enhanced in musicians. J Cogn Neurosci 2009;21:1488-98.
Abstract
The ability to segregate simultaneously occurring sounds is fundamental to auditory perception. Many studies have shown that musicians have enhanced auditory perceptual abilities; however, the impact of musical expertise on segregating concurrently occurring sounds is unknown. Therefore, we examined whether long-term musical training can improve listeners' ability to segregate sounds that occur simultaneously. Participants were presented with complex sounds that had either all harmonics in tune or the second harmonic mistuned by 1%, 2%, 4%, 8%, or 16% of its original value. The likelihood of hearing two sounds simultaneously increased with mistuning, and this effect was greater in musicians than in nonmusicians. The segregation of the mistuned harmonic from the harmonic series was paralleled by an object-related negativity that was larger and peaked earlier in musicians. It also coincided with a late positive wave, referred to as the P400, whose amplitude was larger in musicians than in nonmusicians. The behavioral and electrophysiological effects of musical expertise were specific to processing the mistuned harmonic, as the N1, N1c, and P2 waves elicited by the tuned stimuli were comparable in musicians and nonmusicians. These results demonstrate that listeners' ability to segregate concurrent sounds based on harmonicity is modulated by experience, providing a basis for further studies assessing the potential rehabilitative effects of musical training on solving the complex scene-analysis problems illustrated by the cocktail party example.
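For concreteness, here is a minimal sketch of a mistuned-harmonic stimulus of the kind described, assuming an arbitrary fundamental, number of harmonics, and duration.

```python
# Sketch: a harmonic complex whose second harmonic is mistuned by a given
# percentage, as in the paradigm described above. F0, harmonic count,
# and duration are illustrative assumptions.
import numpy as np

SR = 16000

def mistuned_complex(f0=220.0, n_harmonics=10, mistune_pct=8.0, dur_s=0.5):
    t = np.arange(int(SR * dur_s)) / SR
    wave = np.zeros_like(t)
    for h in range(1, n_harmonics + 1):
        f = h * f0
        if h == 2:
            f *= 1 + mistune_pct / 100.0   # shift only the 2nd harmonic
        wave += np.sin(2 * np.pi * f * t)
    return wave / n_harmonics              # keep amplitude in range

in_tune = mistuned_complex(mistune_pct=0.0)
mistuned = mistuned_complex(mistune_pct=16.0)  # easiest to hear as two sounds
```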
Collapse
|
40
|
Bee MA, Micheyl C. The cocktail party problem: what is it? How can it be solved? And why should animal behaviorists study it? J Comp Psychol 2008; 122:235-51. [PMID: 18729652 PMCID: PMC2692487 DOI: 10.1037/0735-7036.122.3.235] [Citation(s) in RCA: 195] [Impact Index Per Article: 12.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
Animals often use acoustic signals to communicate in groups or social aggregations in which multiple individuals signal within a receiver's hearing range. Consequently, receivers face challenges related to acoustic interference and auditory masking that are not unlike the human cocktail party problem, which refers to the problem of perceiving speech in noisy social settings. Understanding the sensory solutions to the cocktail party problem has been a goal of research on human hearing and speech communication for several decades. Despite a general interest in acoustic signaling in groups, animal behaviorists have devoted comparatively less attention toward understanding how animals solve problems equivalent to the human cocktail party problem. After illustrating how humans and nonhuman animals experience and overcome similar perceptual challenges in cocktail-party-like social environments, this article reviews previous psychophysical and physiological studies of humans and nonhuman animals to describe how the cocktail party problem can be solved. This review also outlines several basic and applied benefits that could result from studies of the cocktail party problem in the context of animal acoustic communication.
Collapse
Affiliation(s)
- Mark A Bee
- Department of Ecology, Evolution, and Behavior, University of Minnesota, St. Paul, MN 55108, USA.
| | | |
Collapse
|
41
|
Searchfield GD, Morrison-Low J, Wise K. Object identification and attention training for treating tinnitus. PROGRESS IN BRAIN RESEARCH 2007; 166:441-60. [PMID: 17956809 DOI: 10.1016/s0079-6123(07)66043-9] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]
Abstract
We hypothesize that abnormal attention and auditory scene analysis contribute to the severity of tinnitus and that the incongruence between tinnitus and normal auditory perception is responsible for its resistance to traditional sound-based habituation therapies. New methods of treatment using auditory and visual attention training are proposed as a means to augment counseling and sound therapies for tinnitus management. Attention training has been demonstrated to improve an individual's ability to attend to relevant sounds while ignoring distracters. The main aim of the current study was to determine the effectiveness of structured Auditory Object Identification and Localization (AOIL) tasks in training persons to ignore their tinnitus. The study looked at the effects of a 15-day (30 min/day) take-home auditory training program on individuals with severe tinnitus. Pitch-matched tinnitus loudness levels (TLLs), tinnitus minimum masking levels (MMLs), and measures of attention were compared before and after the auditory training. The results of this study suggest that short-duration auditory training which actively engages attention and object identification, and which requires a response from participants, reduces tinnitus. There was a greater effect on pitch-matched tinnitus MMLs than on actual TLLs. The reason(s) for this are unclear, although a correlation found between changes in MMLs and improvements in the ability to shift attention may be one underlying reason. Although this study followed a small number of participants over a limited time-span, it is believed that the training and accompanying model are a promising approach to investigating and treating some forms of tinnitus.
Collapse
Affiliation(s)
- Grant D Searchfield
- Section of Audiology, School of Population Health, Faculty of Medical and Health Sciences, The University of Auckland, Auckland, New Zealand.
| | | | | |
Collapse
|
42
|
Denham SL, Winkler I. The role of predictive models in the formation of auditory streams. J Physiol Paris 2006;100:154-70.
Abstract
Sounds provide us with useful information about our environment which complements that provided by other senses, but they also pose specific processing problems. How does the auditory system disentangle sounds from different sound sources? And what is it that allows intermittent sound events from the same source to be associated with each other? Here we review findings from a wide range of studies using the auditory streaming paradigm in order to formulate a unified account of the processes underlying auditory perceptual organization. We present new computational modelling results which replicate responses in primary auditory cortex [Fishman, Y.I., Arezzo, J.C., Steinschneider, M., 2004. Auditory stream segregation in monkey auditory cortex: effects of frequency separation, presentation rate, and tone duration. J. Acoust. Soc. Am. 116, 1656-1670; Fishman, Y.I., Reser, D.H., Arezzo, J.C., Steinschneider, M., 2001. Neural correlates of auditory stream segregation in primary auditory cortex of the awake monkey. Hear. Res. 151, 167-187] to tone sequences. We also present the results of a perceptual experiment which confirm the bi-stable nature of auditory streaming, and the proposal that the gradual build-up of streaming may be an artefact of averaging across many subjects [Pressnitzer, D., Hupé, J.M., 2006. Temporal dynamics of auditory and visual bi-stability reveal common principles of perceptual organization. Curr. Biol. 16(13), 1351-1357]. Finally, we argue that in order to account for all of the experimental findings, computational models of auditory stream segregation require four basic processing elements (segregation, predictive modelling, competition, and adaptation), and that it is the formation of effective predictive models which allows the system to keep track of different sound sources in a complex auditory environment.
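As a toy illustration of the competition-plus-adaptation motif (ours, not the authors' model), the sketch below shows two mutually inhibiting units with slow adaptation and noise alternating in dominance, the signature of bistable streaming.

```python
# Toy illustration (not the authors' model): two mutually inhibiting units,
# each with slow adaptation and a little noise, alternate in dominance,
# reproducing the competition-plus-adaptation motif described above.
import numpy as np

rng = np.random.default_rng(2)
dt, steps = 0.01, 20000          # 200 simulated seconds
r = np.array([0.6, 0.4])         # activity of the two interpretations
a = np.zeros(2)                  # adaptation states

dominant = []
for _ in range(steps):
    inhibition = 2.0 * r[::-1]                    # cross-inhibition
    drive = 1.0 - inhibition - a + 0.05 * rng.standard_normal(2)
    r += dt * (-r + np.clip(drive, 0.0, None))    # rate dynamics
    a += dt * 0.05 * (r - a)                      # slow adaptation of the winner
    dominant.append(int(r[1] > r[0]))

switches = int(np.abs(np.diff(dominant)).sum())
print(f"perceptual switches in {steps * dt:.0f} s: {switches}")
```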
Collapse
Affiliation(s)
- S L Denham
- Centre for Theoretical and Computational Neuroscience, University of Plymouth, Drake Circus, Plymouth PL4 8AA, UK.
| | | |
Collapse
|
43
|
Fishman YI, Arezzo JC, Steinschneider M. Auditory stream segregation in monkey auditory cortex: effects of frequency separation, presentation rate, and tone duration. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2004; 116:1656-1670. [PMID: 15478432 DOI: 10.1121/1.1778903] [Citation(s) in RCA: 114] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/24/2023]
Abstract
Auditory stream segregation refers to the organization of sequential sounds into "perceptual streams" reflecting individual environmental sound sources. In the present study, sequences of alternating high and low tones, "...ABAB...," similar to those used in psychoacoustic experiments on stream segregation, were presented to awake monkeys while neural activity was recorded in primary auditory cortex (A1). Tone frequency separation (ΔF), tone presentation rate (PR), and tone duration (TD) were systematically varied to examine whether neural responses correlate with effects of these variables on perceptual stream segregation. "A" tones were fixed at the best frequency of the recording site, while "B" tones were displaced in frequency from "A" tones by an amount equal to ΔF. As PR increased, "B" tone responses decreased in amplitude to a greater extent than "A" tone responses, yielding neural response patterns dominated by "A" tone responses occurring at half the alternation rate. Increasing TD facilitated the differential attenuation of "B" tone responses. These findings parallel psychoacoustic data and suggest a physiological model of stream segregation whereby increasing ΔF, PR, or TD enhances spatial differentiation of "A" tone and "B" tone responses along the tonotopic map in A1.
Collapse
Affiliation(s)
- Yonatan I Fishman
- Department of Neurology, Albert Einstein College of Medicine, Kennedy Center, Bronx, New York 10461, USA.
| | | | | |
Collapse
|
44
|
Abstract
An fMRI study of interleaved melody recognition was conducted to examine the neural basis of the bottom-up and top-down mechanisms involved in auditory stream segregation. Hemodynamic activity generated by a mixed sequence was recorded in eight listeners who were asked to recognize a target melody interleaved with distractor tones, with the target presented either before or after the composite sequence. The fMRI results suggest that similar cortical networks were involved in both conditions, including, bilaterally, the auditory cortices within the superior temporal gyrus as well as the thalamus and the inferior frontal gyrus. However, when listeners heard the melody before they had to extract it from the mixture, neural activation in the inferior frontal operculum was significantly enhanced bilaterally; no change in auditory cortical activity was detected.
Collapse
Affiliation(s)
- Caroline E Bey
- Montreal Neurological Institute, McGill University, Montreal, Quebec H3A2B4, Canada.
| | | |
Collapse
|
45
|
Bey C, McAdams S. Postrecognition of interleaved melodies as an indirect measure of auditory stream formation. J Exp Psychol Hum Percept Perform 2003; 29:267-79. [PMID: 12760614 DOI: 10.1037/0096-1523.29.2.267] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
Primitive processes involved in auditory stream formation were measured with an indirect, objective method. A target melody interleaved with a distractor sequence was followed by a probe melody that was either identical to the target or differed from it by two notes. Listeners decided whether the probe melody was present in the composite sequence. Interleaved melody recognition was not possible when distractor sequences had the same mean frequency as the target melodies and maximal contour crossover with them. Performance increased with mean frequency separation and timbral dissimilarity and was unaffected by the duration of the silent interval between the composite sequence and the probe melody. The relation between this indirect task, which measures the interleaved-melody recognition boundary, and direct judgments measuring the fission boundary is discussed.
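A minimal sketch of the interleaving procedure, with hypothetical note lists: target and distractor notes strictly alternate, so the target can be recognized only if the two sequences are perceptually segregated.

```python
# Sketch of the interleaving procedure: target and distractor notes
# strictly alternate in time (T1 D1 T2 D2 ...). Note lists are made up.
def interleave(target_notes, distractor_notes):
    """Alternate target and distractor notes into one composite sequence."""
    mixed = []
    for t, d in zip(target_notes, distractor_notes):
        mixed += [t, d]
    return mixed

target = [67, 69, 71, 72, 74, 72]   # MIDI pitches (hypothetical)
near = [68, 70, 70, 73, 73, 71]     # same register: hard to segregate
far = [48, 50, 47, 52, 49, 51]      # remote register: easy to segregate

print(interleave(target, near))
print(interleave(target, far))
```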
Collapse
Affiliation(s)
- Caroline Bey
- Laboratoire de Psychologie Expérimentale, Université René Descartes and Institut de Recherche et Coordination Acoustique/Musique-Centre National de la Recherche Scientifique, Paris, France
| | | |
Collapse
|