1. Symons A, Dick F, Tierney A. Salient sounds distort time perception and production. Psychon Bull Rev 2024; 31:137-147. PMID: 37430179; PMCID: PMC10866776; DOI: 10.3758/s13423-023-02305-2.
Abstract
The auditory world is often cacophonous, with some sounds capturing attention and distracting us from our goals. Despite the universality of this experience, many questions remain about how and why sound captures attention, how rapidly behavior is disrupted, and how long this interference lasts. Here, we use a novel measure of behavioral disruption to test predictions made by models of auditory salience. Models predict that goal-directed behavior is disrupted immediately after points in time that feature a high degree of spectrotemporal change. We find that behavioral disruption is precisely time-locked to the onset of distracting sound events: Participants who tap to a metronome temporarily increase their tapping speed 750 ms after the onset of distractors. Moreover, this response is greater for more salient sounds (larger amplitude) and sound changes (greater pitch shift). We find that the time course of behavioral disruption is highly similar after acoustically disparate sound events: Both sound onsets and pitch shifts of continuous background sounds speed responses at 750 ms, with these effects dying out by 1,750 ms. These temporal distortions can be observed using only data from the first trial across participants. A potential mechanism underlying these results is that arousal increases after distracting sound events, leading to an expansion of time perception, and causing participants to misjudge when their next movement should begin.
Affiliation(s)
- Ashley Symons: Department of Psychological Sciences, Birkbeck, University of London, Malet Street, London, WC1E 7HX, UK
- Fred Dick: Experimental Psychology, Division of Psychology and Language Sciences, University College London, London, UK
- Adam Tierney: Department of Psychological Sciences, Birkbeck, University of London, Malet Street, London, WC1E 7HX, UK
2. Kothinti SR, Elhilali M. Are acoustics enough? Semantic effects on auditory salience in natural scenes. Front Psychol 2023; 14:1276237. PMID: 38098516; PMCID: PMC10720592; DOI: 10.3389/fpsyg.2023.1276237.
Abstract
Auditory salience is a fundamental property of a sound that allows it to grab a listener's attention regardless of their attentional state or behavioral goals. While previous research has shed light on acoustic factors influencing auditory salience, the semantic dimensions of this phenomenon have remained relatively unexplored, owing both to the complexity of measuring salience in audition and to the limited focus on complex natural scenes. In this study, we examine the relationship between acoustic, contextual, and semantic attributes and their impact on the auditory salience of natural audio scenes using a dichotic listening paradigm. The experiments present acoustic scenes in forward and backward directions; the backward presentation diminishes semantic effects, providing a counterpoint to the effects observed in forward scenes. The behavioral data collected from a crowd-sourced platform reveal a striking convergence in temporal salience maps for certain sound events, while marked disparities emerge in others. Our main hypothesis posits that differences in the perceptual salience of events are predominantly driven by semantic and contextual cues, particularly evident in those cases displaying substantial disparities between forward and backward presentations. Conversely, events exhibiting a high degree of alignment can largely be attributed to low-level acoustic attributes. To evaluate this hypothesis, we employ analytical techniques that combine rich low-level mappings from acoustic profiles with high-level embeddings extracted from a deep neural network. This integrated approach captures both acoustic and semantic attributes of acoustic scenes along with their temporal trajectories. The results demonstrate that perceptual salience reflects a careful interplay between low-level and high-level attributes that shapes which moments stand out in a natural soundscape. Furthermore, our findings underscore the important role of longer-term context as a critical component of auditory salience, enabling listeners to discern and adapt to temporal regularities within an acoustic scene. The experimental and model-based validation of semantic factors of salience paves the way for a more complete understanding of auditory salience. Ultimately, the empirical and computational analyses have implications for developing large-scale models of auditory salience and audio analytics.
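The modeling approach sketched in this abstract, regressing a behavioral salience trace on low-level acoustic features combined with high-level deep-network embeddings, can be illustrated in a few lines. The sketch below is a minimal illustration under assumed shapes and synthetic data, not the authors' pipeline; the feature names, dimensions, and linear readout are all assumptions.

```python
import numpy as np

# Hypothetical inputs (assumed shapes, synthetic stand-ins):
#   acoustic:   (T, A) frame-level acoustic features (loudness, centroid, ...)
#   embeddings: (T, E) frame-level embeddings from a pretrained audio DNN
#   salience:   (T,)   behavioral salience trace for the same scene
rng = np.random.default_rng(0)
T, A, E = 500, 8, 128
acoustic = rng.normal(size=(T, A))
embeddings = rng.normal(size=(T, E))
salience = rng.normal(size=T)

def zscore(x):
    """Standardize each column so acoustic and semantic predictors
    enter the readout on a comparable scale."""
    return (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-9)

# Concatenate both feature families and add an intercept column.
X = np.hstack([zscore(acoustic), zscore(embeddings), np.ones((T, 1))])

# Least-squares readout: how well do the combined features
# explain the behavioral salience trace?
w, *_ = np.linalg.lstsq(X, salience, rcond=None)
pred = X @ w
print(f"fit r = {np.corrcoef(pred, salience)[0, 1]:.2f}")
```

Dropping one feature family from `X` and comparing fits is the natural way to ask, as the study does, how much semantics adds beyond acoustics.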
Affiliation(s)
- Mounya Elhilali: Department of Electrical and Computer Engineering, Center for Language and Speech Processing, The Johns Hopkins University, Baltimore, MD, United States
3. Bouvier B, Susini P, Marquis-Favre C, Misdariis N. Revealing the stimulus-driven component of attention through modulations of auditory salience by timbre attributes. Sci Rep 2023; 13:6842. PMID: 37100849; PMCID: PMC10133446; DOI: 10.1038/s41598-023-33496-2.
Abstract
Attention allows the listener to select relevant information from their environment and disregard what is irrelevant. However, irrelevant but salient stimuli sometimes capture attention and stand out from a scene through bottom-up processes. This attentional capture effect was observed using an implicit approach based on the additional singleton paradigm. In the auditory domain, sound attributes such as intensity and frequency have been shown to capture attention during auditory search (at a cost to performance) for targets defined on a different dimension, such as duration. In the present study, we examined whether a similar phenomenon occurs for attributes of timbre such as brightness (related to the spectral centroid) and roughness (related to the amplitude modulation depth). More specifically, we characterized the relationship between variations in these attributes and the magnitude of the attentional capture effect. In experiment 1, the occurrence of a brighter sound (higher spectral centroid) embedded in sequences of successive tones produced significant search costs. In experiments 2 and 3, different values of brightness and roughness confirmed that attentional capture is monotonically driven by these sound features. In experiment 4, the effect was found to be symmetrical: whether positive or negative, the same difference in brightness had the same detrimental effect on performance. Experiment 5 suggested that the effects produced by variations of the two attributes are additive. This work provides a methodology for quantifying the bottom-up component of attention and brings new insights into attentional capture and auditory salience.
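For readers unfamiliar with the two timbre attributes: brightness is conventionally quantified by the spectral centroid, and roughness was manipulated here through amplitude modulation depth. The sketch below computes both for a synthetic tone; the signal and parameter values are illustrative assumptions, not the stimuli used in the study.

```python
import numpy as np

fs = 44100  # sample rate (Hz)
t = np.arange(int(0.5 * fs)) / fs

# Illustrative stimulus: a harmonic complex with 30% amplitude modulation.
f0, mod_depth, mod_rate = 220.0, 0.3, 70.0
carrier = sum(np.sin(2 * np.pi * f0 * k * t) / k for k in range(1, 9))
x = (1 + mod_depth * np.sin(2 * np.pi * mod_rate * t)) * carrier

# Brightness: spectral centroid = amplitude-weighted mean frequency.
spec = np.abs(np.fft.rfft(x))
freqs = np.fft.rfftfreq(len(x), 1 / fs)
centroid = np.sum(freqs * spec) / np.sum(spec)

# Roughness proxy: modulation depth of the amplitude envelope,
# estimated from the smoothed rectified signal (edges trimmed to
# avoid convolution boundary effects).
env = np.convolve(np.abs(x), np.ones(256) / 256, mode="same")[256:-256]
depth = (env.max() - env.min()) / (env.max() + env.min())

print(f"spectral centroid: {centroid:.0f} Hz, envelope depth: {depth:.2f}")
```

Reweighting the harmonics toward higher partials raises the centroid without changing duration or level, which is what lets brightness act as the task-irrelevant singleton dimension.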
Affiliation(s)
- Baptiste Bouvier: STMS IRCAM, Sorbonne Université, CNRS, Ministère de la Culture, 75004, Paris, France; Univ Lyon, ENTPE, École Centrale de Lyon, CNRS, LTDS, UMR5513, 69518, Vaulx-en-Velin, France
- Patrick Susini: STMS IRCAM, Sorbonne Université, CNRS, Ministère de la Culture, 75004, Paris, France
- Catherine Marquis-Favre: Univ Lyon, ENTPE, École Centrale de Lyon, CNRS, LTDS, UMR5513, 69518, Vaulx-en-Velin, France
- Nicolas Misdariis: STMS IRCAM, Sorbonne Université, CNRS, Ministère de la Culture, 75004, Paris, France
4. Du Y, Sun C, Niu K. Model study of target discrimination in concurrent auditory events. Cognitive Computation and Systems 2022. DOI: 10.1049/ccs2.12052.
Affiliation(s)
- Du Yihang: National Academy of Chinese Theater Arts, New Media Arts Department, Beijing, China; Beijing Institute of Technology, School of Design and Arts, Beijing, China
- Sun Chao: Beijing Institute of Technology, School of Design and Arts, Beijing, China
- Niu Ke: Collaborative Innovation Center for HSR Driver Health and Safety, Zhengzhou Railway Vocational & Technical College, Henan, China
5. Castiajo P, Pinheiro AP. Attention to voices is increased in non-clinical auditory verbal hallucinations irrespective of salience. Neuropsychologia 2021; 162:108030. PMID: 34563552; DOI: 10.1016/j.neuropsychologia.2021.108030.
Abstract
Alterations in the processing of vocal emotions have been associated with both clinical and non-clinical auditory verbal hallucinations (AVH), suggesting that changes in the mechanisms underpinning voice perception contribute to AVH. These alterations seem to be more pronounced in psychotic patients with AVH when attention demands increase. However, it remains to be clarified how attention modulates the processing of vocal emotions in individuals without clinical diagnoses who report hearing voices but no related distress. Using an active auditory oddball task, the current study clarified how emotion and attention interact during voice processing as a function of AVH proneness, and examined the contributions of stimulus valence and intensity. Participants with vs. without non-clinical AVH were presented with target vocalizations differing in valence (neutral, positive, negative) and intensity (55 dB, 75 dB). The P3b amplitude was larger in response to louder (vs. softer) vocal targets irrespective of valence, and in response to negative (vs. neutral) vocal targets irrespective of intensity. Of note, the P3b amplitude was globally increased in response to vocal targets in participants reporting AVH, and was not modulated by valence or intensity in these participants. These findings suggest enhanced voluntary attention to changes in vocal expressions but reduced discrimination of salient and non-salient cues. A decreased sensitivity to the salience cues of vocalizations could contribute to increased cognitive control demands, setting the stage for AVH.
Affiliation(s)
- Paula Castiajo: Psychological Neuroscience Laboratory, CIPsi, School of Psychology, University of Minho, Braga, Portugal
- Ana P Pinheiro: Faculdade de Psicologia, CICPSI, Universidade de Lisboa, Lisboa, Portugal; Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, the Netherlands
6. Soeta Y, Ariki A. Subjective salience of birdsong and insect song with equal sound pressure level and loudness. Int J Environ Res Public Health 2020; 17:8858. PMID: 33260514; PMCID: PMC7731388; DOI: 10.3390/ijerph17238858.
Abstract
Birdsong is used to communicate the position of stairwells to visually impaired people in train stations in Japan. However, more than 40% of visually impaired people reported that such sounds were difficult to identify. Train companies seek to present the sounds at a sound pressure level that is loud enough to be detected, but not so loud as to be annoying. Therefore, salient birdsongs with relatively low sound pressure levels are required. In the current study, we examined the salience of different types of birdsong and insect song, and determined the dominant physical parameters related to salience. We considered insect songs because both birdsongs and insect songs have been found to have positive effects on soundscapes. We evaluated the subjective salience of birdsongs and insect songs using paired comparison methods, and examined the relationships between subjective salience and physical parameters. In total, 62 participants evaluated 18 types of birdsong and 16 types of insect song. The results indicated that the following features significantly influenced subjective salience: the maximum peak amplitude of the autocorrelation function, which signifies pitch strength; the interaural cross-correlation coefficient, which signifies apparent source width; the amplitude fluctuation component; and spectral content, such as flux and skewness.
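Two of the predictors reported here can be computed directly from a recording: the maximum normalized peak of the autocorrelation function (pitch strength) and the interaural cross-correlation coefficient (IACC, apparent source width). A simplified sketch follows, with assumed parameters such as the pitch-lag search range and the +/-1 ms lag window conventionally used for IACC; it is not the authors' analysis code.

```python
import numpy as np

def acf_peak(x, fs, fmin=50.0, fmax=2000.0):
    """Maximum normalized autocorrelation peak in a plausible pitch range."""
    x = x - x.mean()
    acf = np.correlate(x, x, mode="full")[len(x) - 1:]
    acf /= acf[0]  # normalize so the zero-lag value equals 1
    lo, hi = int(fs / fmax), int(fs / fmin)
    return acf[lo:hi].max()

def iacc(left, right, fs, max_lag_ms=1.0):
    """Interaural cross-correlation coefficient over +/-1 ms lags."""
    left, right = left - left.mean(), right - right.mean()
    denom = np.sqrt(np.sum(left ** 2) * np.sum(right ** 2))
    ccf = np.correlate(left, right, mode="full") / denom
    mid, lag = len(left) - 1, int(fs * max_lag_ms / 1000)
    return np.abs(ccf[mid - lag:mid + lag + 1]).max()

# Toy "song": a 1 kHz tone with slow amplitude fluctuation, and the
# same signal slightly delayed in one ear.
fs = 44100
t = np.arange(fs) / fs
song = np.sin(2 * np.pi * 1000 * t) * (0.6 + 0.4 * np.sin(2 * np.pi * 5 * t))
print(f"pitch strength ~ {acf_peak(song, fs):.2f}")
print(f"IACC ~ {iacc(song, np.roll(song, 20), fs):.2f}")
```

A tonal birdsong yields an ACF peak near 1 (strong pitch), whereas a broadband insect stridulation yields a much lower peak; this is the kind of contrast the analysis relates to subjective salience.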
Affiliation(s)
- Yoshiharu Soeta: Biomedical Research Institute, National Institute of Advanced Industrial Science and Technology (AIST), Osaka 563-8577, Japan
7.
Abstract
To ensure that listeners pay attention and do not habituate, emotionally intense vocalizations may be under evolutionary pressure to exploit processing biases in the auditory system by maximising their bottom-up salience. This "salience code" hypothesis was tested using 128 human nonverbal vocalizations representing eight emotions: amusement, anger, disgust, effort, fear, pain, pleasure, and sadness. As expected, within each emotion category salience ratings derived from pairwise comparisons strongly correlated with perceived emotion intensity. For example, while laughs as a class were less salient than screams of fear, salience scores almost perfectly explained the perceived intensity of both amusement and fear considered separately. Validating self-rated salience evaluations, high- vs. low-salience sounds caused 25% more recall errors in a short-term memory task, whereas emotion intensity had no independent effect on recall errors. Furthermore, the acoustic characteristics of salient vocalizations were similar to those previously described for non-emotional sounds (greater duration and intensity, high pitch, bright timbre, rapid modulations, and variable spectral characteristics), confirming that vocalizations were not salient merely because of their emotional content. The acoustic code in nonverbal communication is thus aligned with sensory biases, offering a general explanation for some non-arbitrary properties of human and animal high-arousal vocalizations.
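Deriving a salience scale from pairwise comparisons, as described above, is typically done with a Bradley-Terry-style model: each sound gets a strength such that the probability of sound i beating sound j is p_i / (p_i + p_j). The sketch below fits such strengths with the standard minorization-maximization update; it illustrates the general technique and is not claimed to be the study's exact scoring procedure.

```python
import numpy as np

def bradley_terry(wins, n_iter=200):
    """Fit Bradley-Terry strengths from a win-count matrix.

    wins[i, j] = number of times sound i was judged more salient
    than sound j. Returns one strength ("salience score") per sound.
    """
    n = wins.shape[0]
    p = np.ones(n)
    pairs = wins + wins.T  # total comparisons per pair
    for _ in range(n_iter):
        for i in range(n):
            # MM update: total wins of i / sum_j pairs_ij / (p_i + p_j)
            p[i] = wins[i].sum() / np.sum(pairs[i] / (p[i] + p))
        p /= p.sum()  # fix the arbitrary overall scale
    return p

# Toy data: 3 vocalizations; sound 2 is judged most salient overall.
wins = np.array([[0, 2, 1],
                 [8, 0, 3],
                 [9, 7, 0]], dtype=float)
print(np.round(bradley_terry(wins), 3))
```

Correlating such scores with mean intensity ratings per emotion is then a one-liner with `np.corrcoef`.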
Affiliation(s)
- Andrey Anikin: Division of Cognitive Science, Lund University, Lund, Sweden
8. Carlile S, Ciccarelli G, Cockburn J, Diedesch AC, Finnegan MK, Hafter E, Henin S, Kalluri S, Kell AJE, Ozmeral EJ, Roark CL, Sagers JE. Listening Into 2030 workshop: an experiment in envisioning the future of hearing and communication science. Trends Hear 2017; 21:2331216517737684. PMID: 29090640; PMCID: PMC5912269; DOI: 10.1177/2331216517737684.
Abstract
Here we report the methods and output of a workshop examining possible futures of speech and hearing science out to 2030. Using a design thinking approach, a range of human-centered problems in communication were identified that could provide the motivation for a wide range of research. Nine main research programs were distilled and are summarized: (a) measuring brain and other physiological parameters, (b) auditory and multimodal displays of information, (c) auditory scene analysis, (d) enabling and understanding shared auditory virtual spaces, (e) holistic approaches to health management and hearing impairment, (f) universal access to evolving and individualized technologies, (g) biological intervention for hearing dysfunction, (h) understanding the psychosocial interactions with technology and other humans as mediated by technology, and (i) the impact of changing models of security and privacy. The design thinking approach attempted to link the judged level of importance of different research areas to the “end in mind” through empathy for the real-life problems embodied in the personas created during the workshop.
Affiliation(s)
- Simon Carlile: Starkey Hearing Technologies, Berkeley, CA, USA; School of Medical Sciences, The University of Sydney, NSW, Australia
- Gregory Ciccarelli: Massachusetts Institute of Technology Lincoln Laboratory, Lexington, MA, USA
- Anna C Diedesch: Department of Otolaryngology/Head & Neck Surgery, Oregon Health & Science University, Portland, OR, USA; National Center for Rehabilitative Auditory Research, Portland Veterans Affairs Medical Center, Portland, OR, USA
- Megan K Finnegan: Department of Speech and Hearing Science, University of Illinois at Urbana-Champaign, IL, USA
- Ervin Hafter: Department of Psychology, University of California, Berkeley, CA, USA
- Simon Henin: Department of Neurology, NYU School of Medicine, NY, USA
- Alexander J E Kell: Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- Erol J Ozmeral: Department of Communication Sciences and Disorders, University of South Florida, Tampa, FL, USA
- Casey L Roark: Department of Psychology, Carnegie Mellon University, Center for the Neural Basis of Cognition, Pittsburgh, PA, USA
- Jessica E Sagers: Program in Speech and Hearing Bioscience and Technology, Harvard Medical School, Boston, MA, USA
9. Huang N, Slaney M, Elhilali M. Connecting deep neural networks to physical, perceptual, and electrophysiological auditory signals. Front Neurosci 2018; 12:532. PMID: 30154688; PMCID: PMC6102345; DOI: 10.3389/fnins.2018.00532.
Abstract
Deep neural networks have recently been shown to capture the intricate transformations that map sensory signals onto semantic representations, facilitating recognition or discrimination of complex stimuli. In this vein, convolutional neural networks (CNNs) have been used very successfully in image and audio classification. Designed to imitate the hierarchical structure of the nervous system, CNNs transform the incoming signal through activations of increasing complexity onto object-level representations. In this work, we employ a CNN trained for large-scale audio object classification to gain insights into the contribution of various audio representations that guide sound perception. The analysis contrasts activations of different layers of the CNN with acoustic features extracted directly from the scenes, perceptual salience obtained from behavioral responses of human listeners, and neural oscillations recorded by electroencephalography (EEG) in response to the same natural scenes. All three measures are tightly linked quantities believed to guide percepts of salience and object formation when listening to complex scenes. The results paint a picture of the intricate interplay between low-level and object-level representations in guiding auditory salience, one that is highly dependent on context and sound category.
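At its core, the layer-wise comparison described here reduces to correlating time courses: summarize each CNN layer's activation over time, then correlate that trace against an acoustic feature, the behavioral salience trace, or an EEG-derived signal for the same scene. A minimal sketch with random stand-ins for all three signals follows; the layer count, frame rate, and mean-activation summary are assumptions, not details of the published analysis.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 300  # analysis frames for one scene

# Stand-ins for the measures compared in the study: activations from
# four CNN layers (wider with depth), plus two reference time courses.
layer_acts = [rng.normal(size=(T, 64 * (k + 1))) for k in range(4)]
behavioral_salience = rng.normal(size=T)
eeg_power = rng.normal(size=T)

def summarize(act):
    """Collapse a layer's units into a single time course."""
    return act.mean(axis=1)

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

for k, act in enumerate(layer_acts):
    trace = summarize(act)
    print(f"layer {k}: r(salience) = {corr(trace, behavioral_salience):+.2f}, "
          f"r(EEG) = {corr(trace, eeg_power):+.2f}")
```

With real data, the profile of correlations across depth is what distinguishes low-level (early-layer) from object-level (late-layer) contributions.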
Affiliation(s)
- Nicholas Huang: Laboratory for Computational Audio Perception, Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, United States
- Malcolm Slaney: Machine Hearing, Google AI, Mountain View, CA, United States
- Mounya Elhilali: Laboratory for Computational Audio Perception, Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, United States
10. Huang N, Elhilali M. Auditory salience using natural soundscapes. J Acoust Soc Am 2017; 141:2163. PMID: 28372080; PMCID: PMC6909985; DOI: 10.1121/1.4979055.
Abstract
Salience describes the phenomenon by which an object stands out from a scene. While its underlying processes are extensively studied in vision, the mechanisms of auditory salience remain largely unknown. Previous studies have used well-controlled auditory scenes to shed light on some of the acoustic attributes that drive the salience of sound events. Unfortunately, the use of constrained stimuli, together with a lack of well-established benchmarks for salience judgments, hampers the development of comprehensive theories of sensory-driven auditory attention. The present study explores auditory salience in a set of dynamic natural scenes. A behavioral measure of salience is collected by having human volunteers listen to two concurrent scenes and indicate continuously which one attracts their attention. By using natural scenes, the study takes a data-driven rather than experimenter-driven approach to exploring the parameters of auditory salience. The findings indicate that the space of auditory salience is multidimensional (spanning loudness, pitch, spectral shape, and other acoustic attributes), nonlinear, and highly context-dependent. Importantly, the results indicate that contextual information about the entire scene, over both short and long time scales, needs to be considered in order to properly account for perceptual judgments of salience.
Affiliation(s)
- Nicholas Huang: Department of Biomedical Engineering, The Johns Hopkins University, Baltimore, Maryland 21218, USA
- Mounya Elhilali: Department of Electrical and Computer Engineering, Center for Language and Speech Processing, The Johns Hopkins University, Baltimore, Maryland 21218, USA
11.
Abstract
Sounds in everyday life seldom appear in isolation. Both humans and machines are constantly flooded with a cacophony of sounds that need to be sorted through and scoured for relevant information, a phenomenon referred to as the 'cocktail party problem'. A key component in parsing acoustic scenes is the role of attention, which mediates perception and behaviour by focusing both sensory and cognitive resources on pertinent information in the stimulus space. The current article provides a review of modelling studies of auditory attention. The review highlights how the term attention refers to a multitude of behavioural and cognitive processes that can shape sensory processing. Attention can be modulated by 'bottom-up' sensory-driven factors, as well as by 'top-down' task-specific goals, expectations, and learned schemas. Essentially, it acts as a selection process or processes that focus both sensory and cognitive resources on the most relevant events in the soundscape, with relevance being dictated by the stimulus itself (e.g. a loud explosion) or by a task at hand (e.g. listening for announcements in a busy airport). Recent computational models of auditory attention provide key insights into its role in facilitating perception in cluttered auditory scenes. This article is part of the themed issue 'Auditory and visual scene analysis'.
Affiliation(s)
- Emine Merve Kaya: Laboratory for Computational Audio Perception, Department of Electrical and Computer Engineering, The Johns Hopkins University, 3400 N Charles Street, Barton Hall, Baltimore, MD 21218, USA
- Mounya Elhilali: Laboratory for Computational Audio Perception, Department of Electrical and Computer Engineering, The Johns Hopkins University, 3400 N Charles Street, Barton Hall, Baltimore, MD 21218, USA
12. Wang J, Zhang K, Madani K, Sabourin C. Salient environmental sound detection framework for machine awareness. Neurocomputing 2015. DOI: 10.1016/j.neucom.2014.09.046.
13. Kaya EM, Elhilali M. Investigating bottom-up auditory attention. Front Hum Neurosci 2014; 8:327. PMID: 24904367; PMCID: PMC4034154; DOI: 10.3389/fnhum.2014.00327.
Abstract
Bottom-up attention is a sensory-driven selection mechanism that directs perception toward a subset of the stimulus that is considered salient, or attention-grabbing. Most studies of bottom-up auditory attention have adapted frameworks similar to visual attention models, whereby local or global “contrast” is a central concept in defining salient elements in a scene. In the current study, we take a more fundamental approach to modeling auditory attention, providing the first examination of the space of auditory saliency spanning pitch, intensity, and timbre, and shedding light on complex interactions among these features. Informed by psychoacoustic results, we develop a computational model of auditory saliency implementing a novel attentional framework, guided by processes hypothesized to take place in the auditory pathway. In particular, the model tests the hypothesis that perception tracks the evolution of sound events in a multidimensional feature space, and flags any deviation from background statistics as salient. Predictions from the model corroborate the relationship between bottom-up auditory attention and statistical inference, and argue for a potential role of predictive coding as a mechanism for saliency detection in acoustic scenes.
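The model's central idea, flagging deviations from the background statistics of a multidimensional feature space, can be sketched with a running Gaussian model per feature: track a decaying mean and variance of recent feature values and mark frames whose z-score exceeds a threshold. The decay constant, threshold, and three-feature input below are illustrative assumptions, not the parameters of the published model.

```python
import numpy as np

def salience_trace(features, alpha=0.02, z_thresh=4.0):
    """Flag deviations from running background statistics.

    features: (T, D) array, e.g., per-frame pitch, intensity, timbre.
    Returns per-frame salience (max z-score over features) and a
    boolean mask of salient frames.
    """
    T, D = features.shape
    mean = features[0].astype(float).copy()
    var = np.ones(D)
    z = np.zeros(T)
    for t in range(T):
        dev = (features[t] - mean) / np.sqrt(var + 1e-9)
        z[t] = np.abs(dev).max()  # the strongest feature deviation wins
        # slowly absorb the current frame into the background model
        mean += alpha * (features[t] - mean)
        var += alpha * ((features[t] - mean) ** 2 - var)
    return z, z > z_thresh

# Toy scene: steady background with one brief deviant event.
rng = np.random.default_rng(2)
feats = rng.normal(0.0, 1.0, size=(400, 3))
feats[250:260] += 8.0  # sudden jump in all three features
z, salient = salience_trace(feats)
print("salient frames:", np.flatnonzero(salient))
```

Because the background statistics keep adapting, a sustained change stops being salient once it becomes the new background, matching the intuition that salience tracks change rather than absolute feature values.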
Affiliation(s)
- Emine Merve Kaya: Department of Electrical and Computer Engineering, The Johns Hopkins University, Baltimore, MD, USA
- Mounya Elhilali: Department of Electrical and Computer Engineering, The Johns Hopkins University, Baltimore, MD, USA