1. Kachlicka M, Patel AD, Liu F, Tierney A. Weighting of cues to categorization of song versus speech in tone-language and non-tone-language speakers. Cognition 2024; 246:105757. PMID: 38442588. DOI: 10.1016/j.cognition.2024.105757.
Abstract
One of the most important auditory categorization tasks a listener faces is determining a sound's domain, a process which is a prerequisite for successful within-domain categorization tasks such as recognizing different speech sounds or musical tones. Speech and song are universal in human cultures: how do listeners categorize a sequence of words as belonging to one or the other of these domains? There is growing interest in the acoustic cues that distinguish speech and song, but it remains unclear whether there are cross-cultural differences in the evidence upon which listeners rely when making this fundamental perceptual categorization. Here we use the speech-to-song illusion, in which some spoken phrases perceptually transform into song when repeated, to investigate cues to this domain-level categorization in native speakers of tone languages (Mandarin and Cantonese speakers residing in the United Kingdom and China) and in native speakers of a non-tone language (English). We find that native tone-language and non-tone-language listeners largely agree on which spoken phrases sound like song after repetition, and we also find that the strength of this transformation is not significantly different across language backgrounds or countries of residence. Furthermore, we find a striking similarity in the cues upon which listeners rely when perceiving word sequences as singing versus speech, including small pitch intervals, flat within-syllable pitch contours, and steady beats. These findings support the view that there are certain widespread cross-cultural similarities in the mechanisms by which listeners judge if a word sequence is spoken or sung.
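To make two of these cues concrete, the hedged Python sketch below (not the authors' stimulus analysis) summarizes between-syllable pitch-interval size and within-syllable pitch excursion from a simulated fundamental-frequency contour; the f0 values and syllable boundaries are invented placeholders.

```python
# Hedged sketch (not the authors' analysis): summarizing two cues linked above to
# "song-like" percepts, the size of pitch intervals between syllables and the
# flatness of within-syllable pitch contours, from a fundamental-frequency (f0)
# contour. The f0 values and syllable boundaries are invented placeholders.
import numpy as np

def semitones(hz):
    """Convert frequency in Hz to semitones re 440 Hz."""
    return 12 * np.log2(hz / 440.0)

# Simulated f0 contour (Hz) sampled every 10 ms, split into four "syllables".
f0 = np.concatenate([
    np.linspace(200, 205, 20),   # nearly flat syllable
    np.linspace(230, 228, 20),
    np.linspace(210, 260, 20),   # strongly rising syllable
    np.linspace(190, 192, 20),
])
syllables = np.split(f0, 4)

syllable_means = np.array([semitones(s).mean() for s in syllables])
intervals = np.abs(np.diff(syllable_means))              # between-syllable intervals (semitones)
excursions = [np.ptp(semitones(s)) for s in syllables]   # within-syllable pitch excursion (semitones)

print(f"Median interval: {np.median(intervals):.2f} st, "
      f"median within-syllable excursion: {np.median(excursions):.2f} st")
```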
Affiliation(s)
- Magdalena Kachlicka: Department of Psychological Sciences, Birkbeck, University of London, Malet Street, London, United Kingdom
- Aniruddh D Patel: Department of Psychology, Tufts University, 419 Boston Ave, Medford, USA; Program in Brain, Mind, and Consciousness, Canadian Institute for Advanced Research, 661 University Avenue, Toronto, Canada
- Fang Liu: School of Psychology and Clinical Language Sciences, University of Reading, Whiteknights, Reading, United Kingdom
- Adam Tierney: Department of Psychological Sciences, Birkbeck, University of London, Malet Street, London, United Kingdom
2. Akça M, Vuoskoski JK, Laeng B, Bishop L. Recognition of brief sounds in rapid serial auditory presentation. PLoS One 2023; 18:e0284396. PMID: 37053212. PMCID: PMC10101377. DOI: 10.1371/journal.pone.0284396.
Abstract
Two experiments were conducted to test the role of participant factors (i.e., musical sophistication, working memory capacity) and stimulus factors (i.e., sound duration, timbre) in auditory recognition using a rapid serial auditory presentation paradigm. Participants listened to a rapid stream of very brief sounds ranging from 30 to 150 milliseconds in duration and were tested on their ability to distinguish the presence from the absence of a target sound, drawn from various sound sources and placed amongst the distracters. Experiment 1a established that brief exposure to stimuli (60 to 150 milliseconds) does not necessarily correspond to impaired recognition. In Experiment 1b we found evidence that 30 milliseconds of exposure significantly impairs recognition of single auditory targets, although recognition of voice and sine-tone targets was impaired the least, suggesting that the lower duration limit for successful recognition may be below 30 milliseconds for these targets. Critically, the effect of sound duration on recognition disappeared entirely when differences in musical sophistication were controlled for. Participants' working memory capacities did not predict their recognition performance. Our behavioral results extend studies of how brief timbres are processed under temporal constraint by suggesting that musical sophistication may play a larger role than previously thought. These results also provide a working hypothesis for future research, namely that the underlying neural mechanisms for processing different sound sources may have different temporal constraints.
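For readers unfamiliar with the paradigm, the hedged sketch below assembles one illustrative rapid serial auditory presentation trial; the tone bursts, 60-ms item duration, and 12-item stream are assumptions standing in for the natural recordings actually used.

```python
# Hedged sketch: one illustrative rapid serial auditory presentation (RSAP) trial,
# a stream of very brief items with a single target placed among distracters.
# The tone bursts, 60-ms item duration, and 12-item stream are assumptions
# standing in for the natural recordings used in the experiments.
import numpy as np

sr = 44100
item_ms = 60
rng = np.random.default_rng(8)

def burst(freq_hz, ms=item_ms):
    """A brief tone burst with 5-ms raised-cosine onset/offset ramps."""
    t = np.arange(int(sr * ms / 1000)) / sr
    y = np.sin(2 * np.pi * freq_hz * t)
    ramp = int(sr * 0.005)
    env = np.ones_like(y)
    env[:ramp] = 0.5 * (1 - np.cos(np.pi * np.arange(ramp) / ramp))
    env[-ramp:] = env[:ramp][::-1]
    return y * env

distracters = [burst(f) for f in rng.uniform(300, 3000, size=11)]
target = burst(1000)                      # placeholder "target sound"
position = int(rng.integers(2, 10))       # target never first or last (assumed)
stream = np.concatenate(distracters[:position] + [target] + distracters[position:])
print(f"{len(distracters) + 1} items, {len(stream) / sr:.2f} s of audio")
```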
Affiliation(s)
- Merve Akça: RITMO Center for Interdisciplinary Studies in Rhythm, Time and Motion, University of Oslo, Oslo, Norway; Department of Musicology, University of Oslo, Oslo, Norway
- Jonna Katariina Vuoskoski: RITMO Center for Interdisciplinary Studies in Rhythm, Time and Motion, University of Oslo, Oslo, Norway; Department of Musicology, University of Oslo, Oslo, Norway; Department of Psychology, University of Oslo, Oslo, Norway
- Bruno Laeng: RITMO Center for Interdisciplinary Studies in Rhythm, Time and Motion, University of Oslo, Oslo, Norway; Department of Psychology, University of Oslo, Oslo, Norway
- Laura Bishop: RITMO Center for Interdisciplinary Studies in Rhythm, Time and Motion, University of Oslo, Oslo, Norway; Department of Musicology, University of Oslo, Oslo, Norway
3. Ennaji FE, Fagot J, Belin P. Categorization of vocal and nonvocal stimuli in Guinea baboons (Papio papio). Am J Primatol 2022; 84:e23387. PMID: 35521711. DOI: 10.1002/ajp.23387.
Abstract
Categorization of vocal sounds apart from other sounds is one of the key abilities in human voice processing, but whether this ability is present in other animals, particularly nonhuman primates, remains unclear. In the present study, 25 socially housed Guinea baboons (Papio papio) were tested on a vocal/nonvocal categorization task using a Go/Nogo paradigm implemented on freely accessible automated learning devices. Three individuals from the group successfully learned to sort Grunt vocalizations from nonvocal sounds, and they generalized to new stimuli from the two categories, indicating that some baboons have the ability to develop open-ended categories in the auditory domain. Contrary to our hypothesis based on the human literature, these monkeys learned the nonvocal category faster than the Grunt category. Moreover, they failed to generalize their classification to new classes of conspecific vocalizations (wahoo, bark, yak, and copulation calls), and they categorized human vocalizations in the nonvocal category, suggesting that they had failed to represent the task as a vocal versus nonvocal categorization problem. Thus, our results do not confirm the existence of a separate perceptual category for conspecific vocalizations in baboons. Interestingly, the three successful baboons were the youngest of the group and had the least training in visual tasks, which supports previous reports of age and learning history as crucial factors in auditory laboratory experiments.
Affiliation(s)
- Fatima-Ezzahra Ennaji: La Timone Neuroscience Institute, CNRS and Aix-Marseille University, Marseille, France; Laboratory of Cognitive Psychology, CNRS and Aix-Marseille University, Marseille, France
- Joël Fagot: Laboratory of Cognitive Psychology, CNRS and Aix-Marseille University, Marseille, France; Station de Primatologie-Celphedia, CNRS UAR846, Rousset, France
- Pascal Belin: La Timone Neuroscience Institute, CNRS and Aix-Marseille University, Marseille, France; Station de Primatologie-Celphedia, CNRS UAR846, Rousset, France; Psychology Department, Montreal University, Montreal, Quebec, Canada
4. Swanborough H, Staib M, Frühholz S. Neurocognitive dynamics of near-threshold voice signal detection and affective voice evaluation. Sci Adv 2020; 6(50):eabb3884. PMID: 33310844. PMCID: PMC7732184. DOI: 10.1126/sciadv.abb3884.
Abstract
Communication and voice signal detection in noisy environments are universal tasks for many species. The fundamental problem of detecting voice signals in noise (VIN) is underinvestigated, especially with respect to its temporal dynamics. We investigated VIN as a dynamic signal-to-noise ratio (SNR) problem to determine the neurocognitive dynamics of subthreshold evidence accrual and near-threshold voice signal detection. Experiment 1 showed that detection under dynamic VIN, which involves a varying SNR and subthreshold sensory evidence accrual, is superior to detection in similar conditions with nondynamic SNRs or with acoustically matched sounds. Furthermore, voice signals with affective meaning have a detection advantage during VIN. Experiment 2 demonstrated that VIN is driven by effective neural integration in an auditory cortical-limbic network at and beyond the near-threshold detection point, preceded by activity in subcortical auditory nuclei. This demonstrates the recognition advantage of communication signals in dynamic noise contexts, especially when they carry socio-affective meaning.
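The dynamic signal-to-noise manipulation can be pictured with the hedged sketch below, which mixes a placeholder signal into noise at a slowly rising SNR; the ramp endpoints and the tone standing in for a voice are assumptions, not the study's stimuli.

```python
# Hedged sketch: a dynamic voice-in-noise (VIN) trial in which the signal-to-noise
# ratio ramps from well below toward the detection threshold. The ramp endpoints
# (-25 to -5 dB) and the tone standing in for a recorded voice are assumptions,
# not the study's stimuli.
import numpy as np

sr, dur = 16000, 3.0
n = int(sr * dur)
rng = np.random.default_rng(5)

noise = rng.normal(size=n)
voice = np.sin(2 * np.pi * 220 * np.arange(n) / sr)   # placeholder for a voice recording

snr_db = np.linspace(-25, -5, n)                      # SNR rises slowly across the trial
gain = 10 ** (snr_db / 20) * np.std(noise) / np.std(voice)
mixture = noise + gain * voice                        # subthreshold early, near threshold late

final_snr = 20 * np.log10(gain[-1] * np.std(voice) / np.std(noise))
print(f"Nominal SNR at trial end: {final_snr:.1f} dB")
```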
Affiliation(s)
- Huw Swanborough: Cognitive and Affective Neuroscience Unit, Department of Psychology, University of Zurich, Zurich, Switzerland; Neuroscience Center Zurich, University of Zurich and ETH Zurich, Zurich, Switzerland
- Matthias Staib: Cognitive and Affective Neuroscience Unit, Department of Psychology, University of Zurich, Zurich, Switzerland; Neuroscience Center Zurich, University of Zurich and ETH Zurich, Zurich, Switzerland
- Sascha Frühholz: Cognitive and Affective Neuroscience Unit, Department of Psychology, University of Zurich, Zurich, Switzerland; Neuroscience Center Zurich, University of Zurich and ETH Zurich, Zurich, Switzerland; Department of Psychology, University of Oslo, Oslo, Norway
5. Moskowitz HS, Lee WW, Sussman ES. Response Advantage for the Identification of Speech Sounds. Front Psychol 2020; 11:1155. PMID: 32655436. PMCID: PMC7325938. DOI: 10.3389/fpsyg.2020.01155.
Abstract
The ability to distinguish among different types of sounds in the environment and to identify sound sources is a fundamental skill of the auditory system. This study tested responses to sounds by stimulus category (speech, music, and environmental) in adults with normal hearing to determine under what task conditions there was a processing advantage for speech. We hypothesized that speech sounds would be processed faster and more accurately than non-speech sounds under specific listening conditions and different behavioral goals. Thus, we used three different task conditions allowing us to compare detection and identification of sound categories in an auditory oddball paradigm and in a repetition-switch category paradigm. We found that response time and accuracy were modulated by the specific task demands. The sound category itself had no effect on sound detection outcomes but had a pronounced effect on sound identification. Faster and more accurate responses to speech were found only when identifying sounds. We demonstrate a speech processing "advantage" when identifying the sound category among non-categorical sounds and when detecting and identifying among categorical sounds. Thus, overall, our results are consistent with a theory of speech processing that relies on specialized systems distinct from music and other environmental sounds.
Affiliation(s)
- Howard S Moskowitz: Department of Otorhinolaryngology-Head and Neck Surgery, Albert Einstein College of Medicine, New York, NY, United States
- Wei Wei Lee: Department of Neuroscience, Albert Einstein College of Medicine, New York, NY, United States
- Elyse S Sussman: Department of Otorhinolaryngology-Head and Neck Surgery, Albert Einstein College of Medicine, New York, NY, United States; Department of Neuroscience, Albert Einstein College of Medicine, New York, NY, United States
6. Akça M, Laeng B, Godøy RI. No Evidence for an Auditory Attentional Blink for Voices Regardless of Musical Expertise. Front Psychol 2020; 10:2935. PMID: 31998190. PMCID: PMC6966238. DOI: 10.3389/fpsyg.2019.02935.
Abstract
Background: Attending to goal-relevant information can leave us metaphorically "blind" or "deaf" to the next relevant piece of information while searching among distracters. This temporal cost to human selective attention, lasting for about half a second, has long been explored using the attentional blink paradigm. Although there is evidence that certain visual stimuli relating to one's area of expertise can be less susceptible to attentional blink effects, it remains unexplored whether the dynamics of temporal selective attention vary with expertise and object type in the auditory modality. Methods: Using an auditory version of the attentional blink paradigm, the present study investigates whether auditory objects relating to musical and perceptual expertise modulate the transient costs of selective attention. Expert cellists and novice participants were asked to first identify a target sound, and then to detect an instrumental timbre (cello or organ) or a human voice as a second target in a rapid auditory stream. Results: The results showed moderate evidence against an attentional blink effect for voices, independent of participants' musical expertise. Experts outperformed novices in their overall accuracy of target identification and detection, reflecting a clear benefit of musical expertise. Importantly, this musicianship advantage disappeared when human voices served as the second target in the stream. Discussion: The results are discussed in terms of stimulus salience, the advantage of voice processing, and perceptual and musical expertise in relation to attention and working memory performance.
Affiliation(s)
- Merve Akça: RITMO Center for Interdisciplinary Studies in Rhythm, Time and Motion, University of Oslo, Oslo, Norway; Department of Musicology, University of Oslo, Oslo, Norway
- Bruno Laeng: RITMO Center for Interdisciplinary Studies in Rhythm, Time and Motion, University of Oslo, Oslo, Norway; Department of Psychology, University of Oslo, Oslo, Norway
- Rolf Inge Godøy: RITMO Center for Interdisciplinary Studies in Rhythm, Time and Motion, University of Oslo, Oslo, Norway; Department of Musicology, University of Oslo, Oslo, Norway
7. The time course of auditory recognition measured with rapid sequences of short natural sounds. Sci Rep 2019; 9:8005. PMID: 31142750. PMCID: PMC6541711. DOI: 10.1038/s41598-019-43126-5.
Abstract
Human listeners are able to accurately recognize an impressive range of complex sounds, such as musical instruments or voices. The underlying mechanisms are still poorly understood. Here, we aimed to characterize the processing time needed to recognize a natural sound. To do so, by analogy with the “rapid visual sequential presentation paradigm”, we embedded short target sounds within rapid sequences of distractor sounds. The core hypothesis is that any correct report of the target implies that sufficient processing for recognition had been completed before the occurrence of the subsequent distractor sound. We conducted four behavioral experiments using short natural sounds (voices and instruments) as targets or distractors. We report the effects on performance, as measured by the fastest presentation rate allowing recognition, of sound duration, the number of sounds in a sequence, the relative pitch between target and distractors, and target position in the sequence. Results showed very rapid auditory recognition of natural sounds in all cases: targets could be recognized at rates of up to 30 sounds per second. In addition, the best performance was observed for voices in sequences of instruments. These results give new insights into the remarkable efficiency of timbre processing in humans, using an original behavioral paradigm to provide strong constraints on future neural models of sound recognition.
8. Jagiello R, Pomper U, Yoneya M, Zhao S, Chait M. Rapid Brain Responses to Familiar vs. Unfamiliar Music - an EEG and Pupillometry study. Sci Rep 2019; 9:15570. PMID: 31666553. PMCID: PMC6821741. DOI: 10.1038/s41598-019-51759-9.
Abstract
Human listeners exhibit marked sensitivity to familiar music, perhaps most readily revealed by popular “name that tune” games, in which listeners often succeed in recognizing a familiar song from an extremely brief presentation. In this work, we used electroencephalography (EEG) and pupillometry to reveal the temporal signatures of the brain processes that allow differentiation between a familiar, well-liked piece of music and an unfamiliar one. In contrast to previous work, which has quantified gradual changes in pupil diameter (the so-called “pupil dilation response”), here we focus on the occurrence of discrete pupil dilation events. This approach is substantially more sensitive in the temporal domain and allowed us to tap early activity within the putative salience network. Participants (N = 10) passively listened to snippets (750 ms) of a familiar, personally relevant song and of an acoustically matched, unfamiliar song, presented in random order. A group of control participants (N = 12), who were unfamiliar with all of the songs, was also tested. We reveal a rapid differentiation between snippets from familiar and unfamiliar songs: pupil responses showed a greater dilation rate to familiar music from 100–300 ms post stimulus onset, consistent with faster activation of the autonomic salience network. Brain responses measured with EEG showed a later differentiation between familiar and unfamiliar music from 350 ms post onset. Remarkably, the cluster pattern identified in the EEG response is very similar to that commonly found in classic old/new memory retrieval paradigms, suggesting that the recognition of brief, randomly presented music snippets draws on similar processes.
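A minimal sketch of the event-based pupil analysis idea, under stated assumptions rather than the authors' exact criteria, is to count discrete dilation events as peaks in the smoothed derivative of the pupil trace:

```python
# Hedged sketch: counting discrete pupil dilation events (rather than averaging a
# slow dilation response) as peaks in the smoothed derivative of a pupil-diameter
# trace. The simulated trace, smoothing window, and peak threshold are assumptions,
# not the authors' event-detection criteria.
import numpy as np
from scipy.signal import find_peaks

fs = 250                                    # assumed eye-tracker sampling rate (Hz)
rng = np.random.default_rng(6)
t = np.arange(0, 2.0, 1 / fs)
trace = 3.0 + 0.05 * np.sin(2 * np.pi * 0.8 * t) + 0.01 * rng.normal(size=t.size)

kernel = np.ones(25) / 25                   # ~100-ms moving-average smoother
velocity = np.gradient(np.convolve(trace, kernel, mode="same"), 1 / fs)
events, _ = find_peaks(velocity, height=np.std(velocity))

print(f"Detected {len(events)} dilation events in a {t[-1]:.2f}-s trace")
```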
Affiliation(s)
- Robert Jagiello: Ear Institute, University College London, London, UK; Institute of Cognitive and Evolutionary Anthropology, University of Oxford, Oxford, UK
- Ulrich Pomper: Ear Institute, University College London, London, UK; Faculty of Psychology, University of Vienna, Vienna, Austria
- Makoto Yoneya: NTT Communication Science Laboratories, NTT Corporation, Atsugi, 243-0198, Japan
- Sijia Zhao: Ear Institute, University College London, London, UK
- Maria Chait: Ear Institute, University College London, London, UK
9. Ogg M, Slevc LR. Acoustic Correlates of Auditory Object and Event Perception: Speakers, Musical Timbres, and Environmental Sounds. Front Psychol 2019; 10:1594. PMID: 31379658. PMCID: PMC6650748. DOI: 10.3389/fpsyg.2019.01594.
Abstract
Human listeners must identify and orient themselves to auditory objects and events in their environment. What acoustic features support a listener's ability to differentiate the great variety of natural sounds they might encounter? Studies of auditory object perception typically examine identification (and confusion) responses or dissimilarity ratings between pairs of objects and events. However, the majority of this prior work has been conducted within single categories of sound. This separation has precluded a broader understanding of the general acoustic attributes that govern auditory object and event perception within and across different behaviorally relevant sound classes. The present experiments take a broader approach by examining multiple categories of sound relative to one another. This approach bridges critical gaps in the literature and allows us to identify (and assess the relative importance of) features that are useful for distinguishing sounds within, between and across behaviorally relevant sound categories. To do this, we conducted behavioral sound identification (Experiment 1) and dissimilarity rating (Experiment 2) studies using a broad set of stimuli that leveraged the acoustic variability within and between different sound categories via a diverse set of 36 sound tokens (12 utterances from different speakers, 12 instrument timbres, and 12 everyday objects from a typical human environment). Multidimensional scaling solutions as well as analyses of item-pair-level responses as a function of different acoustic qualities were used to understand what acoustic features informed participants' responses. In addition to the spectral and temporal envelope qualities noted in previous work, listeners' dissimilarity ratings were associated with spectrotemporal variability and aperiodicity. Subsets of these features (along with fundamental frequency variability) were also useful for making specific within- or between-category judgments. Dissimilarity ratings largely paralleled sound identification performance; however, the results of the two tasks did not completely mirror one another. In addition, musical training was related to improved sound identification performance.
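As an illustration of the dissimilarity-rating analysis, the hedged sketch below recovers a low-dimensional perceptual space from a pairwise dissimilarity matrix with multidimensional scaling in scikit-learn; the 36 x 36 matrix is a random placeholder rather than the study's ratings.

```python
# Hedged sketch: recovering a low-dimensional perceptual space from pairwise
# dissimilarity ratings with multidimensional scaling (MDS), as in Experiment 2.
# The 36 x 36 matrix below is a random symmetric placeholder, not the study's data.
import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(3)
n_tokens = 36                               # 12 speakers + 12 instruments + 12 everyday objects
d = rng.uniform(size=(n_tokens, n_tokens))
d = (d + d.T) / 2                           # symmetric dissimilarities
np.fill_diagonal(d, 0.0)

mds = MDS(n_components=2, dissimilarity="precomputed", random_state=3)
coords = mds.fit_transform(d)               # one 2-D point per sound token
print(coords.shape, f"stress = {mds.stress_:.2f}")
```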
Affiliation(s)
- Mattson Ogg: Neuroscience and Cognitive Science Program, University of Maryland, College Park, College Park, MD, United States; Department of Psychology, University of Maryland, College Park, College Park, MD, United States
- L. Robert Slevc: Neuroscience and Cognitive Science Program, University of Maryland, College Park, College Park, MD, United States; Department of Psychology, University of Maryland, College Park, College Park, MD, United States
10. Ogg M, Slevc LR, Idsardi WJ. The time course of sound category identification: Insights from acoustic features. J Acoust Soc Am 2017; 142:3459. PMID: 29289109. DOI: 10.1121/1.5014057.
Abstract
Humans have an impressive, automatic capacity for identifying and organizing sounds in their environment. However, little is known about the timescale on which sound identification operates, or about the acoustic features that listeners use to identify auditory objects. To better understand the temporal and acoustic dynamics of sound category identification, two go/no-go perceptual gating studies were conducted. Participants heard speech, musical instrument, and human-environmental sounds ranging from 12.5 to 200 ms in duration. Listeners could reliably identify sound categories from just 25 ms of exposure. In Experiment 1, participants' performance on instrument sounds showed a distinct processing advantage at shorter durations. Experiment 2 revealed that this advantage was largely dependent on regularities in instrument onset characteristics relative to the spectrotemporal complexity of environmental sounds and speech. Models of participant responses indicated that listeners used spectral, temporal, noise, and pitch cues in the task. Aspects of spectral centroid were associated with responses for all categories, while noisiness and spectral flatness were associated with environmental and instrument responses, respectively. Responses for speech and environmental sounds were also associated with spectral features that varied over time. Experiment 2 indicated that variability in fundamental frequency was useful in identifying steady-state speech and instrument stimuli.
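The gating manipulation itself is simple to sketch; in the hedged example below only the 12.5 to 200 ms range is taken from the abstract, while the noise token, offset ramp, and intermediate duration steps are assumptions.

```python
# Hedged sketch: making "gated" versions of a sound token at increasing durations,
# each with a brief offset ramp to avoid clicks. Only the 12.5 to 200 ms range is
# taken from the abstract; the white-noise token, 5-ms ramp, and intermediate
# duration steps are assumptions.
import numpy as np

sr = 44100
rng = np.random.default_rng(9)
token = rng.normal(size=int(sr * 0.25))                 # placeholder 250-ms recording

def gate(y, ms, ramp_ms=5):
    """Truncate y to `ms` milliseconds and apply a linear offset ramp."""
    n = int(sr * ms / 1000)
    out = y[:n].copy()
    r = int(sr * ramp_ms / 1000)
    out[-r:] *= np.linspace(1.0, 0.0, r)
    return out

durations_ms = [12.5, 25, 50, 100, 200]                 # assumed gating steps
gated = {d: gate(token, d) for d in durations_ms}
print({d: len(g) for d, g in gated.items()})
```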
Affiliation(s)
- Mattson Ogg: Neuroscience and Cognitive Science Program, University of Maryland, 4090 Union Drive, College Park, Maryland 20742, USA
- L Robert Slevc: Department of Psychology, University of Maryland, 4094 Campus Drive, College Park, Maryland 20742, USA
- William J Idsardi: Department of Linguistics, University of Maryland, 1401 Marie Mount Hall, College Park, Maryland 20742, USA
11. Siedenburg K, Müllensiefen D. Modeling Timbre Similarity of Short Music Clips. Front Psychol 2017; 8:639. PMID: 28491045. PMCID: PMC5405345. DOI: 10.3389/fpsyg.2017.00639.
Abstract
There is evidence from a number of recent studies that most listeners are able to extract information related to song identity, emotion, or genre from music excerpts with durations in the range of tenths of seconds. Because of these very short durations, timbre, as a multifaceted auditory attribute, appears to be a plausible candidate for the type of feature that listeners make use of when processing short music excerpts. However, the importance of timbre in listening tasks that involve short excerpts has not yet been demonstrated empirically. Hence, the goal of this study was to develop a method for exploring the degree to which similarity judgments of short music clips can be modeled with low-level acoustic features related to timbre. We utilized similarity data from two large samples of participants: Sample I was obtained via an online survey, used 16 clips of 400 ms length, and contained responses from 137,339 participants; Sample II was collected in a lab environment, used 16 clips of 800 ms length, and contained responses from 648 participants. Our model used two sets of audio features, comprising commonly used timbre descriptors and the well-known Mel-frequency cepstral coefficients as well as their temporal derivatives. To predict pairwise similarities, the resulting distances between clips in terms of their audio features were used as predictor variables in a partial least-squares regression. We found that a sparse selection of three to seven features from both descriptor sets (mainly encoding the coarse shape of the spectrum as well as spectrotemporal variability) best predicted similarities across the two sets of sounds. Notably, the inclusion of non-acoustic predictors of musical genre and record release date allowed much better generalization performance and explained up to 50% of shared variance (R²) between observations and model predictions. Overall, the results of this study empirically demonstrate that both acoustic features related to timbre and higher-level categorical features such as musical genre play a major role in the perception of short music clips.
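A hedged sketch of the modeling pipeline described above: per-feature distances between clips predict pairwise similarity via partial least-squares regression in scikit-learn. The feature values, similarity ratings, and choice of three latent components are placeholders and assumptions, not the study's data or final model.

```python
# Hedged sketch of the modeling pipeline: per-feature distances between clips are
# used to predict pairwise similarity with partial least-squares (PLS) regression.
# The feature vectors and similarity ratings are random placeholders, and the
# three latent components are an assumption within the reported 3-7 feature range.
import numpy as np
from itertools import combinations
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
n_clips, n_features = 16, 20                 # e.g., MFCC means plus timbre descriptors (assumed)
features = rng.normal(size=(n_clips, n_features))

pairs = list(combinations(range(n_clips), 2))
X = np.array([np.abs(features[i] - features[j]) for i, j in pairs])  # distances per pair
y = rng.uniform(size=len(pairs))             # placeholder mean similarity ratings

pls = PLSRegression(n_components=3)
pls.fit(X, y)
print(f"In-sample R^2: {pls.score(X, y):.2f}")
```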
Affiliation(s)
- Kai Siedenburg: Department of Medical Physics and Acoustics, Carl von Ossietzky University of Oldenburg, Oldenburg, Germany
12. Gingras B, Marin MM, Puig-Waldmüller E, Fitch WT. The Eye is Listening: Music-Induced Arousal and Individual Differences Predict Pupillary Responses. Front Hum Neurosci 2015; 9:619. PMID: 26617511. PMCID: PMC4639616. DOI: 10.3389/fnhum.2015.00619.
Abstract
Pupillary responses are a well-known indicator of emotional arousal but have not yet been systematically investigated in response to music. Here, we measured pupillary dilations evoked by short musical excerpts normalized for intensity and selected for their stylistic uniformity. Thirty participants (15 females) provided subjective ratings of music-induced felt arousal, tension, pleasantness, and familiarity for 80 classical music excerpts. The pupillary responses evoked by these excerpts were measured in another thirty participants (15 females). We probed the role of listener-specific characteristics such as mood, stress reactivity, self-reported role of music in life, liking for the selected excerpts, as well as of subjective responses to music, in pupillary responses. Linear mixed model analyses showed that a greater role of music in life was associated with larger dilations, and that larger dilations were also predicted for excerpts rated as more arousing or tense. However, an interaction between arousal and liking for the excerpts suggested that pupillary responses were modulated less strongly by arousal when the excerpts were particularly liked. An analogous interaction was observed between tension and liking. Additionally, males exhibited larger dilations than females. Overall, these findings suggest a complex interplay between bottom-up and top-down influences on pupillary responses to music.
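The mixed-model analysis can be illustrated with the hedged statsmodels sketch below, which fits pupil dilation on arousal, liking, and their interaction with a random intercept per listener; the data frame is simulated and the coefficients are arbitrary.

```python
# Hedged sketch: a linear mixed model relating pupil dilation to music-induced
# arousal, liking, and their interaction, with a random intercept per listener.
# The data frame is simulated and the coefficients are arbitrary; this is not the
# study's dataset or exact model specification.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n_subj, n_excerpts = 30, 80
df = pd.DataFrame({
    "participant": np.repeat(np.arange(n_subj), n_excerpts),
    "arousal": rng.uniform(1, 7, n_subj * n_excerpts),
    "liking": rng.uniform(1, 7, n_subj * n_excerpts),
})
df["dilation"] = (0.02 * df["arousal"]
                  - 0.01 * df["arousal"] * df["liking"]
                  + rng.normal(0, 0.05, len(df)))

model = smf.mixedlm("dilation ~ arousal * liking", df, groups=df["participant"])
print(model.fit().summary())
```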
Affiliation(s)
- Bruno Gingras: Institute of Psychology, University of Innsbruck, Innsbruck, Austria
- Manuela M Marin: Department of Basic Psychological Research and Research Methods, University of Vienna, Vienna, Austria
- W T Fitch: Department of Cognitive Biology, University of Vienna, Vienna, Austria
13. Nees MA, Phillips C. Auditory Pareidolia: Effects of Contextual Priming on Perceptions of Purportedly Paranormal and Ambiguous Auditory Stimuli. Appl Cogn Psychol 2014. DOI: 10.1002/acp.3068.
14. Gingras B, Marin MM, Fitch WT. Beyond Intensity: Spectral Features Effectively Predict Music-Induced Subjective Arousal. Q J Exp Psychol (Hove) 2014; 67:1428-46. DOI: 10.1080/17470218.2013.863954.
Abstract
Emotions in music are conveyed by a variety of acoustic cues. Notably, the positive association between sound intensity and arousal has particular biological relevance. However, although amplitude normalization is a common procedure used to control for intensity in music psychology research, direct comparisons between emotional ratings of original and amplitude-normalized musical excerpts are lacking. In this study, 30 nonmusicians retrospectively rated the subjective arousal and pleasantness induced by 84 six-second classical music excerpts, and an additional 30 nonmusicians rated the same excerpts normalized for amplitude. Following the cue-redundancy and Brunswik lens models of acoustic communication, we hypothesized that arousal and pleasantness ratings would be similar for both versions of the excerpts, and that arousal could be predicted effectively by other acoustic cues besides intensity. Although the difference in mean arousal and pleasantness ratings between original and amplitude-normalized excerpts correlated significantly with the amplitude adjustment, ratings for both sets of excerpts were highly correlated and shared a similar range of values, thus validating the use of amplitude normalization in music emotion research. Two acoustic parameters, spectral flux and spectral entropy, accounted for 65% of the variance in arousal ratings for both sets, indicating that spectral features can effectively predict arousal. Additionally, we confirmed that amplitude-normalized excerpts were adequately matched for loudness. Overall, the results corroborate our hypotheses and support the cue-redundancy and Brunswik lens models.
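For concreteness, the hedged sketch below computes the two spectral predictors named above, spectral flux and spectral entropy, from a short-time magnitude spectrogram; the synthetic excerpt and frame settings are assumptions, not the authors' analysis parameters.

```python
# Hedged sketch: computing the two spectral predictors named above, spectral flux
# and spectral entropy, from a short-time magnitude spectrogram. The synthetic
# excerpt and frame settings are assumptions, not the authors' analysis parameters.
import numpy as np

sr, dur = 22050, 6.0
t = np.arange(int(sr * dur)) / sr
x = np.sin(2 * np.pi * 440 * t) + 0.3 * np.random.default_rng(2).normal(size=t.size)

frame, hop = 1024, 512
win = np.hanning(frame)
n_frames = 1 + (len(x) - frame) // hop
spec = np.abs(np.array([np.fft.rfft(win * x[i * hop:i * hop + frame])
                        for i in range(n_frames)]))

# Spectral flux: mean positive spectral change between adjacent frames.
flux = np.mean(np.maximum(np.diff(spec, axis=0), 0.0), axis=1)

# Spectral entropy: Shannon entropy of each frame's normalized power spectrum.
p = spec ** 2
p /= p.sum(axis=1, keepdims=True) + 1e-12
entropy = -np.sum(p * np.log2(p + 1e-12), axis=1)

print(f"Mean flux: {flux.mean():.3f}, mean spectral entropy: {entropy.mean():.2f} bits")
```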
Affiliation(s)
- Bruno Gingras: Department of Cognitive Biology, University of Vienna, Vienna, Austria
- Manuela M. Marin: Department of Basic Psychological Research and Research Methods, University of Vienna, Vienna, Austria
- W. Tecumseh Fitch: Department of Cognitive Biology, University of Vienna, Vienna, Austria
15. Suied C, Agus TR, Thorpe SJ, Mesgarani N, Pressnitzer D. Auditory gist: recognition of very short sounds from timbre cues. J Acoust Soc Am 2014; 135:1380-1391. PMID: 24606276. DOI: 10.1121/1.4863659.
Abstract
Sounds such as the voice or musical instruments can be recognized on the basis of timbre alone. Here, sound recognition was investigated with severely reduced timbre cues. Short snippets of naturally recorded sounds were extracted from a large corpus. Listeners were asked to report a target category (e.g., sung voices) among other sounds (e.g., musical instruments). All sound categories covered the same pitch range, so the task had to be solved on timbre cues alone. The minimum duration for which performance was above chance was found to be short, on the order of a few milliseconds, with the best performance for voice targets. Performance was independent of pitch and was maintained when stimuli contained less than a full waveform cycle. Recognition was not generally better when the sound snippets were time-aligned with the sound onset compared to when they were extracted with a random starting time. Finally, performance did not depend on feedback or training, suggesting that the cues used by listeners in the artificial gating task were similar to those relevant for longer, more familiar sounds. The results show that timbre cues for sound recognition are available at a variety of time scales, including very short ones.
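A hedged sketch of the gating manipulation: cut a snippet lasting a few milliseconds from a longer recording, either onset-aligned or at a random starting point, and window it briefly. The noise stand-in, snippet durations, and 1-ms ramps are illustrative assumptions, not the study's stimuli.

```python
# Hedged sketch: cutting a very short snippet out of a longer recording, either
# aligned to the sound's onset or starting at a random point, and applying brief
# raised-cosine ramps. The noise "recording", snippet durations, and 1-ms ramps
# are illustrative assumptions, not the study's stimuli.
import numpy as np

sr = 44100
rng = np.random.default_rng(10)
recording = rng.normal(size=sr)                 # placeholder 1-s recorded sound

def snippet(y, ms, random_start=True):
    n = max(int(sr * ms / 1000), 2)
    start = int(rng.integers(0, len(y) - n)) if random_start else 0
    out = y[start:start + n].copy()
    ramp = min(int(sr * 0.001), n // 2)         # 1-ms on/off ramps
    win = 0.5 * (1 - np.cos(np.pi * np.arange(ramp) / ramp))
    out[:ramp] *= win
    out[-ramp:] *= win[::-1]
    return out

for ms in (2, 4, 8, 16):
    print(f"{ms} ms -> {len(snippet(recording, ms))} samples")
```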
Affiliation(s)
- Clara Suied: Institut de Recherche Biomédicale des Armées, Département Action et Cognition en Situation Opérationnelle, 91223 Brétigny sur Orge, France
- Trevor R Agus: Sonic Arts Research Centre, School of Creative Arts, 1 Cloreen Park, Queen's University Belfast, Belfast, BT7 1NN, United Kingdom
- Simon J Thorpe: Centre de Recherche Cerveau et Cognition, UMR 5549, CNRS and Université Paul Sabatier, Toulouse, France
- Nima Mesgarani: Departments of Neurological Surgery and Physiology, UCSF Center for Integrative Neuroscience, University of California, San Francisco, California 94143
- Daniel Pressnitzer: Laboratoire des Systèmes Perceptifs, UMR 8248, CNRS and École normale supérieure, 29 rue d'Ulm, 75005 Paris, France
16. Aucouturier JJ, Bigand E. Seven problems that keep MIR from attracting the interest of cognition and neuroscience. J Intell Inf Syst 2013. DOI: 10.1007/s10844-013-0251-x.
17. Haro M, Serrà J, Herrera P, Corral A. Zipf's law in short-time timbral codings of speech, music, and environmental sound signals. PLoS One 2012; 7:e33993. PMID: 22479497. PMCID: PMC3315504. DOI: 10.1371/journal.pone.0033993.
Abstract
Timbre is a key perceptual feature that allows discrimination between different sounds. Timbral sensations are highly dependent on the temporal evolution of the power spectrum of an audio signal. In order to quantitatively characterize such sensations, the shape of the power spectrum has to be encoded in a way that preserves certain physical and perceptual properties. Therefore, it is common practice to encode short-time power spectra using psychoacoustical frequency scales. In this paper, we study and characterize the statistical properties of such encodings, here called timbral code-words. In particular, we report on rank-frequency distributions of timbral code-words extracted from 740 hours of audio coming from disparate sources such as speech, music, and environmental sounds. Analogously to text corpora, we find a heavy-tailed Zipfian distribution with an exponent close to one. Importantly, this distribution is found independently of different encoding decisions and regardless of the audio source. Further analysis of the intrinsic characteristics of the most and least frequent code-words reveals that the most frequent code-words tend to have a more homogeneous structure. We also find that the speech and music databases have specific, distinctive code-words, whereas such database-specific code-words are not present for the environmental sounds. Finally, we find that a Yule-Simon process with memory provides a reasonable quantitative approximation for our data, suggesting the existence of a common, simple generative mechanism for all considered sound sources.
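The rank-frequency analysis can be sketched as follows: vector-quantize frame-level features into code-words, then fit the slope of the rank-frequency curve in log-log space. The Gaussian placeholder frames and the 200-word codebook are assumptions, so the fitted exponent is not expected to reproduce the near-one value reported for real audio.

```python
# Hedged sketch: build short-time "code-words" by vector-quantizing frame-level
# features, then fit the slope of the rank-frequency curve in log-log space.
# The Gaussian placeholder frames and 200-word codebook are assumptions, so the
# fitted exponent will not reproduce the near-one value reported for real audio.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
frames = rng.normal(size=(5000, 13))            # placeholder MFCC-like frames
codewords = KMeans(n_clusters=200, n_init=10, random_state=1).fit_predict(frames)

counts = np.sort(np.bincount(codewords))[::-1]  # rank-frequency distribution
counts = counts[counts > 0]
ranks = np.arange(1, len(counts) + 1)

# Zipf's law predicts frequency proportional to rank**(-alpha); estimate alpha.
slope, _ = np.polyfit(np.log(ranks), np.log(counts), 1)
print(f"Estimated Zipf exponent: {-slope:.2f}")
```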
Affiliation(s)
- Martín Haro: Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain
18. Auditory perceptual decision-making based on semantic categorization of environmental sounds. Neuroimage 2012; 60:1704-15. PMID: 22330317. DOI: 10.1016/j.neuroimage.2012.01.131.
Abstract
Discriminating complex sounds relies on multiple stages of differential brain activity. The specific roles of these stages and their links to perception were the focus of the present study. We presented 250-ms sounds of living and man-made objects while recording 160-channel electroencephalography (EEG). Subjects categorized each sound as that of a living, man-made, or unknown item. We tested whether and when the brain discriminates between sound categories even when this discrimination does not transpire behaviorally. We applied a single-trial classifier that identified the voltage topographies and latencies at which brain responses are most discriminative. For sounds that the subjects could not categorize, we could successfully decode the semantic category based on differences in voltage topographies during the 116-174 ms post-stimulus period. Sounds that were correctly categorized as that of a living or man-made item by the same subjects exhibited two periods of differences in voltage topographies at the single-trial level: differential activity before the sound ended (starting at 112 ms) and in a separate period at ~270 ms post-stimulus onset. Because each of these periods could be used to reliably decode semantic categories, we interpreted the first as being related to an implicit tuning for sound representations and the second as being linked to perceptual decision-making processes. Collectively, our results show that the brain discriminates environmental sounds during early stages and independently of behavioral proficiency, and that explicit sound categorization requires a subsequent processing stage.
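In the spirit of the single-trial decoding analysis, the hedged sketch below trains a plain logistic-regression classifier on simulated per-trial voltage topographies from an early time window; the trial data, decoder choice, and cross-validation scheme are assumptions, not the authors' classifier.

```python
# Hedged sketch: single-trial decoding of sound category (living vs. man-made)
# from per-trial EEG voltage topographies in an early post-stimulus window.
# Simulated trials and a plain logistic-regression decoder stand in for the
# authors' actual single-trial classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
n_trials, n_channels = 200, 160
labels = rng.integers(0, 2, n_trials)                   # 0 = living, 1 = man-made
# Mean topography per trial in an early window (simulated, with a small class effect).
topographies = rng.normal(size=(n_trials, n_channels)) + 0.3 * labels[:, None]

decoder = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(decoder, topographies, labels, cv=5)
print(f"Mean cross-validated decoding accuracy: {scores.mean():.2f}")
```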