1
|
Kachlicka M, Patel AD, Liu F, Tierney A. Weighting of cues to categorization of song versus speech in tone-language and non-tone-language speakers. Cognition 2024; 246:105757. [PMID: 38442588 DOI: 10.1016/j.cognition.2024.105757] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/21/2023] [Revised: 02/09/2024] [Accepted: 02/20/2024] [Indexed: 03/07/2024]
Abstract
One of the most important auditory categorization tasks a listener faces is determining a sound's domain, a process which is a prerequisite for successful within-domain categorization tasks such as recognizing different speech sounds or musical tones. Speech and song are universal in human cultures: how do listeners categorize a sequence of words as belonging to one or the other of these domains? There is growing interest in the acoustic cues that distinguish speech and song, but it remains unclear whether there are cross-cultural differences in the evidence upon which listeners rely when making this fundamental perceptual categorization. Here we use the speech-to-song illusion, in which some spoken phrases perceptually transform into song when repeated, to investigate cues to this domain-level categorization in native speakers of tone languages (Mandarin and Cantonese speakers residing in the United Kingdom and China) and in native speakers of a non-tone language (English). We find that native tone-language and non-tone-language listeners largely agree on which spoken phrases sound like song after repetition, and we also find that the strength of this transformation is not significantly different across language backgrounds or countries of residence. Furthermore, we find a striking similarity in the cues upon which listeners rely when perceiving word sequences as singing versus speech, including small pitch intervals, flat within-syllable pitch contours, and steady beats. These findings support the view that there are certain widespread cross-cultural similarities in the mechanisms by which listeners judge if a word sequence is spoken or sung.
Collapse
Affiliation(s)
- Magdalena Kachlicka
- Department of Psychological Sciences, Birkbeck, University of London, Malet Street, London, United Kingdom
| | - Aniruddh D Patel
- Department of Psychology, Tufts University, 419 Boston Ave, Medford, USA; Program in Brain, Mind, and Consciousness, Canadian Institute for Advanced Research, 661 University Avenue, Toronto, Canada
| | - Fang Liu
- School of Psychology and Clinical Language Sciences, University of Reading, Whiteknights, Reading, United Kingdom
| | - Adam Tierney
- Department of Psychological Sciences, Birkbeck, University of London, Malet Street, London, United Kingdom.
| |
Collapse
|
2
|
Saitis C, Siedenburg K. Brightness perception for musical instrument sounds: Relation to timbre dissimilarity and source-cause categories. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2020; 148:2256. [PMID: 33138535 DOI: 10.1121/10.0002275] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/21/2020] [Accepted: 09/30/2020] [Indexed: 06/11/2023]
Abstract
Timbre dissimilarity of orchestral sounds is well-known to be multidimensional, with attack time and spectral centroid representing its two most robust acoustical correlates. The centroid dimension is traditionally considered as reflecting timbral brightness. However, the question of whether multiple continuous acoustical and/or categorical cues influence brightness perception has not been addressed comprehensively. A triangulation approach was used to examine the dimensionality of timbral brightness, its robustness across different psychoacoustical contexts, and relation to perception of the sounds' source-cause. Listeners compared 14 acoustic instrument sounds in three distinct tasks that collected general dissimilarity, brightness dissimilarity, and direct multi-stimulus brightness ratings. Results confirmed that brightness is a robust unitary auditory dimension, with direct ratings recovering the centroid dimension of general dissimilarity. When a two-dimensional space of brightness dissimilarity was considered, its second dimension correlated with the attack-time dimension of general dissimilarity, which was interpreted as reflecting a potential infiltration of the latter into brightness dissimilarity. Dissimilarity data were further modeled using partial least-squares regression with audio descriptors as predictors. Adding predictors derived from instrument family and the type of resonator and excitation did not improve the model fit, indicating that brightness perception is underpinned primarily by acoustical rather than source-cause cues.
Collapse
Affiliation(s)
- Charalampos Saitis
- Audio Communication Group, TU Berlin, Einsteinufer 17c, D-10587 Berlin, Germany
| | - Kai Siedenburg
- Department of Medical Physics and Acoustics and Cluster of Excellence Hearing4all, Carl von Ossietzky Universität Oldenburg, Oldenburg 26129, Germany
| |
Collapse
|
3
|
Moskowitz HS, Lee WW, Sussman ES. Response Advantage for the Identification of Speech Sounds. Front Psychol 2020; 11:1155. [PMID: 32655436 PMCID: PMC7325938 DOI: 10.3389/fpsyg.2020.01155] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/06/2020] [Accepted: 05/05/2020] [Indexed: 11/13/2022] Open
Abstract
The ability to distinguish among different types of sounds in the environment and to identify sound sources is a fundamental skill of the auditory system. This study tested responses to sounds by stimulus category (speech, music, and environmental) in adults with normal hearing to determine under what task conditions there was a processing advantage for speech. We hypothesized that speech sounds would be processed faster and more accurately than non-speech sounds under specific listening conditions and different behavioral goals. Thus, we used three different task conditions allowing us to compare detection and identification of sound categories in an auditory oddball paradigm and in a repetition-switch category paradigm. We found that response time and accuracy were modulated by the specific task demands. The sound category itself had no effect on sound detection outcomes but had a pronounced effect on sound identification. Faster and more accurate responses to speech were found only when identifying sounds. We demonstrate a speech processing "advantage" when identifying the sound category among non-categorical sounds and when detecting and identifying among categorical sounds. Thus, overall, our results are consistent with a theory of speech processing that relies on specialized systems distinct from music and other environmental sounds.
Collapse
Affiliation(s)
- Howard S Moskowitz
- Department of Otorhinolaryngology-Head and Neck Surgery, Albert Einstein College of Medicine, New York, NY, United States
| | - Wei Wei Lee
- Department of Neuroscience, Albert Einstein College of Medicine, New York, NY, United States
| | - Elyse S Sussman
- Department of Otorhinolaryngology-Head and Neck Surgery, Albert Einstein College of Medicine, New York, NY, United States.,Department of Neuroscience, Albert Einstein College of Medicine, New York, NY, United States
| |
Collapse
|
4
|
Siedenburg K, Schädler MR, Hülsmeier D. Modeling the onset advantage in musical instrument recognition. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2019; 146:EL523. [PMID: 31893751 DOI: 10.1121/1.5141369] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/16/2019] [Accepted: 11/27/2019] [Indexed: 06/10/2023]
Abstract
Sound onsets provide particularly valuable cues for musical instrument identification by human listeners. It has remained unclear whether this onset advantage is due to enhanced perceptual encoding or the richness of acoustical information during onsets. Here this issue was approached by modeling a recent study on instrument identification from tone excerpts [Siedenburg. (2019). J. Acoust. Soc. Am. 145(2), 1078-1087]. A simple Hidden Markov Model classifier with separable Gabor filterbank features simulated human performance and replicated the onset advantage observed previously for human listeners. These results provide evidence that the onset advantage may be driven by the distinct acoustic qualities of onsets.
Collapse
Affiliation(s)
- Kai Siedenburg
- Department of Medical Physics and Acoustics and Cluster of Excellence Hearing4all, Carl von Ossietzky University of Oldenburg, Oldenburg, , ,
| | - Marc René Schädler
- Department of Medical Physics and Acoustics and Cluster of Excellence Hearing4all, Carl von Ossietzky University of Oldenburg, Oldenburg, , ,
| | - David Hülsmeier
- Department of Medical Physics and Acoustics and Cluster of Excellence Hearing4all, Carl von Ossietzky University of Oldenburg, Oldenburg, , ,
| |
Collapse
|
5
|
Ogg M, Carlson TA, Slevc LR. The Rapid Emergence of Auditory Object Representations in Cortex Reflect Central Acoustic Attributes. J Cogn Neurosci 2019; 32:111-123. [PMID: 31560265 DOI: 10.1162/jocn_a_01472] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]
Abstract
Human listeners are bombarded by acoustic information that the brain rapidly organizes into coherent percepts of objects and events in the environment, which aids speech and music perception. The efficiency of auditory object recognition belies the critical constraint that acoustic stimuli necessarily require time to unfold. Using magnetoencephalography, we studied the time course of the neural processes that transform dynamic acoustic information into auditory object representations. Participants listened to a diverse set of 36 tokens comprising everyday sounds from a typical human environment. Multivariate pattern analysis was used to decode the sound tokens from the magnetoencephalographic recordings. We show that sound tokens can be decoded from brain activity beginning 90 msec after stimulus onset with peak decoding performance occurring at 155 msec poststimulus onset. Decoding performance was primarily driven by differences between category representations (e.g., environmental vs. instrument sounds), although within-category decoding was better than chance. Representational similarity analysis revealed that these emerging neural representations were related to harmonic and spectrotemporal differences among the stimuli, which correspond to canonical acoustic features processed by the auditory pathway. Our findings begin to link the processing of physical sound properties with the perception of auditory objects and events in cortex.
Collapse
|
6
|
Ogg M, Slevc LR. Acoustic Correlates of Auditory Object and Event Perception: Speakers, Musical Timbres, and Environmental Sounds. Front Psychol 2019; 10:1594. [PMID: 31379658 PMCID: PMC6650748 DOI: 10.3389/fpsyg.2019.01594] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/27/2019] [Accepted: 06/25/2019] [Indexed: 11/13/2022] Open
Abstract
Human listeners must identify and orient themselves to auditory objects and events in their environment. What acoustic features support a listener's ability to differentiate the great variety of natural sounds they might encounter? Studies of auditory object perception typically examine identification (and confusion) responses or dissimilarity ratings between pairs of objects and events. However, the majority of this prior work has been conducted within single categories of sound. This separation has precluded a broader understanding of the general acoustic attributes that govern auditory object and event perception within and across different behaviorally relevant sound classes. The present experiments take a broader approach by examining multiple categories of sound relative to one another. This approach bridges critical gaps in the literature and allows us to identify (and assess the relative importance of) features that are useful for distinguishing sounds within, between and across behaviorally relevant sound categories. To do this, we conducted behavioral sound identification (Experiment 1) and dissimilarity rating (Experiment 2) studies using a broad set of stimuli that leveraged the acoustic variability within and between different sound categories via a diverse set of 36 sound tokens (12 utterances from different speakers, 12 instrument timbres, and 12 everyday objects from a typical human environment). Multidimensional scaling solutions as well as analyses of item-pair-level responses as a function of different acoustic qualities were used to understand what acoustic features informed participants' responses. In addition to the spectral and temporal envelope qualities noted in previous work, listeners' dissimilarity ratings were associated with spectrotemporal variability and aperiodicity. Subsets of these features (along with fundamental frequency variability) were also useful for making specific within or between sound category judgments. Dissimilarity ratings largely paralleled sound identification performance, however the results of these tasks did not completely mirror one another. In addition, musical training was related to improved sound identification performance.
Collapse
Affiliation(s)
- Mattson Ogg
- Neuroscience and Cognitive Science Program, University of Maryland, College Park, College Park, MD, United States
- Department of Psychology, University of Maryland, College Park, College Park, MD, United States
| | - L. Robert Slevc
- Neuroscience and Cognitive Science Program, University of Maryland, College Park, College Park, MD, United States
- Department of Psychology, University of Maryland, College Park, College Park, MD, United States
| |
Collapse
|
7
|
Ogg M, Moraczewski D, Kuchinsky SE, Slevc LR. Separable neural representations of sound sources: Speaker identity and musical timbre. Neuroimage 2019; 191:116-126. [PMID: 30731247 DOI: 10.1016/j.neuroimage.2019.01.075] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/18/2018] [Revised: 12/14/2018] [Accepted: 01/30/2019] [Indexed: 11/28/2022] Open
Abstract
Human listeners can quickly and easily recognize different sound sources (objects and events) in their environment. Understanding how this impressive ability is accomplished can improve signal processing and machine intelligence applications along with assistive listening technologies. However, it is not clear how the brain represents the many sounds that humans can recognize (such as speech and music) at the level of individual sources, categories and acoustic features. To examine the cortical organization of these representations, we used patterns of fMRI responses to decode 1) four individual speakers and instruments from one another (separately, within each category), 2) the superordinate category labels associated with each stimulus (speech or instrument), and 3) a set of simple synthesized sounds that could be differentiated entirely on their acoustic features. Data were collected using an interleaved silent steady state sequence to increase the temporal signal-to-noise ratio, and mitigate issues with auditory stimulus presentation in fMRI. Largely separable clusters of voxels in the temporal lobes supported the decoding of individual speakers and instruments from other stimuli in the same category. Decoding the superordinate category of each sound was more accurate and involved a larger portion of the temporal lobes. However, these clusters all overlapped with areas that could decode simple, acoustically separable stimuli. Thus, individual sound sources from different sound categories are represented in separate regions of the temporal lobes that are situated within regions implicated in more general acoustic processes. These results bridge an important gap in our understanding of cortical representations of sounds and their acoustics.
Collapse
Affiliation(s)
- Mattson Ogg
- Program in Neuroscience and Cognitive Science, University of Maryland, College Park, MD, 20742, USA; Department of Psychology, University of Maryland, College Park, MD, 20742, USA.
| | - Dustin Moraczewski
- Program in Neuroscience and Cognitive Science, University of Maryland, College Park, MD, 20742, USA; Department of Psychology, University of Maryland, College Park, MD, 20742, USA
| | - Stefanie E Kuchinsky
- Program in Neuroscience and Cognitive Science, University of Maryland, College Park, MD, 20742, USA; Center for Advanced Study of Language, University of Maryland, College Park, MD, 20742, USA; Maryland Neuroimaging Center, University of Maryland, College Park, MD, 20742, USA
| | - L Robert Slevc
- Program in Neuroscience and Cognitive Science, University of Maryland, College Park, MD, 20742, USA; Department of Psychology, University of Maryland, College Park, MD, 20742, USA
| |
Collapse
|
8
|
Siedenburg K. Specifying the perceptual relevance of onset transients for musical instrument identification. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2019; 145:1078. [PMID: 30823780 DOI: 10.1121/1.5091778] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/12/2018] [Accepted: 02/04/2019] [Indexed: 06/09/2023]
Abstract
Sound onsets are commonly considered to play a privileged role in the identification of musical instruments, but the underlying acoustic features remain unclear. By using sounds resynthesized with and without rapidly varying transients (not to be confused with the onset as a whole), this study set out to specify precisely the role of transients and quasi-stationary components in the perception of musical instrument sounds. In experiment 1, listeners were trained to identify ten instruments from 250 ms sounds. In a subsequent test phase, listeners identified instruments from 64 ms segments of sounds presented with or without transient components, either taken from the onset, or from the middle portion of the sounds. The omission of transient components at the onset impaired overall identification accuracy only by 6%, even though experiment 2 suggested that their omission was discriminable. Shifting the position of the gate from the onset to the middle portion of the tone impaired overall identification accuracy by 25%. Taken together, these findings confirm the prominent status of onsets in musical instrument identification, but suggest that rapidly varying transients are less indicative of instrument identity compared to the relatively slow buildup of sinusoidal components during onsets.
Collapse
Affiliation(s)
- Kai Siedenburg
- Department of Medical Physics and Acoustics, Carl von Ossietzky University of Oldenburg, Oldenburg, Germany
| |
Collapse
|