1
Zhuang Y, Yang S. Exploring quantitative indices to characterize piano timbre with precision validated using measurement system analysis. Front Psychol 2024; 15:1363329. PMID: 38933586; PMCID: PMC11204798; DOI: 10.3389/fpsyg.2024.1363329.
Abstract
Aim: Timbre in piano performance plays a critical role in enhancing musical expression. However, timbre control in current piano performance education relies mostly on descriptive characterization, which involves large variations of interpretation. The current study aimed to mitigate these limitations by identifying quantitative indices with adequate precision to characterize piano timbre.
Methods: A total of 24 sounds of G6 were recorded from 3 grand pianos, by 2 performers, and with 4 repetitions. The sounds were processed and analyzed with audio software to obtain the frequencies and volumes of the harmonic series in the spectrum curves. Ten quantitative timbre indices were calculated. Precision validation with statistical gage R&R analysis was conducted to assess the repeatability (between repetitions) and reproducibility (between performers) of the indices. The resultant percentage study variation (%SV) of an index had to be ≤10% to be considered acceptable for characterizing piano timbre with sufficient precision.
Results: Out of the 10 indices, 4 had acceptable precision in characterizing piano timbre with %SV ≤10%: the square sum of relative volume (4.40%), the frequency-weighted arithmetic mean of relative volume (4.29%), the sum of relative volume (3.11%), and the frequency-weighted sum of relative volume (2.09%). The novel indices identified in the current research provide valuable tools to advance the measurement and communication of timbre and to improve music performance education.
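For readers who want to experiment with indices of this kind, the sketch below shows plausible implementations of the four precise indices named in the abstract, computed from the frequency and level of each partial of a single recorded note. The exact definitions (relative volume referenced to the fundamental, harmonic number as the frequency weight) are assumptions, since the abstract does not give the formulas; the gage R&R validation step is not shown.

```python
# Minimal sketch (not the authors' code) of plausible definitions for the four
# timbre indices named in the abstract, computed from a harmonic analysis.
# `freqs_hz` and `volumes_db` are assumed inputs: frequency and level of each
# partial read off the spectrum of a single recorded note.
import numpy as np

def timbre_indices(freqs_hz, volumes_db):
    freqs = np.asarray(freqs_hz, dtype=float)
    vols = np.asarray(volumes_db, dtype=float)
    rel_vol = vols - vols[0]        # level relative to the fundamental (assumption)
    rel_freq = freqs / freqs[0]     # harmonic number relative to the fundamental (assumption)
    return {
        "sum_rel_vol": rel_vol.sum(),
        "square_sum_rel_vol": np.sum(rel_vol ** 2),
        "freq_weighted_sum_rel_vol": np.sum(rel_freq * rel_vol),
        "freq_weighted_mean_rel_vol": np.sum(rel_freq * rel_vol) / rel_freq.sum(),
    }

# Example: fundamental of G6 (~1568 Hz) plus three higher partials.
print(timbre_indices([1568, 3136, 4704, 6272], [-6.0, -18.0, -30.0, -42.0]))
```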
Affiliation(s)
- Yuan Zhuang
- Department of Arts and Media, Tongji University, Shanghai, China
- Shuo Yang
- Department of Biomedical Engineering, Illinois Institute of Technology, Chicago, IL, United States
2
Thoret E, Andrillon T, Gauriau C, Léger D, Pressnitzer D. Sleep deprivation detected by voice analysis. PLoS Comput Biol 2024; 20:e1011849. PMID: 38315733; PMCID: PMC10890756; DOI: 10.1371/journal.pcbi.1011849.
Abstract
Sleep deprivation has an ever-increasing impact on individuals and societies. Yet, to date, there is no quick and objective test for sleep deprivation. Here, we used automated acoustic analyses of the voice to detect sleep deprivation. Building on current machine-learning approaches, we focused on interpretability by introducing two novel ideas: the use of a fully generic auditory representation as input feature space, combined with an interpretation technique based on reverse correlation. The auditory representation consisted of a spectro-temporal modulation analysis derived from neurophysiology. The interpretation method aimed to reveal the regions of the auditory representation that supported the classifiers' decisions. Results showed that generic auditory features could be used to detect sleep deprivation successfully, with an accuracy comparable to state-of-the-art speech features. Furthermore, the interpretation revealed two distinct effects of sleep deprivation on the voice: changes in slow temporal modulations related to prosody and changes in spectral features related to voice quality. Importantly, the relative balance of the two effects varied widely across individuals, even though the amount of sleep deprivation was controlled, thus confirming the need to characterize sleep deprivation at the individual level. Moreover, while the prosody factor correlated with subjective sleepiness reports, the voice quality factor did not, consistent with the presence of both explicit and implicit consequences of sleep deprivation. Overall, the findings show that individual effects of sleep deprivation may be observed in vocal biomarkers. Future investigations correlating such markers with objective physiological measures of sleep deprivation could enable "sleep stethoscopes" for the cost-effective diagnosis of the individual effects of sleep deprivation.
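As background for the method, the sketch below illustrates a generic spectro-temporal modulation representation of the kind used here as the classifier's input feature space: a log-magnitude spectrogram is Fourier-analyzed a second time along its frequency and time axes. It is an illustration under assumed parameters, not the authors' pipeline, and the reverse-correlation interpretation step is not shown.

```python
# Generic spectro-temporal modulation representation: the log spectrogram is
# transformed with a 2-D FFT, giving energy as a function of temporal modulation
# rate (Hz) and spectral modulation (cycles per Hz on this linear frequency axis).
import numpy as np
from scipy.signal import spectrogram

def modulation_spectrum(signal, fs, nperseg=512, noverlap=384):
    f, t, sxx = spectrogram(signal, fs=fs, nperseg=nperseg, noverlap=noverlap)
    log_spec = np.log(sxx + 1e-12)                         # compress dynamic range
    mod = np.abs(np.fft.fftshift(np.fft.fft2(log_spec)))   # 2-D modulation spectrum
    spec_mod_axis = np.fft.fftshift(np.fft.fftfreq(len(f), d=f[1] - f[0]))
    temp_mod_axis = np.fft.fftshift(np.fft.fftfreq(len(t), d=t[1] - t[0]))
    return spec_mod_axis, temp_mod_axis, mod

# Toy usage with a synthetic, slowly amplitude-modulated tone.
fs = 16000
n = np.arange(fs)
x = np.sin(2 * np.pi * 120 * n / fs) * (1 + 0.5 * np.sin(2 * np.pi * 4 * n / fs))
print(modulation_spectrum(x, fs)[2].shape)
```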
Affiliation(s)
- Etienne Thoret
- Laboratoire des systèmes perceptifs, Département d’études cognitives, École normale supérieure, PSL University, CNRS, Paris, France
- Aix-Marseille University, CNRS, Institut de Neurosciences de la Timone (INT) UMR7289, Perception Representation Image Sound Music (PRISM) UMR7061, Laboratoire d’Informatique et Systèmes (LIS) UMR7020, Marseille, France
- Institute of Language Communication and the Brain, Aix-Marseille University, Marseille, France
- Thomas Andrillon
- Sorbonne Université, Institut du Cerveau - Paris Brain Institute - ICM, Mov’it team, Inserm, CNRS, Paris, France
- Université Paris Cité, VIFASOM, ERC 7330, Vigilance Fatigue Sommeil et santé publique, Paris, France
- APHP, Hôtel-Dieu, Centre du Sommeil et de la Vigilance, Paris, France
- Caroline Gauriau
- Université Paris Cité, VIFASOM, ERC 7330, Vigilance Fatigue Sommeil et santé publique, Paris, France
- APHP, Hôtel-Dieu, Centre du Sommeil et de la Vigilance, Paris, France
- Damien Léger
- Université Paris Cité, VIFASOM, ERC 7330, Vigilance Fatigue Sommeil et santé publique, Paris, France
- APHP, Hôtel-Dieu, Centre du Sommeil et de la Vigilance, Paris, France
- Daniel Pressnitzer
- Laboratoire des systèmes perceptifs, Département d’études cognitives, École normale supérieure, PSL University, CNRS, Paris, France
3
McAdams S, Thoret E, Wang G, Montrey M. Timbral cues for learning to generalize musical instrument identity across pitch register. J Acoust Soc Am 2023; 153:797. PMID: 36859162; DOI: 10.1121/10.0017100.
Abstract
Timbre provides an important cue to identify musical instruments. Many timbral attributes covary with other parameters like pitch. This study explores listeners' ability to construct categories of instrumental sound sources from sounds that vary in pitch. Nonmusicians identified 11 instruments from the woodwind, brass, percussion, and plucked and bowed string families. In experiment 1, they were trained to identify instruments playing a pitch of C4, and in experiments 2 and 3, they were trained with a five-tone sequence (F#3-F#4), exposing them to the way timbre varies with pitch. Participants were required to reach a threshold of 75% correct identification in training. In the testing phase, successful listeners heard single tones (experiments 1 and 2) or three-tone sequences (A3-D#4) (experiment 3) across each instrument's full pitch range to test their ability to generalize identification from the learned sound(s). Identification generalization over pitch varies a great deal across instruments. No significant differences were found between single-pitch and multi-pitch training or testing conditions. Identification rates can be predicted moderately well by spectrograms or modulation spectra. These results suggest that listeners use the most relevant acoustical invariance to identify musical instrument sounds, while also drawing on previous experience with the tested instruments.
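The abstract's note that identification rates can be predicted from spectrograms suggests a simple similarity-based account. The sketch below is a minimal, hypothetical version of that idea: predicted generalization to a test tone is scored by the cosine similarity between its log-spectrogram and that of the training tone. Parameters and the similarity measure are assumptions, not the published analysis.

```python
# Hypothetical similarity-based predictor of identification generalization.
import numpy as np
from scipy.signal import spectrogram

def log_spec(x, fs):
    _, _, sxx = spectrogram(x, fs=fs, nperseg=1024, noverlap=512)
    return np.log(sxx + 1e-12)

def spectrogram_similarity(train_tone, test_tone, fs):
    a, b = log_spec(train_tone, fs), log_spec(test_tone, fs)
    n = min(a.shape[1], b.shape[1])           # align on the shorter tone's frames
    a, b = a[:, :n].ravel(), b[:, :n].ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy example: a test tone one octave above a two-partial training tone.
fs = 44100
t = np.arange(int(0.5 * fs)) / fs
train = np.sin(2 * np.pi * 261.6 * t) + 0.5 * np.sin(2 * np.pi * 523.2 * t)
test = np.sin(2 * np.pi * 523.2 * t) + 0.5 * np.sin(2 * np.pi * 1046.5 * t)
print(spectrogram_similarity(train, test, fs))
```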
Affiliation(s)
- Stephen McAdams
- Schulich School of Music, McGill University, Montreal, Québec H3A 1E3, Canada
- Etienne Thoret
- Aix-Marseille University, Centre National de la Recherche Scientifique, Perception Representations Image Sound Music Laboratory, Unité Mixte de Recherche 7061, Laboratoire d'Informatique et Systèmes, Unité Mixte de Recherche 7020, 13009 Marseille, France
- Grace Wang
- Cognitive Science Program, McGill University, Montreal, Québec H3A 3R1, Canada
- Marcel Montrey
- Department of Psychology, McGill University, Montreal, Québec H3A 1G1, Canada
4
Learning metrics on spectrotemporal modulations reveals the perception of musical instrument timbre. Nat Hum Behav 2020; 5:369-377. PMID: 33257878; DOI: 10.1038/s41562-020-00987-5.
Abstract
Humans excel at using sounds to make judgements about their immediate environment. In particular, timbre is an auditory attribute that conveys crucial information about the identity of a sound source, especially for music. While timbre has been primarily considered to occupy a multidimensional space, unravelling the acoustic correlates of timbre remains a challenge. Here we re-analyse 17 datasets from published studies between 1977 and 2016 and observe that original results are only partially replicable. We use a data-driven computational account to reveal the acoustic correlates of timbre. Human dissimilarity ratings are simulated with metrics learned on acoustic spectrotemporal modulation models inspired by cortical processing. We observe that timbre has both generic and experiment-specific acoustic correlates. These findings provide a broad overview of former studies on musical timbre and identify its relevant acoustic substrates according to biologically inspired models.
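A minimal sketch of the metric-learning idea described above, under the assumption that human dissimilarities are approximated by a weighted Euclidean distance over spectrotemporal-modulation features. Because squared dissimilarity is then linear in the weights, non-negative least squares can stand in for the paper's optimization; the data below are synthetic.

```python
# Learn non-negative per-dimension weights so that a weighted Euclidean distance
# between stimulus representations approximates (simulated) dissimilarity ratings.
import numpy as np
from itertools import combinations
from scipy.optimize import nnls

rng = np.random.default_rng(0)
n_sounds, n_features = 12, 20
X = rng.normal(size=(n_sounds, n_features))      # stand-in modulation features per sound
true_w = rng.uniform(0, 1, n_features)           # hidden "perceptual" weights
pairs = list(combinations(range(n_sounds), 2))

# Simulated human dissimilarities generated from the hidden weights (toy data).
d_human = np.array([np.sqrt(np.sum(true_w * (X[i] - X[j]) ** 2)) for i, j in pairs])

# Fit: squared dissimilarity is linear in the weights, so NNLS recovers them.
A = np.array([(X[i] - X[j]) ** 2 for i, j in pairs])
w_hat, _ = nnls(A, d_human ** 2)
d_model = np.sqrt(A @ w_hat)
print("correlation with ratings:", np.corrcoef(d_model, d_human)[0, 1])
```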
5
Saitis C, Siedenburg K. Brightness perception for musical instrument sounds: Relation to timbre dissimilarity and source-cause categories. J Acoust Soc Am 2020; 148:2256. PMID: 33138535; DOI: 10.1121/10.0002275.
Abstract
Timbre dissimilarity of orchestral sounds is well known to be multidimensional, with attack time and spectral centroid representing its two most robust acoustical correlates. The centroid dimension is traditionally considered to reflect timbral brightness. However, the question of whether multiple continuous acoustical and/or categorical cues influence brightness perception has not been addressed comprehensively. A triangulation approach was used to examine the dimensionality of timbral brightness, its robustness across different psychoacoustical contexts, and its relation to the perception of the sounds' source-cause. Listeners compared 14 acoustic instrument sounds in three distinct tasks that collected general dissimilarity, brightness dissimilarity, and direct multi-stimulus brightness ratings. Results confirmed that brightness is a robust unitary auditory dimension, with direct ratings recovering the centroid dimension of general dissimilarity. When a two-dimensional space of brightness dissimilarity was considered, its second dimension correlated with the attack-time dimension of general dissimilarity, which was interpreted as a potential infiltration of the latter into brightness dissimilarity. Dissimilarity data were further modeled using partial least-squares regression with audio descriptors as predictors. Adding predictors derived from instrument family and the type of resonator and excitation did not improve the model fit, indicating that brightness perception is underpinned primarily by acoustical rather than source-cause cues.
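Since the spectral centroid is the descriptor the study singles out as the main correlate of brightness, a minimal computation is sketched below: the amplitude-weighted mean frequency of the magnitude spectrum. This is a generic definition, not the authors' exact descriptor set or their partial least-squares model.

```python
# Spectral centroid: amplitude-weighted mean frequency of the magnitude spectrum.
import numpy as np

def spectral_centroid(x, fs):
    mag = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return float(np.sum(freqs * mag) / np.sum(mag))

# A tone with more high-harmonic energy has a higher centroid (sounds "brighter").
fs = 44100
t = np.arange(fs) / fs
dull = np.sin(2 * np.pi * 440 * t)
bright = dull + 0.8 * np.sin(2 * np.pi * 880 * t) + 0.6 * np.sin(2 * np.pi * 1760 * t)
print(spectral_centroid(dull, fs), spectral_centroid(bright, fs))
```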
Affiliation(s)
- Charalampos Saitis
- Audio Communication Group, TU Berlin, Einsteinufer 17c, D-10587 Berlin, Germany
- Kai Siedenburg
- Department of Medical Physics and Acoustics and Cluster of Excellence Hearing4all, Carl von Ossietzky Universität Oldenburg, Oldenburg 26129, Germany
6
Ogg M, Carlson TA, Slevc LR. The Rapid Emergence of Auditory Object Representations in Cortex Reflect Central Acoustic Attributes. J Cogn Neurosci 2019; 32:111-123. PMID: 31560265; DOI: 10.1162/jocn_a_01472.
Abstract
Human listeners are bombarded by acoustic information that the brain rapidly organizes into coherent percepts of objects and events in the environment, which aids speech and music perception. The efficiency of auditory object recognition belies the critical constraint that acoustic stimuli necessarily require time to unfold. Using magnetoencephalography, we studied the time course of the neural processes that transform dynamic acoustic information into auditory object representations. Participants listened to a diverse set of 36 tokens comprising everyday sounds from a typical human environment. Multivariate pattern analysis was used to decode the sound tokens from the magnetoencephalographic recordings. We show that sound tokens can be decoded from brain activity beginning 90 msec after stimulus onset with peak decoding performance occurring at 155 msec poststimulus onset. Decoding performance was primarily driven by differences between category representations (e.g., environmental vs. instrument sounds), although within-category decoding was better than chance. Representational similarity analysis revealed that these emerging neural representations were related to harmonic and spectrotemporal differences among the stimuli, which correspond to canonical acoustic features processed by the auditory pathway. Our findings begin to link the processing of physical sound properties with the perception of auditory objects and events in cortex.
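A schematic of the time-resolved decoding approach described above: a classifier is cross-validated separately at each time point of the epoch, tracing when stimulus information becomes decodable. The data shapes, classifier, and cross-validation scheme are illustrative assumptions, not those of the study.

```python
# Time-resolved multivariate decoding on toy "MEG" data: one cross-validated
# classifier per time bin, producing an accuracy time course.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n_trials, n_sensors, n_times = 200, 64, 120   # toy dimensions
X = rng.normal(size=(n_trials, n_sensors, n_times))
y = rng.integers(0, 2, n_trials)              # e.g., environmental vs. instrument sounds

accuracy = np.zeros(n_times)
for t in range(n_times):
    clf = LogisticRegression(max_iter=1000)
    accuracy[t] = cross_val_score(clf, X[:, :, t], y, cv=5).mean()

print("peak decoding accuracy:", accuracy.max(), "at time bin", accuracy.argmax())
```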
7
Ogg M, Slevc LR. Acoustic Correlates of Auditory Object and Event Perception: Speakers, Musical Timbres, and Environmental Sounds. Front Psychol 2019; 10:1594. PMID: 31379658; PMCID: PMC6650748; DOI: 10.3389/fpsyg.2019.01594.
Abstract
Human listeners must identify and orient themselves to auditory objects and events in their environment. What acoustic features support a listener's ability to differentiate the great variety of natural sounds they might encounter? Studies of auditory object perception typically examine identification (and confusion) responses or dissimilarity ratings between pairs of objects and events. However, the majority of this prior work has been conducted within single categories of sound. This separation has precluded a broader understanding of the general acoustic attributes that govern auditory object and event perception within and across different behaviorally relevant sound classes. The present experiments take a broader approach by examining multiple categories of sound relative to one another. This approach bridges critical gaps in the literature and allows us to identify (and assess the relative importance of) features that are useful for distinguishing sounds within, between, and across behaviorally relevant sound categories. To do this, we conducted behavioral sound identification (Experiment 1) and dissimilarity rating (Experiment 2) studies using a broad set of stimuli that leveraged the acoustic variability within and between different sound categories via a diverse set of 36 sound tokens (12 utterances from different speakers, 12 instrument timbres, and 12 everyday objects from a typical human environment). Multidimensional scaling solutions, as well as analyses of item-pair-level responses as a function of different acoustic qualities, were used to understand what acoustic features informed participants' responses. In addition to the spectral and temporal envelope qualities noted in previous work, listeners' dissimilarity ratings were associated with spectrotemporal variability and aperiodicity. Subsets of these features (along with fundamental frequency variability) were also useful for making specific within- or between-category judgments. Dissimilarity ratings largely paralleled sound identification performance; however, the results of the two tasks did not completely mirror one another. In addition, musical training was related to improved sound identification performance.
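The sketch below shows the multidimensional scaling step that underlies analyses of this kind: a symmetric dissimilarity matrix over the 36 sound tokens is embedded in a low-dimensional space whose axes can then be related to acoustic features. The matrix here is random stand-in data, and the two-dimensional solution is an arbitrary choice.

```python
# Metric MDS on a precomputed (here, random) dissimilarity matrix over 36 tokens.
import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(2)
n_tokens = 36
d = rng.uniform(0.1, 1.0, size=(n_tokens, n_tokens))
d = (d + d.T) / 2.0                       # symmetrize, as when averaging rating orders
np.fill_diagonal(d, 0.0)

mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(d)             # one 2-D point per sound token
print(coords.shape)
```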
Affiliation(s)
- Mattson Ogg
- Neuroscience and Cognitive Science Program, University of Maryland, College Park, College Park, MD, United States
- Department of Psychology, University of Maryland, College Park, College Park, MD, United States
- L. Robert Slevc
- Neuroscience and Cognitive Science Program, University of Maryland, College Park, College Park, MD, United States
- Department of Psychology, University of Maryland, College Park, College Park, MD, United States
8
Burred JJ, Ponsot E, Goupil L, Liuni M, Aucouturier JJ. CLEESE: An open-source audio-transformation toolbox for data-driven experiments in speech and music cognition. PLoS One 2019; 14:e0205943. PMID: 30947281; PMCID: PMC6448843; DOI: 10.1371/journal.pone.0205943.
Abstract
Over the past few years, the field of visual social cognition and face processing has been dramatically impacted by a series of data-driven studies employing computer-graphics tools to synthesize arbitrary meaningful facial expressions. In the auditory modality, reverse correlation is traditionally used to characterize sensory processing at the level of spectral or spectro-temporal stimulus properties, but not higher-level cognitive processing of, e.g., words, sentences, or music, for lack of tools able to manipulate the stimulus dimensions that are relevant for these processes. Here, we present an open-source audio-transformation toolbox, called CLEESE, able to systematically randomize the prosody/melody of existing speech and music recordings. CLEESE works by cutting recordings into short successive time segments (e.g., every successive 100 milliseconds in a spoken utterance) and applying a random parametric transformation to each segment's pitch, duration, or amplitude, using a new Python-language implementation of the phase-vocoder digital audio technique. We present two applications of the tool to generate stimuli for studying intonation processing of interrogative vs. declarative speech, and rhythm processing of sung melodies.
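To make the segment-wise randomization concrete, the sketch below generates a random pitch breakpoint function of the kind described: one Gaussian-distributed pitch shift (in cents) per 100-ms window. It illustrates the stimulus-generation logic only; it is not the CLEESE API, and applying the breakpoints would require the toolbox's phase-vocoder back end.

```python
# Random prosodic contour as a breakpoint function: (time, pitch shift in cents)
# pairs, one per fixed-length window of the recording.
import numpy as np

def random_pitch_bpf(duration_s, window_s=0.1, sd_cents=200, seed=None):
    rng = np.random.default_rng(seed)
    times = np.arange(0.0, duration_s, window_s)
    shifts_cents = rng.normal(0.0, sd_cents, size=len(times))
    return np.column_stack([times, shifts_cents])

# One random contour for a 1.5-second utterance, one breakpoint per 100 ms.
print(random_pitch_bpf(1.5, seed=0))
```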
Affiliation(s)
- Emmanuel Ponsot
- Science and Technology of Music and Sound (UMR9912, IRCAM/CNRS/Sorbonne Université), Paris, France
- Laboratoire des Systèmes Perceptifs (CNRS UMR 8248) and Département d’études cognitives, École Normale Supérieure, PSL Research University, Paris, France
- Louise Goupil
- Science and Technology of Music and Sound (UMR9912, IRCAM/CNRS/Sorbonne Université), Paris, France
- Marco Liuni
- Science and Technology of Music and Sound (UMR9912, IRCAM/CNRS/Sorbonne Université), Paris, France
- Jean-Julien Aucouturier
- Science and Technology of Music and Sound (UMR9912, IRCAM/CNRS/Sorbonne Université), Paris, France
9
Siedenburg K. Specifying the perceptual relevance of onset transients for musical instrument identification. J Acoust Soc Am 2019; 145:1078. PMID: 30823780; DOI: 10.1121/1.5091778.
Abstract
Sound onsets are commonly considered to play a privileged role in the identification of musical instruments, but the underlying acoustic features remain unclear. By using sounds resynthesized with and without rapidly varying transients (not to be confused with the onset as a whole), this study set out to specify precisely the role of transients and quasi-stationary components in the perception of musical instrument sounds. In experiment 1, listeners were trained to identify ten instruments from 250 ms sounds. In a subsequent test phase, listeners identified instruments from 64 ms segments of sounds presented with or without transient components, taken either from the onset or from the middle portion of the sounds. The omission of transient components at the onset impaired overall identification accuracy by only 6%, even though experiment 2 suggested that their omission was discriminable. Shifting the position of the gate from the onset to the middle portion of the tone impaired overall identification accuracy by 25%. Taken together, these findings confirm the prominent status of onsets in musical instrument identification, but suggest that rapidly varying transients are less indicative of instrument identity than the relatively slow buildup of sinusoidal components during onsets.
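The gating manipulation can be pictured with the short sketch below, which excises a 64-ms segment either at the onset or from the middle of a tone and applies brief raised-cosine ramps to avoid clicks. This is a simplified stand-in: removing the rapidly varying transients themselves would additionally require sinusoidal resynthesis, which is not shown.

```python
# Excise a short gate from a tone at a chosen start time, with raised-cosine ramps.
import numpy as np

def gate(signal, fs, start_s, dur_s=0.064, ramp_s=0.005):
    start = int(round(start_s * fs))
    seg = signal[start:start + int(round(dur_s * fs))].astype(float).copy()
    n_ramp = int(round(ramp_s * fs))
    ramp = 0.5 * (1 - np.cos(np.linspace(0, np.pi, n_ramp)))   # raised-cosine ramp
    seg[:n_ramp] *= ramp
    seg[-n_ramp:] *= ramp[::-1]
    return seg

# Onset gate vs. a gate taken from the middle of a 250-ms tone (toy signal).
fs = 44100
tone = np.sin(2 * np.pi * 523.25 * np.arange(int(0.25 * fs)) / fs)
onset_gate = gate(tone, fs, start_s=0.0)
middle_gate = gate(tone, fs, start_s=0.1)
print(len(onset_gate), len(middle_gate))
```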
Affiliation(s)
- Kai Siedenburg
- Department of Medical Physics and Acoustics, Carl von Ossietzky University of Oldenburg, Oldenburg, Germany