1. Lembke SA. Distinguishing between straight and curved sounds: Auditory shape in pitch, loudness, and tempo gestures. Atten Percept Psychophys 2023; 85:2751-2773. [PMID: 37721687; PMCID: PMC10600048; DOI: 10.3758/s13414-023-02764-8]
Abstract
Sound-based trajectories or sound gestures draw links to spatiokinetic processes. For instance, a gliding, decreasing pitch conveys an analogous downward motion or fall. Whereas the gesture's pitch orientation and range convey its meaning and magnitude, respectively, the way in which pitch changes over time can be conceived of as gesture shape, which to date has rarely been studied in isolation. This article reports on an experiment that studied the perception of shape in uni-directional pitch, loudness, and tempo gestures, each assessed for four physical scalings. Gestures could increase or decrease over time and comprised different frequency and sound-level ranges, durations, and scaling contexts. Using a crossmodal-matching task, participants could reliably distinguish between pitch and loudness gestures and relate them to analogous visual line segments. Scalings based on equivalent rectangular bandwidth (ERB) rate for pitch and raw signal amplitude for loudness were matched closest to a straight line, whereas other scalings led to perceptions of exponential or logarithmic curvatures. The investigated tempo gestures, by contrast, did not yield reliable differences. The reliable, robust perception of gesture shape for pitch and loudness has implications for various sound-design applications, especially those that rely on crossmodal mappings, e.g., visual analysis or control interfaces such as audio waveforms or spectrograms. Given its perceptual relevance, auditory shape appears to be an integral part of sound gestures, while illustrating how crossmodal correspondences can underpin auditory perception.
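As an illustration of the scalings compared above, the following minimal sketch (not the study's stimulus code; frequency range, duration, and sample rate are assumed for the example) synthesizes two rising pitch gestures: one linear on the ERB-rate scale, which listeners matched closest to a straight line, and one linear in raw frequency, which tends to be heard as curved.

    import numpy as np

    def erb_rate(f_hz):
        # Glasberg & Moore (1990) ERB-rate scale
        return 21.4 * np.log10(1.0 + 0.00437 * f_hz)

    def erb_rate_to_hz(e):
        # Inverse of erb_rate
        return (10.0 ** (e / 21.4) - 1.0) / 0.00437

    sr, dur = 44100, 2.0                # assumed sample rate (Hz) and gesture duration (s)
    t = np.arange(int(sr * dur)) / sr
    f_lo, f_hi = 200.0, 1600.0          # assumed frequency range of the gesture (Hz)

    # Glide that is linear on the ERB-rate scale (matched closest to a straight line)
    f_erb = erb_rate_to_hz(np.linspace(erb_rate(f_lo), erb_rate(f_hi), t.size))
    # Glide that is linear in raw frequency (typically heard as curved)
    f_lin = np.linspace(f_lo, f_hi, t.size)

    # Phase is the running integral of instantaneous frequency
    glide_erb = np.sin(2 * np.pi * np.cumsum(f_erb) / sr)
    glide_lin = np.sin(2 * np.pi * np.cumsum(f_lin) / sr)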
Affiliation(s)
- Sven-Amin Lembke
- Cambridge School of Creative Industries, Anglia Ruskin University, Cambridge, UK.
2. Bordonné T, Kronland-Martinet R, Ystad S, Derrien O, Aramaki M. Exploring sound perception through vocal imitations. J Acoust Soc Am 2020; 147:3306. [PMID: 32486800; DOI: 10.1121/10.0001224]
Abstract
Understanding how sounds are perceived and interpreted is an important challenge for researchers dealing with auditory perception. The ecological approach to perception suggests that the salient perceptual information that enables a listener to recognize events through sounds is contained in specific structures called invariants. Identifying such invariants is of fundamental interest for better understanding auditory perception, and it is also useful for incorporating perceptual considerations into the modeling and control of sounds. Among the different approaches used to identify perceptually relevant sound structures, vocal imitations are believed to bring a fresh perspective to the field. The main goal of this paper is to better understand how invariants are transmitted through vocal imitations. A sound corpus containing different types of known invariants, obtained from an existing synthesizer, was established. Participants took part in a test in which they were asked to imitate the sounds of the corpus. A continuous and sparse model adapted to the specificities of the vocal imitations was then developed and used to analyze the imitations. Results show that participants were able to highlight salient elements of the sounds that partially correspond to the invariants used in the sound corpus. This study also confirms that vocal imitations reveal how these invariants are transmitted through perception, and it offers promising perspectives for auditory investigations.
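The authors' continuous sparse model is not reproduced here; as a hedged stand-in, the following minimal sketch (using librosa, with a placeholder file name) extracts the kind of time-varying description of a vocal imitation, pitch and energy trajectories, from which salient elements can be read off.

    import numpy as np
    import librosa

    # Load a recorded vocal imitation (placeholder file name)
    y, sr = librosa.load("imitation.wav", sr=None, mono=True)

    # Fundamental-frequency (pitch) trajectory via probabilistic YIN
    f0, voiced_flag, voiced_prob = librosa.pyin(y, fmin=80.0, fmax=800.0, sr=sr)

    # Frame-wise RMS energy as a crude loudness trajectory
    rms = librosa.feature.rms(y=y)[0]

    print(f"median f0: {np.nanmedian(f0):.1f} Hz, peak RMS: {rms.max():.3f}")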
Affiliation(s)
- Thomas Bordonné
- Aix Marseille Univ., CNRS, PRISM (Perception, Representations, Image, Sound, Music), 31 Chemin J. Aiguier, CS 70071, 13402 Marseille Cedex 20, France
- Richard Kronland-Martinet
- Aix Marseille Univ., CNRS, PRISM (Perception, Representations, Image, Sound, Music), 31 Chemin J. Aiguier, CS 70071, 13402 Marseille Cedex 20, France
- Sølvi Ystad
- Aix Marseille Univ., CNRS, PRISM (Perception, Representations, Image, Sound, Music), 31 Chemin J. Aiguier, CS 70071, 13402 Marseille Cedex 20, France
- Olivier Derrien
- Aix Marseille Univ., CNRS, PRISM (Perception, Representations, Image, Sound, Music), 31 Chemin J. Aiguier, CS 70071, 13402 Marseille Cedex 20, France
- Mitsuko Aramaki
- Aix Marseille Univ., CNRS, PRISM (Perception, Representations, Image, Sound, Music), 31 Chemin J. Aiguier, CS 70071, 13402 Marseille Cedex 20, France
3. Mehrabi A, Dixon S, Sandler M. Vocal imitation of percussion sounds: On the perceptual similarity between imitations and imitated sounds. PLoS One 2019; 14:e0219955. [PMID: 31344080; PMCID: PMC6657857; DOI: 10.1371/journal.pone.0219955]
Abstract
Recent studies have demonstrated the effectiveness of the voice for communicating sonic ideas, and the accuracy with which it can be used to imitate acoustic instruments, synthesised sounds, and environmental sounds. However, there has been little research on vocal imitation of percussion sounds, particularly concerning the perceptual similarity between imitations and the sounds being imitated. In the present study, we address this by investigating how accurately musicians can vocally imitate percussion sounds, in terms of whether listeners consider the imitations 'more similar' to the imitated sounds than to other same-category sounds. In a vocal production task, 14 musicians imitated 30 drum sounds from five categories (cymbals, hats, kicks, snares, toms). Listeners were then asked to rate the similarity between the imitations and same-category drum sounds via a web-based listening test. We found that imitated sounds received the highest similarity ratings for 16 of the 30 sounds. The similarity between a given drum sound and its imitation was generally rated higher than for imitations of another same-category sound; however, for some drum categories (snares and toms), certain sounds were consistently considered most similar to the imitations, irrespective of the sound being imitated. Finally, we apply an existing auditory-image-based measure of perceptual similarity between same-category drum sounds to model the similarity ratings using linear mixed-effects regression. The results indicate that this measure is a good predictor of perceptual similarity between imitations and imitated sounds, compared to measures based only on temporal or spectral features.
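As a hedged illustration of the modelling step described above, the sketch below fits a linear mixed-effects model of similarity ratings with a random intercept per listener, using statsmodels; the CSV file, column names, and predictors are assumptions, not the authors' data or exact model.

    import pandas as pd
    import statsmodels.formula.api as smf

    # Assumed long-format table: one row per (listener, imitation, drum sound) rating
    ratings = pd.read_csv("similarity_ratings.csv")
    # assumed columns: listener, drum_category, auditory_image_distance, rating

    model = smf.mixedlm(
        "rating ~ auditory_image_distance + C(drum_category)",
        data=ratings,
        groups=ratings["listener"],   # random intercept for each listener
    )
    print(model.fit().summary())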
Affiliation(s)
- Adib Mehrabi
- Department of Linguistics, Queen Mary University of London, London, England
- School of Electronic Engineering and Computer Science, Queen Mary University of London, London, England
- Simon Dixon
- School of Electronic Engineering and Computer Science, Queen Mary University of London, London, England
- Mark Sandler
- School of Electronic Engineering and Computer Science, Queen Mary University of London, London, England
4. Friberg A, Lindeberg T, Hellwagner M, Helgason P, Salomão GL, Elowsson A, Lemaitre G, Ternström S. Prediction of three articulatory categories in vocal sound imitations using models for auditory receptive fields. J Acoust Soc Am 2018; 144:1467. [PMID: 30424637; DOI: 10.1121/1.5052438]
Abstract
Vocal sound imitations provide a new challenge for understanding the coupling between articulatory mechanisms and the resulting audio. In this study, the classification of three articulatory categories, phonation, supraglottal myoelastic vibrations, and turbulence, has been modeled from audio recordings. Two data sets were assembled, consisting of different vocal imitations by four professional imitators and four non-professional speakers in two different experiments. The audio data were manually annotated by two experienced phoneticians using a detailed articulatory description scheme. A separate set of audio features was developed specifically for each category using both time-domain and spectral methods. For all time-frequency transformations, and for some secondary processing, the recently developed Auditory Receptive Fields Toolbox was used. Three different machine learning methods were applied for predicting the final articulatory categories. The result with the best generalization was found using an ensemble of multilayer perceptrons. The cross-validated classification accuracy was 96.8% for phonation, 90.8% for supraglottal myoelastic vibrations, and 89.0% for turbulence using all 84 developed features. A final feature reduction to 22 features yielded similar results.
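The sketch below illustrates the final classification step in outline only: an ensemble of multilayer perceptrons evaluated with cross-validation, using scikit-learn on a placeholder feature matrix; the paper's 84 features, toolbox, and exact ensemble configuration are not reproduced here.

    import numpy as np
    from sklearn.ensemble import VotingClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.neural_network import MLPClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 84))      # placeholder for an 84-dimensional feature matrix
    y = rng.integers(0, 2, size=500)    # placeholder labels, e.g. phonation present/absent

    # Soft-voting ensemble of differently seeded multilayer perceptrons
    ensemble = VotingClassifier(
        estimators=[("mlp%d" % i,
                     MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=i))
                    for i in range(5)],
        voting="soft",
    )
    clf = make_pipeline(StandardScaler(), ensemble)
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"cross-validated accuracy: {scores.mean():.3f}")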
Affiliation(s)
- Anders Friberg
- Speech, Music and Hearing, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Lindstedtsvägen 24, 10044 Stockholm, Sweden
- Tony Lindeberg
- Computational Brain Science Lab, Computational Science and Technology, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Lindstedtsvägen 5, 10044 Stockholm, Sweden
- Martin Hellwagner
- Speech, Music and Hearing, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Lindstedtsvägen 24, 10044 Stockholm, Sweden
- Pétur Helgason
- Speech, Music and Hearing, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Lindstedtsvägen 24, 10044 Stockholm, Sweden
- Gláucia Laís Salomão
- Speech, Music and Hearing, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Lindstedtsvägen 24, 10044 Stockholm, Sweden
- Anders Elowsson
- Speech, Music and Hearing, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Lindstedtsvägen 24, 10044 Stockholm, Sweden
- Guillaume Lemaitre
- Institute for Research and Coordination in Acoustics and Music, 1 Place Igor Stravinsky, Paris 75004, France
- Sten Ternström
- Speech, Music and Hearing, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Lindstedtsvägen 24, 10044 Stockholm, Sweden
5. Rising tones and rustling noises: Metaphors in gestural depictions of sounds. PLoS One 2017; 12:e0181786. [PMID: 28750071; PMCID: PMC5547699; DOI: 10.1371/journal.pone.0181786]
Abstract
Communicating an auditory experience with words is a difficult task and, in consequence, people often rely on imitative non-verbal vocalizations and gestures. This work explored the combination of such vocalizations and gestures to communicate auditory sensations and representations elicited by non-vocal everyday sounds. Whereas our previous studies have analyzed vocal imitations, the present research focused on gestural depictions of sounds. To this end, two studies investigated the combination of gestures and non-verbal vocalizations. A first, observational study used manual annotations to examine a set of vocal and gestural imitations of recordings of sounds representative of a typical everyday environment (ecological sounds). A second, experimental study used non-ecological sounds whose parameters had been specifically designed to elicit the behaviors highlighted in the observational study, and used quantitative measures and inferential statistics. The results showed that these depicting gestures are based on systematic analogies between a referent sound, as interpreted by a receiver, and the visual aspects of the gestures: auditory-visual metaphors. The results also suggested different roles for vocalizations and gestures. Whereas the vocalizations reproduce all features of the referent sounds as faithfully as vocally possible, the gestures focus on one salient feature with metaphors based on auditory-visual correspondences. Both studies highlighted two metaphors consistently shared across participants: the spatial metaphor of pitch (mapping different pitches to different positions along the vertical dimension), and the rustling metaphor of random fluctuations (rapid shaking of the hands and fingers). We interpret these metaphors as the result of two kinds of representations elicited by sounds: auditory sensations (pitch and loudness) mapped to spatial position, and causal representations of the sound sources (e.g., raindrops, rustling leaves) pantomimed and embodied by the participants' gestures.
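As a toy illustration of the spatial metaphor of pitch noted above, the sketch below maps a made-up pitch glide onto a normalized vertical position (higher pitch, higher hand); the log-frequency mapping is an assumption for the example, not the paper's analysis.

    import numpy as np

    f = np.linspace(200.0, 800.0, 100)    # made-up rising pitch glide (Hz)
    log_f = np.log2(f)
    # Normalize log-frequency to a 0-1 vertical position: low pitch -> low hand, high pitch -> high hand
    height = (log_f - log_f.min()) / (log_f.max() - log_f.min())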
6. Lemaitre G, Houix O, Voisin F, Misdariis N, Susini P. Vocal Imitations of Non-Vocal Sounds. PLoS One 2016; 11:e0168167. [PMID: 27992480; PMCID: PMC5161510; DOI: 10.1371/journal.pone.0168167]
Abstract
Imitative behaviors are widespread in humans, in particular whenever two persons communicate and interact. Several tokens of spoken languages (onomatopoeias, ideophones, and phonesthemes) also display different degrees of iconicity between the sound of a word and what it refers to. Thus, it probably comes as no surprise that human speakers use many imitative vocalizations and gestures when they communicate about sounds, as sounds are notably difficult to describe. What is more surprising is that vocal imitations of non-vocal everyday sounds (e.g., the sound of a car passing by) are in practice very effective: listeners identify sounds better with vocal imitations than with verbal descriptions, despite the fact that vocal imitations are inaccurate reproductions of a sound created by a particular mechanical system (e.g., a car driving by) through a different system (the vocal apparatus). The present study investigated the semantic representations evoked by vocal imitations of sounds by experimentally quantifying how well listeners could match sounds to category labels. The experiment used three different types of sounds: recordings of easily identifiable sounds (sounds of human actions and manufactured products), human vocal imitations, and computational “auditory sketches” (created by algorithmic computations). The results show that performance with the best vocal imitations was similar to that with the best auditory sketches for most categories of sounds, and even to that with the referent sounds themselves in some cases. More detailed analyses showed that the acoustic distance between a vocal imitation and a referent sound is not sufficient to account for such performance. The analyses suggested that instead of trying to reproduce the referent sound as accurately as vocally possible, vocal imitations focus on a few important features, which depend on each particular sound category. These results offer perspectives for understanding how human listeners store and access long-term sound representations, and they set the stage for the development of human-computer interfaces based on vocalizations.
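The paper's own acoustic distance measure is not detailed in the abstract; as a hedged stand-in, the sketch below computes one common acoustic distance between an imitation and a referent sound, MFCC features aligned by dynamic time warping with librosa, using placeholder file names.

    import librosa

    def mfcc_dtw_distance(path_a, path_b, n_mfcc=13):
        ya, sra = librosa.load(path_a, sr=22050, mono=True)
        yb, srb = librosa.load(path_b, sr=22050, mono=True)
        ma = librosa.feature.mfcc(y=ya, sr=sra, n_mfcc=n_mfcc)   # shape: (n_mfcc, frames)
        mb = librosa.feature.mfcc(y=yb, sr=srb, n_mfcc=n_mfcc)
        # Accumulated-cost matrix and optimal warping path
        D, wp = librosa.sequence.dtw(X=ma, Y=mb, metric="euclidean")
        return D[-1, -1] / len(wp)       # alignment cost normalized by path length

    print(mfcc_dtw_distance("imitation.wav", "referent_sound.wav"))   # placeholder files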
Affiliation(s)
- Guillaume Lemaitre
- Equipe Perception et Design Sonores, STMS-IRCAM-CNRS-UPMC, Institut de Recherche et de Coordination Acoustique Musique, Paris, France
- Olivier Houix
- Equipe Perception et Design Sonores, STMS-IRCAM-CNRS-UPMC, Institut de Recherche et de Coordination Acoustique Musique, Paris, France
- Frédéric Voisin
- Equipe Perception et Design Sonores, STMS-IRCAM-CNRS-UPMC, Institut de Recherche et de Coordination Acoustique Musique, Paris, France
- Nicolas Misdariis
- Equipe Perception et Design Sonores, STMS-IRCAM-CNRS-UPMC, Institut de Recherche et de Coordination Acoustique Musique, Paris, France
- Patrick Susini
- Equipe Perception et Design Sonores, STMS-IRCAM-CNRS-UPMC, Institut de Recherche et de Coordination Acoustique Musique, Paris, France